On two different GPUs

I have a PyTorch script similar to the one below. I run it in parallel on two different GPUs by running the following commands in two different terminals:

CUDA_VISIBLE_DEVICES=0 python program.py --method method1 
CUDA_VISIBLE_DEVICES=1 python program.py --method method2

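The problem is that the data loader function above contains some randomness, which means the two methods end up being applied to two different sets of training data. I would like them to train on exactly the same data, so I modified the script as follows: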

# Loading data 
train_loader, test_loader = someDataLoaderFunction() 

# Define the architecture 
model = ResNet18() 
model = model.cuda() 

## Run for the first method 
method = 'method1' 

# Training 
train(method, model, train_loader, test_loader) 

## Run for the second method 
method = 'method2' 

# Must re-initialize the network first 
model = ResNet18() 
model = model.cuda() 

# Training 
train(method, model, train_loader, test_loader)

Is it possible to make this run in parallel, one run per method? Thank you very much for your help!

2017-09-27 Khue

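Well, parallel computing needs a completely different code architecture. Have you done any before? The least I can do is point you to the built-in `queue` library in Python 3, which you will need to orchestrate parallel execution. Also read up on race conditions and thread locking, otherwise you may end up frustrated while coding – aim100k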

@aim100k Thanks. I have only done some basic things, like parallel loops in C++ or Matlab :( – Khue

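:) I saw your website and I think what you do is really great. I like these topics too, but could not afford that much education. I hope you find your answer here – aim100k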
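The `queue`-based orchestration mentioned in the comments above can be sketched roughly as follows. This is only a minimal illustration using the standard library, not the asker's actual code: `train` here is a hypothetical stand-in that merely records the samples it received, `run_parallel` is an invented helper, and a shuffled list of integers stands in for the data loader. In a real PyTorch script each worker would also build its own model and move it to its own GPU, which is omitted here.

```python
import queue
import random
import threading


def train(method, data, device_id, results):
    # Hypothetical stand-in for the question's train(); it only records
    # which samples this method was given.
    results.put((method, device_id, list(data)))


def run_parallel(methods, num_samples=8, seed=0):
    # Build the data order once so every method sees exactly the same samples.
    rng = random.Random(seed)
    data = list(range(num_samples))
    rng.shuffle(data)

    results = queue.Queue()  # thread-safe, so no explicit lock is needed
    threads = [
        threading.Thread(target=train, args=(method, data, device_id, results))
        for device_id, method in enumerate(methods)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Drain the queue into a dict keyed by method name.
    seen = {}
    while not results.empty():
        method, device_id, samples = results.get()
        seen[method] = (device_id, samples)
    return seen


seen = run_parallel(["method1", "method2"])
print(seen["method1"][1] == seen["method2"][1])  # True: same data for both
```

The key design choice is that the data order is created once, before the workers start, so the two methods cannot drift apart regardless of how the threads are scheduled.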

I think the simplest way is to fix the seed, as follows.

import numpy as np
import torch

myseed = args.seed  # seed taken from the command-line arguments
np.random.seed(myseed)
torch.manual_seed(myseed)
torch.cuda.manual_seed(myseed)

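This should force the data loaders to produce exactly the same samples every time. The parallel way would be to use multiple threads, but I hardly see it being worth the hassle for the problem you posted.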
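To illustrate why a shared seed is enough, here is a minimal sketch that uses only the standard library. It makes several assumptions: `random.Random` stands in for whatever random number generator the data loader uses, `shuffled_indices` is a hypothetical helper, and the `--seed` flag is assumed to be added alongside the `--method` flag from the question.

```python
import argparse
import random


def parse_args(argv=None):
    # --method comes from the question; --seed is an assumed extra flag.
    parser = argparse.ArgumentParser()
    parser.add_argument("--method", default="method1")
    parser.add_argument("--seed", type=int, default=0)
    return parser.parse_args(argv)


def shuffled_indices(num_samples, seed):
    # A private, seeded RNG stands in for the data loader's shuffling.
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices


# Two separate invocations with the same seed see the same sample order.
run1 = parse_args(["--method", "method1", "--seed", "42"])
run2 = parse_args(["--method", "method2", "--seed", "42"])
order1 = shuffled_indices(10, run1.seed)
order2 = shuffled_indices(10, run2.seed)
print(order1 == order2)  # True, since both runs use the same seed
```

With this approach the two terminals from the question can keep running independently, one per GPU, as long as both commands are given the same seed value.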

2017-09-28 09:45:05

Thanks, I got the same answer here: https://discuss.pytorch.org/t/how-to-run-two-training-methods-in-parallel-on-exactly-the-same-data/7796/2 – Khue
