2016-12-04 73 views

Fisher-Yates shuffle in Python

I have the following dataset (this is a sample):

ID  Sub1 Sub2 Sub3 Sub4 
Creb3l1 10.14 9.67 10.14 10.42 
Chchd6 11.25 10.74 10.80 11.07 
Arih1 9.91 9.25 10.20 9.34 
Prpf8 11.54 11.58 11.14 11.36 
Rfng 11.71 11.56 10.81 10.72 
Rnf114 12.66 12.60 12.59 12.56 

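I want to run a Fisher-Yates shuffle on this dataset 10 times (i.e. write 10 output files, each containing one randomisation of the data produced with a Fisher-Yates shuffle).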
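For reference, the Fisher-Yates shuffle walks a list from the last position down to the second, swapping each element with a randomly chosen element at or before it. A minimal sketch, assuming Python 3; the function name `fisher_yates_shuffle` and its `rng` parameter are made up for illustration and are not part of the question's code:

```python
import random


def fisher_yates_shuffle(values, rng=random):
    """Return a shuffled copy of values (hypothetical helper)."""
    shuffled = list(values)  # work on a copy so the input is left untouched
    # Walk from the last index down to 1 and swap each element with a
    # randomly chosen element at the same or an earlier index.
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


# One row of expression values from the sample data
row = ["10.14", "9.67", "10.14", "10.42"]
print(fisher_yates_shuffle(row))
```

The standard library's `random.shuffle` does the same job in place, so writing the loop by hand is only needed if you want to see or control each step.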

I wrote this code:

import sys
import itertools
from itertools import permutations

for line in open(sys.argv[1]).readlines()[2:]:
    line = line.strip().split()
    ID = line[0]
    expression_values = line[1:]
    for shuffle in permutations(expression_values):
        print shuffle

The output of this code looks like this (sample):

('11.25', '10.74', '10.80', '11.07') 
('11.25', '10.74', '11.07', '10.80') 
('11.25', '10.80', '10.74', '11.07') 
('11.25', '10.80', '11.07', '10.74') 
('11.25', '11.07', '10.74', '10.80') 
('11.25', '11.07', '10.80', '10.74') 
('10.74', '11.25', '10.80', '11.07') 
('10.74', '11.25', '11.07', '10.80') 
('10.74', '10.80', '11.25', '11.07') 
('10.74', '10.80', '11.07', '11.25') 
('10.74', '11.07', '11.25', '10.80') 
('10.74', '11.07', '10.80', '11.25') 
('10.80', '11.25', '10.74', '11.07') 
('10.80', '11.25', '11.07', '10.74') 
('10.80', '10.74', '11.25', '11.07') 
('10.80', '10.74', '11.07', '11.25') 
('10.80', '11.07', '11.25', '10.74') 
('10.80', '11.07', '10.74', '11.25') 
('11.07', '11.25', '10.74', '10.80') 
('11.07', '11.25', '10.80', '10.74') 
('11.07', '10.74', '11.25', '10.80') 
('11.07', '10.74', '10.80', '11.25') 
('11.07', '10.80', '11.25', '10.74') 
('11.07', '10.80', '10.74', '11.25') 
('9.91', '9.25', '10.20', '9.34') 
('9.91', '9.25', '9.34', '10.20') 

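I am having trouble producing a specific chunk of the randomised data (for example, a set of 7 Fisher-Yates randomised lines that I can write to a file). If someone could show me how to edit the code above to generate 10 output files, each containing 7 lines of text (i.e. the same number of lines as the input file) with a Fisher-Yates shuffled set of values, I would greatly appreciate it.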

Edit 1: I have tried several different approaches, for example the code below:

for line in open(sys.argv[1]).readlines()[2:]:
    line = line.strip().split()
    gene_name = line[0]
    expression_values = line[1:]
    RandomList = []
    for shuffle in permutations(expression_values):
        while len(RandomList) < 10:
            RandomList.append(shuffle)
    print RandomList

I thought this would give me 10 randomisations per line. Instead it gives me the same ordering 10 times for each line:

[('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07')] 
[('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34')] 
[('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36')] 
[('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72')] 
[('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56')] 
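The repeated tuples come from the inner `while` loop: it runs to completion on the first permutation, so the list already holds ten copies of that permutation before the second one is reached. A minimal sketch of one way to collect ten independent shuffles instead, assuming Python 3; `ten_shuffles` and its `seed` parameter are hypothetical names used only for this example:

```python
import random


def ten_shuffles(values, count=10, seed=None):
    """Return `count` independently shuffled copies of values (hypothetical helper)."""
    rng = random.Random(seed)  # pass a seed only if reproducible output is wanted
    shuffles = []
    for _ in range(count):
        copy = list(values)   # fresh copy each time, so every shuffle starts from the input
        rng.shuffle(copy)     # in-place shuffle of the copy
        shuffles.append(copy)
    return shuffles


for shuffled in ten_shuffles(["11.25", "10.74", "10.80", "11.07"]):
    print(shuffled)
```

With only four values per row there are just 24 possible orderings, so some of the ten shuffles may coincide by chance.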

Edit 2: Sean: thank you very much for your help. I do know how to write to a file in general; for example, I can write:

for i in range(10):
    output_file = "random." + str(i)
    open_output_file = open(output_file, 'a')
    ***for each line of the randomised array***:
        open_output_file.write(line + "\n")
    open_output_file.close()

My problem with writing to a file is that I cannot even get what I want printed to the screen first. For example, if I run this code:

import sys
import itertools
from itertools import permutations

for i in range(10):
    for line in open(sys.argv[1]).readlines()[2:]:
        line = line.strip().split()
        gene_name = line[0]
        expression_values = line[1:]
        for shuffle in permutations(expression_values):
            print shuffle[:6]
        print "***"
i +=1

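I would expect the output to be 7 random lines, followed by "***", then another 7 random lines, and so on, 10 times. Instead it prints every permutation of each line.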

Which part are you stuck on? Getting the groups of seven? Writing them to a file? There are existing answers for all of these. – jonrsharpe

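Thanks, I have edited the question. Yes, the output I get is 120 lines printed to the screen/written to a file. I am confused about how to get groups of 7, e.g. print a block of 7 lines at a time and write it to a file (then do that 10 times). – user1288515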

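What have you tried? Building a list, perhaps? Acting once it reaches the appropriate length? If you have made an attempt, show it. If you haven't, make one! Or just [do some research](http://stackoverflow.com/questions/3992735/python-generator-that-groups-another-iterable-into-groups-of-n). – jonrsharpe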

Answers

-1

"each file containing 7 lines of text"

It sounds like what you want is list slicing.

a = [ 1, 2, 3, 4, 5, 6 ] 
a[:3] 

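yields [1, 2, 3]

A slice is written with a start index, an end index, and a step. In a[:3] the start index is omitted, so the slice starts at index 0 and stops before index 3.

a[1:3] yields [2, 3]

a[1:5:2] starts at index 1, stops before index 5, and uses a step of 2. So it yields [2, 4]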
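The slices described above can be checked directly. A small sketch, assuming Python 3; the variable names other than `a` are only for illustration. The last line shows one detail worth knowing when slicing short sequences: asking for more items than exist is not an error, the slice simply returns what is there.

```python
a = [1, 2, 3, 4, 5, 6]

first_three = a[:3]      # start omitted: indices 0, 1, 2
middle = a[1:3]          # indices 1 and 2
stepped = a[1:5:2]       # indices 1 and 3

# A slice that runs past the end is clipped rather than raising an error
four_values = ("11.25", "10.74", "10.80", "11.07")
clipped = four_values[:6]

print(first_three, middle, stepped, clipped)
```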

So, in your example, it looks like you want to write shuffle[:6]

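As for writing the files, you will need some kind of loop:

for i in range(0, 10):
    filename = "output-%s.txt" % i

This produces the file names output-0.txt, output-1.txt, and so on.

See https://docs.python.org/2/tutorial/inputoutput.html for details on file input/output. Basically, you should use the with keyword together with open: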

with open(filename, 'w') as f: 
    f.write(str(shuffle[:7])) 

This should point you in the right direction.

0

I think I have a solution:

import sys
from itertools import permutations

# Read the whole input file once
with open(sys.argv[1]) as handle:
    fileopen = handle.readlines()

# Write the header line to 10 random files
for i in range(10):
    file_name = "random" + str(i) + ".txt"
    with open(file_name, 'a') as out:
        out.write(fileopen[0].strip() + "\n")

# Write the rest of the info to 10 random files
for line in fileopen:
    if "Sub" not in line:
        line = line.strip().split()
        ID = line[0]
        expression_values = line[1:]
        ListOfShuffles = permutations(expression_values)
        # Take the first 10 permutations, one per output file
        for ind, i in enumerate(list(ListOfShuffles)[0:10]):
            file_name = "random" + str(ind) + ".txt"
            with open(file_name, 'a') as out:
                out.write(ID + "\t" + "\t".join(i) + "\n")
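One caveat with this approach: `permutations` yields its tuples in a fixed order, so taking the first 10 always selects the same ten orderings rather than random ones. A minimal sketch of a randomised variant, assuming Python 3, a whitespace-separated input file with one header line, and `random.shuffle` as the shuffling step; the function name `write_shuffled_files` and its `prefix` and `seed` parameters are hypothetical:

```python
import random


def write_shuffled_files(input_path, n_files=10, prefix="random", seed=None):
    """Write n_files copies of the table with each row's values shuffled.

    Hypothetical helper: returns the list of file names it wrote.
    """
    rng = random.Random(seed)  # pass a seed only if reproducible files are wanted
    with open(input_path) as handle:
        # Split every non-blank line into its whitespace-separated fields
        lines = [line.split() for line in handle if line.strip()]
    header, rows = lines[0], lines[1:]

    names = []
    for i in range(n_files):
        name = "%s%d.txt" % (prefix, i)
        with open(name, "w") as out:
            out.write("\t".join(header) + "\n")
            for row in rows:
                values = row[1:]     # slicing copies, so the original row is kept intact
                rng.shuffle(values)  # shuffle this row's values in place
                out.write("\t".join([row[0]] + values) + "\n")
        names.append(name)
    return names


# Example call (assumes the input path is given on the command line):
# import sys; write_shuffled_files(sys.argv[1])
```

Each output file then has the same number of lines as the input: the header plus one shuffled line per ID. Opening with mode "w" rather than "a" means rerunning the script overwrites earlier output instead of appending to it.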