2017-02-15

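Filter values by row and return the column header if a threshold is exceeded

This is what I have so far, but I'm stuck. I can filter the values I want, but I can't figure out how to return the column header for each filtered value instead of putting the values themselves into a list.

This is what my data looks like: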

Taxa,Pop_1,Pop_2,Pop_3,Pop_4,Pop_5,Pop_6,Pop_7 
IPs216:C95NTANXX:1:250590968,0.000023,0.999865,0.000023,0.000023,0.000023,0.000023,0.000022 
IPs159:C95NTANXX:1:250591032,0.000023,0.000023,0.000023,0.000023,0.000023,0.999864,0.000023 
IPs286:C95NTANXX:1:250591013,0.000024,0.000024,0.000024,0.000024,0.000024,0.000024,0.999856 
IPs63:C95NTANXX:1:250591090,0.000024,0.000024,0.409426,0.352769,0.000024,0.237707,0.000024 
IPs892:C95NTANXX:1:250591054,0.000024,0.000024,0.999853,0.000024,0.000024,0.000024,0.000024 
IPs264:C95NTANXX:1:250590956,0.000023,0.000023,0.000023,0.999864,0.000023,0.000023,0.000023 
IPs716:C95NTANXX:1:250590960,0.000023,0.000023,0.999864,0.000023,0.000023,0.000023,0.000023 
IPs854:C95NTANXX:1:250590951,0.000022,0.080564,0.919325,0.000022,0.000022,0.000022,0.000022 
IPs914:C95NTANXX:1:250591052,0.238472,0.000023,0.000023,0.686966,0.000023,0.074471,0.000023 
IPs729:C95NTANXX:1:250591019,0.000022,0.000022,0.000022,0.999869,0.000022,0.000022,0.000022   

Here is my code:

f=open("/home/mjohnson/Desktop/Millet_Files/final_analysis/trees/pop_info/kodo_mod_7.meanQ" , "r") 
col_titles=list() 
pop_values=list() 
f.readline() 
filtered=list() 
#gives a list with column names, i need to index this to pair values with them 
a=open("/home/mjohnson/Desktop/Millet_Files/final_analysis/trees/pop_info/kodo_mod_7.meanQ" , "r") 
col_titles.append(a.readline()) 
col_names=list() 
for names in col_titles: 
    q=names.strip('\n').split(',') 
    col_names.append(q) 
#end of getting column names 

for line in f: 
    x=line.strip('\n').split(',') 
    x=x[1:] #this has the list ignore the first values, so taxa names ignored 
    for score in x:
        if float(score) > 0.5:
            filtered.append(score+'\n')

All you do to read the file is call 'readline()', which reads one line at a time, but you never iterate over the file. – roganjosh

Answers

0

Why are you trying to parse the CSV file yourself? See the standard csv module. In particular, what you want is the csv.DictReader() class.

Example:

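import csv

# rows_over_threshold is an illustrative name; yield needs an enclosing function
def rows_over_threshold(path, threshold=0.5):
    # Python 3: open in text mode with newline='' (Python 2 used 'rb')
    with open(path, newline='') as fin:
        reader = csv.DictReader(fin)

        for row in reader:
            # Python 3: items() replaces Python 2's iteritems()
            for column_label, column_value in row.items():
                # Skip the Taxa column; only the Pop_* columns hold scores
                if not column_label.startswith('Pop_'):
                    continue

                if float(column_value) > threshold:
                    yield row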
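Since the question asks for the column header rather than the whole row, a variant that yields the header may be closer to what is wanted. The following is a minimal Python 3 sketch, not the answerer's code: the function name `columns_over_threshold`, the `threshold` parameter, and the in-memory `sample` (two rows copied from the question) are assumptions made for illustration.

```python
import csv
import io


def columns_over_threshold(lines, threshold=0.5):
    """Yield (taxa, column_label) for every Pop_* value above threshold.

    `lines` is any iterable of CSV text lines, for example an open file.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        for column_label, column_value in row.items():
            # Only the Pop_* columns hold scores; Taxa is the row label.
            if not column_label.startswith('Pop_'):
                continue
            if float(column_value) > threshold:
                # strip() guards against stray whitespace in the header line.
                yield row['Taxa'], column_label.strip()


# Hypothetical stand-in for the real file: two rows from the question's data.
sample = io.StringIO(
    "Taxa,Pop_1,Pop_2,Pop_3,Pop_4,Pop_5,Pop_6,Pop_7\n"
    "IPs216:C95NTANXX:1:250590968,0.000023,0.999865,0.000023,0.000023,0.000023,0.000023,0.000022\n"
    "IPs63:C95NTANXX:1:250591090,0.000024,0.000024,0.409426,0.352769,0.000024,0.237707,0.000024\n"
)
result = list(columns_over_threshold(sample))
print(result)
```

The second sample row has no value above 0.5, so it contributes nothing to the result. To run this against the real file, pass an open file object instead of `sample`.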
0

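Two things:

1) You don't have to throw away the first column; you can skip it by adjusting the loop.

2) Use enumerate to number the items you are looping over when they don't carry an index of their own.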

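# start=1 keeps i aligned with the header, since x[0] is the taxon name.
# col_names[0] is the header list: the question's code appends it as one item.
for i, score in enumerate(x[1:], start=1):
    if float(score) > 0.5:
        filtered.append(col_names[0][i]+'\n')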
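Putting both points together with the header handling from the question, the whole task could be sketched as below. This is a sketch under assumptions, not the answerer's code: the function name `headers_over_threshold`, the `threshold` parameter, and the in-memory `sample_lines` (rows copied from the question) are hypothetical, and it targets Python 3.

```python
def headers_over_threshold(lines, threshold=0.5):
    """Return the column header of each value above threshold, row by row."""
    lines = iter(lines)
    # The first line holds the column names: Taxa, Pop_1, Pop_2, ...
    col_names = next(lines).strip().split(',')
    filtered = []
    for line in lines:
        x = line.strip().split(',')
        if len(x) < 2:
            continue  # skip blank lines
        # start=1 keeps i aligned with col_names, because x[0] is the taxon name.
        for i, score in enumerate(x[1:], start=1):
            if float(score) > threshold:
                filtered.append(col_names[i])
    return filtered


# Hypothetical stand-in for the real file: three rows from the question's data.
sample_lines = [
    "Taxa,Pop_1,Pop_2,Pop_3,Pop_4,Pop_5,Pop_6,Pop_7\n",
    "IPs159:C95NTANXX:1:250591032,0.000023,0.000023,0.000023,0.000023,0.000023,0.999864,0.000023\n",
    "IPs63:C95NTANXX:1:250591090,0.000024,0.000024,0.409426,0.352769,0.000024,0.237707,0.000024\n",
    "IPs854:C95NTANXX:1:250590951,0.000022,0.080564,0.919325,0.000022,0.000022,0.000022,0.000022\n",
]
headers = headers_over_threshold(sample_lines)
print(headers)
```

An open file object can be passed in place of `sample_lines`, since the function only needs an iterable of text lines.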