2017-03-05 87 views
0

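Essentially, I want to write out the lines of a document that match a list of IDs referenced in my code, that is, write the lines that match characters found in another file.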

nodeIDs.txt:

... has 417 objects,

10000 
10023 
1017 
1019 
1021 
1026 
1027 
1029 
... 

Adherens junction.txt:

... has 73 lines,

4301: AFDN; afadin, adherens junction formation factor 
1496: CTNNA2; catenin alpha 2 
283106: CSNK2A3; casein kinase 2 alpha 3 
2241: FER; FER tyrosine kinase 
60: ACTB; actin beta 
1956: EGFR; epidermal growth factor receptor 
56288: PARD3; par-3 family cell polarity regulator 
10458: BAIAP2; BAI1 associated protein 2 
51176: LEF1; lymphoid enhancer binding factor 1 

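I am trying to have the program go through the file line by line and check each line against the list of IDs; if the characters at the start of a line match any entry in the list, that line should be written to a new document. I have been looking into datasets, but I'm not sure whether they are usable here.

My code so far: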

ids = [] 
with open('nodeIDs.txt', 'r') as n: 
    for line in n: 
     ids.append(line) 
n.close() 

# Import data from the pathway file and turn into a list 
g = [] 
with open('Adherens junction.txt', 'r') as a: 
    for line in a: 
     g.append(line) 
a.close() 

aj = open('Adherens.txt', 'a') 
for line in a: 
    if ids[i] in line: 
    aj.write(line) 
aj.close() 

Can you help me get this working?

+0

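This question would be greatly improved by a [Minimal, Complete, and Verifiable](http://stackoverflow.com/help/mcve) example. Specifically, provide data that actually works rather than data that only illustrates the format, along with the output you expect from it.

Answers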

2

Here is some code that I think does what you are after.

Code:

# read ids file into a set
with open('file1', 'r') as f:
    # create a set comprehension
    ids = {line.strip() for line in f}

# read the pathway file and turn into a list
with open('file2', 'r') as f:
    # create a list comprehension
    pathways = [line for line in f]

# output matching lines
with open('file3', 'a') as f:

    # loop through each of the pathways
    for pathway in pathways:

        # get the number in front of the ':'
        start_of_line = pathway.split(':', 1)[0]

        # if this is in 'ids' output the line
        if start_of_line.strip() in ids:
            f.write(pathway)

Results:

2241: FER; FER tyrosine kinase 
56288: PARD3; par-3 family cell polarity regulator 

file1:

10000 
56288 
2241 

file2:

4301: AFDN; afadin, adherens junction formation factor 
1496: CTNNA2; catenin alpha 2 
283106: CSNK2A3; casein kinase 2 alpha 3 
2241: FER; FER tyrosine kinase 
60: ACTB; actin beta 
1956: EGFR; epidermal growth factor receptor 
56288: PARD3; par-3 family cell polarity regulator 
10458: BAIAP2; BAI1 associated protein 2 
51176: LEF1; lymphoid enhancer binding factor 1 
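
For anyone who wants to try the approach without creating files, here is a minimal self-contained sketch of the same idea using in-memory lists. The function name `filter_by_id`, the extra ID `10`, and the shortened sample data are assumptions made for illustration; they are not part of the original answer. It also compares the result with a plain substring test, which can pick up IDs that merely appear inside a longer number.

```python
def filter_by_id(ids, lines):
    """Return the lines whose text before the first ':' is one of the ids."""
    wanted = {i.strip() for i in ids}
    return [line for line in lines
            if line.split(':', 1)[0].strip() in wanted]


# hypothetical sample data; "10" is added to show the substring pitfall
ids = ["10000\n", "56288\n", "2241\n", "10\n"]
lines = [
    "283106: CSNK2A3; casein kinase 2 alpha 3\n",
    "2241: FER; FER tyrosine kinase\n",
    "60: ACTB; actin beta\n",
    "56288: PARD3; par-3 family cell polarity regulator\n",
]

# exact match on the number in front of the ':'
matched = filter_by_id(ids, lines)

# naive substring test: "10" also occurs inside "283106"
substring_hits = [line for line in lines
                  if any(i.strip() in line for i in ids)]
```

With this data, `matched` holds only the FER and PARD3 lines, while `substring_hits` also includes the CSNK2A3 line because `10` occurs inside `283106`.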

What is a set comprehension?

This:

# create a set comprehension 
ids = {line.strip() for line in f} 

is the same as:

# create a set 
ids = set() 
for line in f: 
    ids.add(line.strip()) 
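
The list comprehension used for the pathways can be unrolled in the same way. A small sketch, using `io.StringIO` as a stand-in for an open file (an assumption made so that it runs without any files on disk):

```python
import io

text = "10000\n56288\n2241\n"

# set comprehension: strips each line and discards duplicates
ids = {line.strip() for line in io.StringIO(text)}

# list comprehension: keeps every line in order, newline included
pathways = [line for line in io.StringIO(text)]

# the same list built with an explicit loop
pathways_loop = []
for line in io.StringIO(text):
    pathways_loop.append(line)
```

A set is a good fit for the IDs because membership tests (`x in ids`) do not need to scan every element, whereas the pathways are kept in a list to preserve their order.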
+0

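This worked perfectly - thanks for the formatting! Could you explain a bit more about what is happening in the "line for line" and "pathway in pathways" parts of your code? – Quintakov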

+1

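'for line in f' is standard Python iteration. Many objects (for example 'list' and file objects) are iterable: they implement an '__iter__' method, which is what allows this very nice syntax. So it basically does what it reads as: it runs the for loop once for each line, one at a time. Isn't Python fun? You may also be unfamiliar with comprehensions. I updated the post to describe the two comprehensions. See: http://stackoverflow.com/questions/1747817/create-a-dictionary-with-list-comprehension-in-python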
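
To make the iteration explanation concrete, here is a rough sketch of what a `for` loop does behind the scenes: it calls `iter()` on the object to obtain an iterator, then calls `next()` on that iterator until `StopIteration` is raised. The sample list is made up for illustration.

```python
pathways = ["2241: FER\n", "56288: PARD3\n"]

# the usual form
seen_by_for = []
for pathway in pathways:
    seen_by_for.append(pathway)

# roughly the same thing, spelled out by hand
seen_by_next = []
iterator = iter(pathways)          # calls pathways.__iter__()
while True:
    try:
        pathway = next(iterator)   # calls iterator.__next__()
    except StopIteration:
        break                      # no items left, so the loop ends
    seen_by_next.append(pathway)
```

Both loops visit the same items in the same order; the `for` statement simply hides the `iter()`/`next()` bookkeeping.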
