Write lines that match characters in another file

Essentially, I want to write out the lines of a document that match a list of IDs referenced in my code.

nodeIDs.txt:

...has 417 objects,

Adherens junction.txt:

...has 73 lines,

of the form

4301: AFDN; afadin, adherens junction formation factor 
1496: CTNNA2; catenin alpha 2 
283106: CSNK2A3; casein kinase 2 alpha 3 
2241: FER; FER tyrosine kinase 
60: ACTB; actin beta 
1956: EGFR; epidermal growth factor receptor 
56288: PARD3; par-3 family cell polarity regulator 
10458: BAIAP2; BAI1 associated protein 2 
51176: LEF1; lymphoid enhancer binding factor 1

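I am trying to have the program go through the file line by line and check each line against the list of IDs; if the characters at the start of a line match any entry in the list, that line should be written to a new document. I have been looking into data sets, but I am not sure whether they can be used with this data.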

My code so far:

ids = []
with open('nodeIDs.txt', 'r') as n:
    for line in n:
        ids.append(line)
n.close()

# Import data from the pathway file and turn into a list
g = []
with open('Adherens junction.txt', 'r') as a:
    for line in a:
        g.append(line)
a.close()

aj = open('Adherens.txt', 'a')
for line in a:
    if ids[i] in line:
        aj.write(line)
aj.close()

Can you help me get this working?

2017-03-05 Quintakov

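This question would be greatly improved with a [Minimal, Complete, and Verifiable](http://stackoverflow.com/help/mcve) example. Specifically, provide data that actually works, not just data that illustrates the format, along with the output you expect from that data.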

Here is some code that I think does what you are after.

Code:

# read ids file into a set
with open('file1', 'r') as f:
    # create a set comprehension
    ids = {line.strip() for line in f}

# read the pathway file and turn into a list
with open('file2', 'r') as f:
    # create a list comprehension
    pathways = [line for line in f]

# output matching lines
with open('file3', 'a') as f:

    # loop through each of the pathways
    for pathway in pathways:

        # get the number in front of the ':'
        start_of_line = pathway.split(':', 1)[0]

        # if this is in 'ids' output the line
        if start_of_line.strip() in ids:
            f.write(pathway)

Results:

2241: FER; FER tyrosine kinase 
56288: PARD3; par-3 family cell polarity regulator

file1:

10000 
56288 
2241

file2:

4301: AFDN; afadin, adherens junction formation factor 
1496: CTNNA2; catenin alpha 2 
283106: CSNK2A3; casein kinase 2 alpha 3 
2241: FER; FER tyrosine kinase 
60: ACTB; actin beta 
1956: EGFR; epidermal growth factor receptor 
56288: PARD3; par-3 family cell polarity regulator 
10458: BAIAP2; BAI1 associated protein 2 
51176: LEF1; lymphoid enhancer binding factor 1
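The same filtering can also be wrapped in a function, which makes it easy to try without touching files on disk. This is only a sketch: `filter_by_id` is a hypothetical helper name, and it assumes every pathway line starts with an ID followed by a colon, as in the sample data above.

```python
def filter_by_id(id_lines, pathway_lines):
    """Return the pathway lines whose leading ID appears in id_lines."""
    # Strip newlines and skip blank lines so the comparison is exact.
    ids = {line.strip() for line in id_lines if line.strip()}
    matches = []
    for line in pathway_lines:
        # Everything before the first ':' is treated as the ID.
        leading_id = line.split(':', 1)[0].strip()
        if leading_id in ids:
            matches.append(line)
    return matches


# A few lines taken from file1 and file2 above.
id_lines = ["10000\n", "56288\n", "2241\n"]
pathway_lines = [
    "4301: AFDN; afadin, adherens junction formation factor\n",
    "2241: FER; FER tyrosine kinase\n",
    "60: ACTB; actin beta\n",
    "56288: PARD3; par-3 family cell polarity regulator\n",
]

matched = filter_by_id(id_lines, pathway_lines)
print("".join(matched), end="")
```

Because open file objects yield lines when iterated, they can be passed in directly in place of the two lists.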

What is a set comprehension?

This:

# create a set comprehension 
ids = {line.strip() for line in f}

is the same as:

# create a set 
ids = set() 
for line in f: 
    ids.add(line.strip())
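Using a set of stripped IDs also makes the comparison exact. A test such as `ids[i] in line` from the question is a substring check, so a short ID can match inside a longer one. The sketch below illustrates the difference; the ID `106` is made up for the illustration and is not taken from nodeIDs.txt.

```python
line = "283106: CSNK2A3; casein kinase 2 alpha 3\n"
ids = {"106"}  # made-up ID, chosen because it occurs inside 283106

# Substring test: true if the ID appears anywhere in the line.
substring_hit = any(i in line for i in ids)

# Exact test: compare only the text in front of the first ':'.
exact_hit = line.split(':', 1)[0].strip() in ids

print(substring_hit)  # True
print(exact_hit)      # False
```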

2017-03-05 02:41:12

This works perfectly - thanks for the formatting! Could you explain a bit further what is happening in the "line for line" and "pathway in pathways" parts of your code? – Quintakov

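`for line in f` is standard Python iteration. Many objects (for example `list` and open files) can hand out an iterator with a `__next__` method, which is what allows this very pretty syntax. So it basically does what it reads as: it runs the for loop once for each line, one at a time. Isn't Python fun? You may also be unfamiliar with comprehensions. I updated the post to point out the two comprehensions. See: http://stackoverflow.com/questions/1747817/create-a-dictionary-with-list-comprehension-in-python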
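For anyone curious what the `for` loop does behind the scenes, here is a rough sketch that spells the iteration out by hand with the built-in `iter()` and `next()`. The list `lines` is a stand-in for an open file, used here only for illustration.

```python
lines = ["first\n", "second\n"]  # stand-in for an open file

# What `for line in lines:` does, written out step by step.
iterator = iter(lines)           # asks the object for an iterator
seen = []
while True:
    try:
        line = next(iterator)    # fetches the next item
    except StopIteration:        # raised when nothing is left
        break
    seen.append(line.strip())

print(seen)  # ['first', 'second']
```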
