Getting a matrix from items in two files

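I have two files from which I want to build the presence (1) / absence (0) matrix below. If an item in cols 2-4 matches the file B item on the same row (or col1; I'm not sure which input is best to use here), a score of "1" is recorded; otherwise "0" is recorded.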

File A:

col1 col2 col3 col4 
esd dus esd muq 
uum uum dus esd 
dus esd uum dus 
muq muq muq uum

File B:

esd 
uum 
dus 
muq

My attempt:

out_file=open("out.txt", "w")
for itemA in open("fileA", "r") as file1:
    file2=open("fileB", "r")
    for row in file2:
        for col in file2:
            if itemA==file2[row][col]:
                out_file.write(int(1))
            else:
                out_file.write(int(0))

Expected output:

col1 col2 col3 
esd 0 1 0 
uum 1 0 0 
dus 0 0 1 
muq 1 1 0

Any help with the Python code would be appreciated.

2014-11-21 user27976

What is the actual output of your code? – boh 2014-11-21 14:40:38

Use pandas. http://pandas.pydata.org/ – acushner 2014-11-21 14:42:45

@boh: Looking at the code, my guess would be a syntax error ;) – Wolph 2014-11-21 14:48:56
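acushner's pandas suggestion could look roughly like the sketch below. It assumes pandas is installed, that the first line of file A is the col1 col2 col3 col4 header shown in the question, and that line i of file B belongs to row i of file A. The sample data is inlined with io.StringIO so the snippet runs on its own, and the column name "item" is made up for the example.

```python
import io

import pandas as pd

# Sample data copied from the question; in practice you would pass the
# file paths (e.g. "fileA", "fileB") to read_csv instead of StringIO objects.
FILE_A = """col1 col2 col3 col4
esd dus esd muq
uum uum dus esd
dus esd uum dus
muq muq muq uum
"""
FILE_B = """esd
uum
dus
muq
"""

# Whitespace-separated table; the first line of file A is used as the header.
a = pd.read_csv(io.StringIO(FILE_A), sep=r"\s+")
# File B has one item per line and no header ("item" is a hypothetical name).
b = pd.read_csv(io.StringIO(FILE_B), sep=r"\s+", header=None, names=["item"])

# Compare cols 2-4 of each row with the file B item on the same row
# (rows are aligned by their index), then turn True/False into 1/0.
matrix = a.iloc[:, 1:].eq(b["item"], axis=0).astype(int)
matrix.insert(0, "item", b["item"])

print(matrix.to_string(index=False))
```

Because eq(..., axis=0) aligns on the row index, this only gives the intended result if both files list their rows in the same order.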

Would something like this work for you?

with open('a.txt') as fh:
    for line in fh:
        cols = line.split()
        key = cols[0]
        print(key, end=' ')
        for col in cols[1:]:
            # Print 1 if they are the same, 0 otherwise
            print(int(col == key), end=' ')

        # Newline
        print()

With a.txt:

esd dus esd muq 
uum uum dus esd 
dus esd uum dus 
muq muq muq uum

Output:

esd 0 1 0 
uum 1 0 0 
dus 0 0 1 
muq 1 1 0

2014-11-21 14:48:28 Wolph

You don't need file B if the first item on each line of the file is the thing you are looking for.

result = []
with open('input.txt') as in_file:
    for line in in_file:
        tokens = line.split()
        seek = tokens[0]  # We seek occurrences of the first token in the row.
        row = [seek]      # This list stores the pieces of output.
        for item in tokens[1:]:
            if item == seek:
                row.append('1')  # Note that these are strings, not integers.
            else:                # You might like to replace them with other
                row.append('0')  # values such as 'Y'/'N' or 'T'/'F'.
        result.append(row)
lines = [' '.join(row) for row in result]  # Making lines of output.
text = '\n'.join(lines)                    # Gluing the lines together.
print(text)                                # Printing for verification.
with open('output.txt', 'w') as out_file:  # Then writing to file.
    out_file.write(text + '\n')

The code above will take this input:

esd dus esd muq 
uum uum dus esd 
dus esd uum dus 
muq muq muq uum

and produce this output:

esd 0 1 0 
uum 1 0 0 
dus 0 0 1 
muq 1 1 0

2014-11-21 14:49:14
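Dropping file B is only safe if its lines really are the same items, in the same order, as col1 of file A. A quick check could look like the sketch below; the function name first_column_matches is my own, and it assumes file A has no header line (like input.txt above).

```python
def first_column_matches(lines_a, lines_b):
    """Return True if line i of B equals the first token of line i of A."""
    # Skip blank lines; keep the first whitespace-separated token of each row of A.
    col1 = [line.split()[0] for line in lines_a if line.strip()]
    # One item per line in B; strip the trailing newline/spaces.
    items = [line.strip() for line in lines_b if line.strip()]
    return col1 == items


# Usage with real files (file names taken from the question):
# with open("fileA") as fa, open("fileB") as fb:
#     print(first_column_matches(fa, fb))
```

If this returns False, the order or content of file B differs from col1, and an approach that actually reads file B is needed.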

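If the items in B do not necessarily match the first column of A, you can call next() on one file while looping over the other, so that both files are read in lockstep: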

fileA = 'fileA.tsv'
fileB = 'fileB.tsv'
outfilename = 'outfile.tsv'

with open(fileA) as fa:
    with open(fileB) as fb:
        with open(outfilename, 'w') as outfile:
            for line in fb:
                corresp_a_line = next(fa)  # read the matching line of A
                fields = corresp_a_line.split()
                outfile.write(fields[0])  # write column 1
                for field in fields[1:]:
                    # 1 if the field equals the item from B, 0 otherwise
                    outfile.write("\t{}".format(int(line.strip() == field)))
                outfile.write("\n")

2014-11-21 14:54:50
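The same lockstep reading can also be written with zip(), which pairs line i of A with line i of B without calling next() by hand. One difference worth knowing: zip() silently stops at the shorter file, whereas next(fa) raises StopIteration if A runs out first. This is a sketch under the same assumptions as the answer above (no header line, one item per line in B); the function name presence_matrix is hypothetical.

```python
def presence_matrix(lines_a, lines_b):
    """Build rows of [col1, '0'/'1', ...] by reading A and B in lockstep."""
    rows = []
    # zip() pairs line i of A with line i of B and stops at the shorter input.
    for line_a, line_b in zip(lines_a, lines_b):
        fields = line_a.split()
        target = line_b.strip()
        # Exact comparison: '1' if the field equals B's item, else '0'.
        rows.append([fields[0]] + [str(int(f == target)) for f in fields[1:]])
    return rows


# Writing tab-separated output with the file names from the answer above:
# with open('fileA.tsv') as fa, open('fileB.tsv') as fb, open('outfile.tsv', 'w') as out:
#     for row in presence_matrix(fa, fb):
#         out.write('\t'.join(row) + '\n')
```

Returning the rows instead of writing them directly keeps the comparison logic separate from file handling, so it can be checked on plain lists of lines.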