Diff and intersection reporting between two text files

Disclaimer: I am new to programming and scripting in general, so please excuse the lack of technical terminology.

I have two text files, each containing a list of names:

First File | Second File 
bob  | bob 
mark  | mark 
larry  | bruce 
tom  | tom

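I want to run a script (preferably Python) that writes the lines the two files have in common to one text file and the lines that differ to another text file, e.g.: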

matches.txt:

bob 
mark 
tom

differences.txt:

bruce

How would I do this with Python? Or with a Unix command line, if that is easy enough?


2013-04-29 Mark Halpern

Use 'sets' and standard file IO ... throw in a 'string.split' for good measure :) ... or what have you tried, and where are you stuck? – 2013-04-29 22:45:35

Is the Unix 'diff' command not good enough? – user1613254 2013-04-29 22:49:04

I suspect ordering doesn't matter, so probably not... – 2013-04-29 22:49:52

# Read each file as a set of whitespace-separated words
with open("some1.txt") as f1, open("some2.txt") as f2:
    words1 = set(f1.read().split())
    words2 = set(f2.read().split())

duplicates = words1.intersection(words2)        # words in both files
uniques = words1.symmetric_difference(words2)   # words in exactly one file

print("Duplicates(%d):%s" % (len(duplicates), duplicates))
print("\nUniques(%d):%s" % (len(uniques), uniques))

Something like that, at least.
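
To write the two files the question asks for instead of printing, the same set operations could be wrapped in a small helper. This is only a sketch and not part of the original answer: the name `write_report` and its parameters are made up for illustration, and it assumes one name per line and that "differences" means names found in exactly one of the two files.

```python
def write_report(path1, path2, matches_path, differences_path):
    """Write names common to both files, and names found in only one of them."""
    # Hypothetical helper; assumes one name per line, blank lines ignored.
    with open(path1) as f1, open(path2) as f2:
        names1 = {line.strip() for line in f1 if line.strip()}
        names2 = {line.strip() for line in f2 if line.strip()}

    matches = sorted(names1 & names2)       # present in both files
    differences = sorted(names1 ^ names2)   # present in exactly one file

    with open(matches_path, "w") as out:
        out.writelines(name + "\n" for name in matches)
    with open(differences_path, "w") as out:
        out.writelines(name + "\n" for name in differences)
    return matches, differences
```

With the sample data from the question this would report both bruce and larry as differences. If only the names that are new in the second file are wanted, as in the question's expected output, `names2 - names1` could be used in place of `names1 ^ names2`.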


2013-04-29 23:17:54

Hey, I have a question: if the files are too large to hold their entire contents in a set, is there an efficient way to do this for big files? – Ja8zyjits 2015-07-22 11:14:18

A Unix shell solution is:

# duplicate lines 
sort text1.txt text2.txt | uniq -d 

# unique lines 
sort text1.txt text2.txt | uniq -u
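
For readers who prefer Python, the behaviour of the two pipelines can be approximated with `collections.Counter`. This is a rough sketch, and the function name `uniq_report` is hypothetical. Like the shell version, it counts lines across both files together, so a name repeated inside a single file is also reported as a duplicate.

```python
from collections import Counter

def uniq_report(*paths):
    """Approximate `sort files | uniq -d` and `sort files | uniq -u`."""
    counts = Counter()
    for path in paths:
        with open(path) as f:
            # Count every line across all files together
            counts.update(line.rstrip("\n") for line in f)
    duplicates = sorted(line for line, n in counts.items() if n > 1)   # like uniq -d
    uniques = sorted(line for line, n in counts.items() if n == 1)     # like uniq -u
    return duplicates, uniques
```

Python's `sorted` orders strings by code point, which may differ from the locale-dependent order that `sort` produces.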


2013-04-29 22:50:15 suspectus

Note to OP: to write the output to a file, just redirect it with '> file.txt' at the end of the command, like this: 'sort text1.txt text2.txt | uniq -d > dups.txt' – 2013-04-29 23:11:43

Via clfu (http://www.commandlinefu.com/commands/view/5707/intersection-between-two-files#comment), for duplicates: '(sort -u file1; sort -u file2) | sort | uniq -d' (this one looks much the same, but is shorter) – 2014-09-09 11:11:30

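Python dictionary lookups are O(1) or very close to it; in other words, they are very fast (but they will use a lot of memory if the file you index is large). So read in the first file and build a dictionary, like this: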

left = [x.strip() for x in open('left.txt').readlines()]

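The list comprehension and strip() are needed because readlines hands you complete lines, trailing newline included. This creates a list of all the items in the file, assuming one per line (use .split if they are all on the same line).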

Now build a dictionary:

ldi = dict.fromkeys(left)

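This builds a dictionary with the items in the list as its keys. It also takes care of duplicates. Now iterate over the second file and check whether each key is in the dictionary: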

matches = open('matches.txt', 'w')
uniq = open('uniq.txt', 'w')
for l in open('right.txt').readlines():
    if l.strip() in ldi:
        # write to matches
        matches.write(l)
    else:
        # write to uniq
        uniq.write(l)
matches.close()
uniq.close()


2013-04-29 22:56:48 izak

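Come to think of it, this won't find the names that are unique to left.txt. Simple enough: just mirror the dict solution to get those. You could also look at Python's "set" type, which makes it easy to determine intersections and differences. – izak 2013-04-29 23:00:43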
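
The mirrored pass mentioned in this comment could look roughly like the following. It is a sketch under assumptions: the helper name `only_in_first` and the output filename `left_only.txt` are invented for illustration. It indexes the second file and streams the first, which is the reverse of the answer above.

```python
def only_in_first(first_path, second_path, out_path):
    """Write the names from first_path that do not appear in second_path."""
    with open(second_path) as f:
        # dict.fromkeys as in the answer; a set would work equally well
        index = dict.fromkeys(line.strip() for line in f)

    written = []
    with open(first_path) as f, open(out_path, "w") as out:
        for line in f:
            name = line.strip()
            if name and name not in index:
                out.write(name + "\n")
                written.append(name)
    return written

# Example call with the filenames from the answer (left_only.txt is hypothetical):
# only_in_first('left.txt', 'right.txt', 'left_only.txt')
```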

sort | uniq is fine, but comm may be better; see "man comm" for more information. Note that comm expects both input files to be sorted.

From the man page:

EXAMPLES 
     comm -12 file1 file2 
       Print only lines present in both file1 and file2. 

     comm -3 file1 file2 
       Print lines in file1 not in file2, and vice versa.

You could also use Python's set type, but comm is easier.
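
For comparison, the kind of merge that comm performs on sorted input can be sketched in Python. This is an illustration only and not how comm itself is implemented: the function name `comm_like` is hypothetical, and both files are assumed to be already sorted in Python's string order, with one name per line. Because it reads both files line by line, memory use stays small even for large files.

```python
def comm_like(path1, path2):
    """Yield (column, line): 1 = only in file 1, 2 = only in file 2, 3 = in both."""
    with open(path1) as f1, open(path2) as f2:
        a = f1.readline()
        b = f2.readline()
        while a and b:
            x, y = a.rstrip("\n"), b.rstrip("\n")
            if x == y:
                yield 3, x
                a, b = f1.readline(), f2.readline()
            elif x < y:
                yield 1, x
                a = f1.readline()
            else:
                yield 2, y
                b = f2.readline()
        # One file is exhausted; whatever remains is unique to the other
        while a:
            yield 1, a.rstrip("\n")
            a = f1.readline()
        while b:
            yield 2, b.rstrip("\n")
            b = f2.readline()
```

Keeping only the entries in column 3 corresponds to `comm -12`, and keeping columns 1 and 2 corresponds to `comm -3`.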


2013-04-29 23:12:26 dstromberg

>>> with open('first.txt') as f1, open('second.txt') as f2: 
     w1 = set(f1)   # each element is a whole line, newline included
     w2 = set(f2) 


>>> with open('matches.txt','w') as fout1, open('differences.txt','w') as fout2: 
     fout1.writelines(sorted(w1 & w2))   # lines present in both files
     fout2.writelines(sorted(w2 - w1))   # lines only in second.txt


>>> with open('matches.txt') as f: 
     print(f.read())


bob 
mark 
tom 
>>> with open('differences.txt') as f: 
     print(f.read())


bruce


2013-04-29 23:25:15 jamylak
