Compare two files and find matching words in Python

I have two files. The first contains terms and their frequencies:

table 2 
apple 4 
pencil 89

The second file is a dictionary:

abroad 
apple 
bread 
...

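I want to check whether the first file contains any of the words in the second file. For example, both the first file and the second file contain "apple". I am new to Python. I tried something, but it doesn't work. Could you help me? Thanks.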

for line in dictionary: 
    words = line.split() 
    print words[0] 

for line2 in test: 
    words2 = line2.split() 
    print words2[0]

Source

2013-05-03 user951487

Something like this:

with open("file1") as f1, open("file2") as f2:
    # Build a set of words from the dictionary file.
    # Why a set? Sets provide O(1) average-case lookup, so the overall complexity is O(N).
    words = set(line.strip() for line in f1)

    # Now loop over each line of the other file (word, freq).
    for line in f2:
        word, freq = line.split()  # fetch word and frequency
        if word in words:  # the word is in the dictionary set, so print it
            print(word)

Output:

apple

Source

2013-05-03 09:07:26

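This code doesn't work if there are multiple matches :( – user951487 2013-05-03 09:25:49

@user951487 That's because you said: *"I want to check whether the first file contains **any** word"*. You can remove the `break` statement to get all common words. – 2013-05-03 09:27:48

It works now, thank you Ashwini :) – user951487 2013-05-03 09:30:53

This may help you: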

file1 = set(line.strip() for line in open('file1.txt'))
file2 = set(line.strip() for line in open('file2.txt'))

for line in file1 & file2:
    if line:
        print(line)

Source

2013-05-03 09:07:15 snehal

This won't work: one file contains a plain list of words, while the other contains space-separated values. – 2013-05-03 09:28:30
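Following the comment above, one way the set-intersection idea might be adapted is to keep only the first field of each line before intersecting. This is only a sketch under assumptions: the helper name `first_fields` is made up for illustration, and the sample lines are copied from the question instead of being read from `file1.txt` and `file2.txt`.

```python
def first_fields(lines):
    """Return the set of first whitespace-separated fields, skipping blank lines."""
    return set(line.split()[0] for line in lines if line.strip())


# Sample data from the question; with real files, pass open file objects instead.
frequency_lines = ["table 2\n", "apple 4\n", "pencil 89\n"]
dictionary_lines = ["abroad\n", "apple\n", "bread\n"]

# Intersect the terms of both files, ignoring the frequency column.
common = first_fields(frequency_lines) & first_fields(dictionary_lines)
for word in sorted(common):
    print(word)
```

Taking the first field works for both files here because a dictionary line holds a single word, so its first field is the word itself.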

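Here is what you should do:

First, you need to put all the dictionary words somewhere you can look them up easily. If you don't, you will have to read the whole dictionary file every time you want to check a word from the other file.
Second, you need to check whether each word in the other file is among the words you extracted from the dictionary file.

For the first part, you need to use either a list or a set. The difference between the two is that a list keeps the order in which you put the items in. A set is unordered, so it doesn't matter which word you read first from the dictionary file. A set is also faster for looking up an item, because that is what it is designed for.

To check whether an item is in a set, you can write: item in my_set, which evaluates to either True or False.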
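The two steps described above could be sketched roughly as follows. The function names `load_words` and `find_matches` are assumptions chosen for illustration, and the sample lines come from the question; with real files you would pass open file objects instead of lists.

```python
def load_words(lines):
    """Step 1: collect the dictionary words in a set for fast lookups."""
    return set(line.strip() for line in lines if line.strip())


def find_matches(lines, words):
    """Step 2: keep the first field of each line if it is in `words`."""
    matches = []
    for line in lines:
        parts = line.split()
        # Each line looks like "apple 4"; the term is the first field.
        if parts and parts[0] in words:
            matches.append(parts[0])
    return matches


# Sample data from the question.
dictionary_lines = ["abroad\n", "apple\n", "bread\n"]
frequency_lines = ["table 2\n", "apple 4\n", "pencil 89\n"]

words = load_words(dictionary_lines)
matches = find_matches(frequency_lines, words)
print(matches)  # prints ['apple']
```

The matches are collected in a list so that they keep the order of the frequency file, while the dictionary itself is a set because only membership matters there.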

Source

2013-05-03 09:08:38 jadkik94

I put your first (two-column) list in try.txt and the second list in try_match.txt:

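f = open('try.txt', 'r')
f_match = open('try_match.txt', 'r')
print(f)
dictionary = []
# Collect the first column (the term) of every line in try.txt.
for line in f:
    a, b = line.split()
    dictionary.append(a)

# Print every word of try_match.txt that also appears in try.txt.
for line in f_match:
    if line.split()[0] in dictionary:
        print(line.split()[0])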

Source

2013-05-03 09:21:26 octoback

It works, thank you antitrust. – user951487 2013-05-03 09:29:12

@user951487 The complexity of this solution is `O(N^2)`, whereas my solution is `O(N)`. – 2013-05-03 09:41:28
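To illustrate the complexity remark, here is a small, informal timing sketch comparing membership tests on a list and on a set. The sizes and variable names are arbitrary assumptions, and the measured times will vary by machine; it is meant only to show the trend, not as a benchmark.

```python
import timeit

n = 20000
as_list = [str(i) for i in range(n)]
as_set = set(as_list)
missing = "not-there"  # worst case for the list: every element is compared

# `in` on a list scans element by element; `in` on a set uses hashing.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)
print("list: %.6f s, set: %.6f s" % (list_time, set_time))
```

Because the list lookup is repeated for every line of the other file, the list-based version grows roughly quadratically with file size, while the set-based version stays roughly linear.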
