2017-08-10 41 views
1

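I need a simple way to pull the ten words before and after a search term out of a paragraph and return them together as a single string. How can I extract several words surrounding a specific word in Python?

For example:

paragraph = "The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes."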

Input

Output

most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to

Answers

0

You can try using string methods. Have you tried writing any code so far?
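One way to act on that suggestion, as a minimal sketch rather than anyone's posted solution: split the text on whitespace, find the first token that equals the search word once surrounding punctuation is stripped, and slice the word list around it. The function name `words_around` and the punctuation set are assumptions made for illustration.

```python
def words_around(text, target, n=10):
    """Return up to n words on each side of the first standalone match of target."""
    words = text.split()
    for i, word in enumerate(words):
        # Strip surrounding punctuation so "wolf," matches but "wolf-like" does not.
        if word.strip('.,;:!?()"\'') == target:
            start = max(0, i - n)  # avoid a negative slice index near the start
            return " ".join(words[start:i + n + 1])
    return None  # target does not occur as a standalone word


sample = "The dog and the extant gray wolf are sister taxa"
print(words_around(sample, "wolf", n=2))  # extant gray wolf are sister
```

Because the comparison is made per token, a hyphenated word such as "wolf-like" is skipped and the context is taken around the standalone word.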

4

Output:

most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to 
+0

Sure, this is simple enough. Are you going to run into performance problems with large amounts of text? – WombatPM

2

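Here is a regular expression that can help you extract the text you want:

(?:[^ ]+ ){0,10}wolf(?: [^ ]+){0,10}

A Python example should look something like this, although I can't test it right now: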

import re 

t = "The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes" 

# Up to 10 space-delimited words, then "wolf" followed by whitespace, then up to 10 more words
m = re.search(r"(?:[^ ]+ ){0,10}wolf\s(?:[^ ]+ ){0,10}", t)

if m:
    print(m.group(0))
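
If the search word is not fixed, or you want every occurrence rather than only the first, the same idea can be wrapped in a small helper. This is a sketch under assumptions: the name `contexts` is made up for illustration, the keyword is passed through `re.escape` so characters like `+` are treated literally, and the lookarounds `(?<!\S)` and `(?!\S)` only accept the keyword when it is delimited by whitespace or the ends of the string (so "wolf-like" and "wolf," are not matched).

```python
import re


def contexts(text, keyword, n=10):
    """Return each whitespace-delimited occurrence of keyword with up to n words on each side."""
    pattern = re.compile(
        r"(?:\S+\s+){0,%d}(?<!\S)%s(?!\S)(?:\s+\S+){0,%d}"
        % (n, re.escape(keyword), n)
    )
    # finditer yields non-overlapping matches, scanning left to right
    return [m.group(0) for m in pattern.finditer(text)]


print(contexts("a b wolf c d wolf-like e", "wolf", n=1))  # ['b wolf c']
```

Since the matches are non-overlapping, a second occurrence that falls inside the context window of an earlier one is absorbed into that earlier match instead of being reported separately.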