2015-08-28

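Find and extract a sentence containing a keyword from a string in Python

I'm making a bot that goes through a lot of comments, and I want to find any sentence that starts with "I'm" or "I am". Here is an example comment (it has two sentences I'd want to extract):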

"Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time." 

Here is my function so far:

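keywords = ["i'm ", "im ", "i am "]

def get_quote(comments):
    quotes = []
    for comment in comments:
        isMatch = any(string in comment.text.lower() for string in keywords)
        if isMatch:
            # TODO: find where the matching sentence starts and ends, then quotes.append() it
            pass
    return quotes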

How can I find where the sentence starts and ends, so that I can .append() it to the list quotes?


How do you determine where a sentence ends? – Kasramvd


Look at the 'index' and 'find' methods of 'str' instances. Another solution is to use regular expressions. Have a look at this [example](http://stackoverflow.com/questions/8459412/find-start-and-end-positions-of-all-occurrences-within-a-string-in-python#8459451). – kikocorreoso


@Kasramvd A '.' would be best, I think. – MadManoloz
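Following kikocorreoso's suggestion, here is a minimal sketch that uses only str.find, assuming a sentence ends at the first '.', '?' or '!' after the keyword. The function name find_quotes is hypothetical; the keyword list is the one from the question.

```python
def find_quotes(text, keywords=("i'm ", "im ", "i am ")):
    """Return the pieces of text that begin at a keyword, using only str.find.

    Assumption: a sentence ends at the first '.', '?' or '!' after the keyword.
    """
    lowered = text.lower()  # case-insensitive search; indices match the original text
    quotes = []
    pos = 0
    while True:
        # Earliest occurrence of any keyword at or after pos (-1 means "not found")
        hits = [i for i in (lowered.find(k, pos) for k in keywords) if i != -1]
        if not hits:
            break
        start = min(hits)
        # Earliest sentence terminator after the keyword; fall back to end of text
        ends = [j for j in (text.find(p, start) for p in ".?!") if j != -1]
        end = min(ends) + 1 if ends else len(text)
        quotes.append(text[start:end])
        pos = end  # continue searching after this sentence
    return quotes


text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
print(find_quotes(text))
```

Because the keywords are plain substrings, "im " would also match inside a word such as "him ".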

Answers


You can use regular expressions for this:

>>> import re
>>> text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
>>> re.findall(r"(?i)(?:i'm|i am).*?[.?!]", text)
["I'm sorry.", "I'm sure everyone's day will come, it's just a matter of time."]

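The pattern I used here is r"(?i)(?:i'm|i am).*?[.?!]":

  • (?i) sets the "ignore case" flag
  • (?:i'm|i am) matches "I'm" or (|) "I am"; ?: makes it a non-capturing group
  • .*? matches a sequence (*) of any characters (.), as short as possible (?)...
  • [.?!] ...until it finds a literal period, question mark, or exclamation mark.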
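For readability, the same pattern can be written with re.VERBOSE so that each piece carries its explanation as a comment. This is a sketch; QUOTE_RE is a hypothetical name, and in verbose mode the space in "i am" has to be escaped because unescaped whitespace is ignored.

```python
import re

QUOTE_RE = re.compile(
    r"""
    (?:i'm|i\ am)   # "I'm" or "I am" as a non-capturing group (space escaped for VERBOSE)
    .*?             # any characters, as few as possible (non-greedy)
    [.?!]           # ...up to the first literal '.', '?' or '!'
    """,
    re.IGNORECASE | re.VERBOSE,  # same effect as the inline (?i) flag
)

text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
print(QUOTE_RE.findall(text))
```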

Note that this only works if there are no "other" dots, e.g. in "Dr." or "Mr.", since those will also be treated as the end of a sentence.
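One possible workaround for the "Dr."/"Mr." problem, sketched under the assumption that you can list the abbreviations up front, is to reject a "." as the sentence end when it directly follows one of them. ABBREVIATIONS is a hypothetical, deliberately incomplete list. Python's re only accepts fixed-width lookbehinds, so each abbreviation gets its own negative lookbehind rather than one alternation of different lengths.

```python
import re

# Hypothetical, deliberately short list; extend it for real text.
ABBREVIATIONS = ["Dr", "Mr", "Mrs", "Ms"]

# One fixed-width negative lookbehind per abbreviation: "." does not count
# as a sentence end if it comes right after e.g. "Dr" at a word boundary.
NOT_ABBREV = "".join(rf"(?<!\b{re.escape(abbr)})" for abbr in ABBREVIATIONS)

QUOTE_RE = re.compile(
    r"(?:i'm|i am).*?(?:[?!]|" + NOT_ABBREV + r"\.)",
    re.IGNORECASE,
)

text = "I'm seeing Dr. Smith tomorrow. I am glad Mrs. Jones could come!"
print(QUOTE_RE.findall(text))
```

Any abbreviation missing from the list will still end the sentence early, so this narrows the problem rather than solving it.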


Wow, that looks really simple. Some people are amazing, thank you so much! – MadManoloz


Check whether this code works for you:

def get_quote(comments):
    keywords = ["i'm ", "im ", "i am "]
    quotes = []
    for comment in comments:
        # Keep the sentence if any keyword occurs anywhere in it (case-insensitive)
        isMatch = any(string in comment.lower() for string in keywords)
        if isMatch:
            quotes.append(comment)
    print("Lines having keywords are")
    for q in quotes:
        print(q)


if __name__ == "__main__":
    a = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
    # Remove the last "." before splitting on ".", so the split leaves no empty item
    a = a.rstrip(".")
    # Split on "." and trim the space that followed each period
    list_val = [s.strip() for s in a.split(".")]
    get_quote(list_val)

Output:

C:\Users\Administrator\Desktop>python demo.py 
Lines having keywords are 
I'm sorry 
I'm sure everyone's day will come, it's just a matter of time 

C:\Users\Administrator\Desktop> 

This is perfect! Thanks – MadManoloz
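The answer above keeps any sentence that contains a keyword anywhere, whereas the question asks for sentences that start with "I'm" or "I am" (and "im " would also match inside a word such as "him "). A minimal variant that checks the start of each sentence might look like this; get_quotes_starting_with is a hypothetical name, and splitting on '.', '?' or '!' followed by whitespace is an assumption that shares the "Dr."/"Mr." limitation noted earlier.

```python
import re


def get_quotes_starting_with(text, keywords=("i'm ", "im ", "i am ")):
    """Return the sentences of text that start with one of keywords (case-insensitive)."""
    # Split after '.', '?' or '!' followed by whitespace; the punctuation stays with its sentence.
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    # str.startswith accepts a tuple, so one call checks every keyword.
    return [s for s in sentences if s.lower().startswith(tuple(keywords))]


text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
for quote in get_quotes_starting_with(text):
    print(quote)
```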
