2015-11-08 155 views

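Following on from my previous question (How to iterate through a Python list and compare items against a string or another list), I am trying to write code that returns a string whenever a search term from a list appears in a string, as shown below.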

import re 
from nltk import tokenize 
from nltk.tokenize import sent_tokenize 
def foo(): 
    List1 = ['risk','cancer','ocp','hormone','OCP',] 
    txt = "Risk factors for breast cancer have been well characterized. Breast cancer is 100 times more frequent in women than in men.\ 
    Factors associated with an increased exposure to estrogen have also been elucidated including early menarche, late menopause, later age\ 
    at first pregnancy, or nulliparity. The use of hormone replacement therapy has been confirmed as a risk factor, although mostly limited to \ 
    the combined use of estrogen and progesterone, as demonstrated in the WHI (2). Analysis showed that the risk of breast cancer among women using \ 
    estrogen and progesterone was increased by 24% compared to placebo. A separate arm of the WHI randomized women with a prior hysterectomy to \ 
    conjugated equine estrogen (CEE) versus placebo, and in that study, the use of CEE was not associated with an increased risk of breast cancer (3).\ 
    Unlike hormone replacement therapy, there is no evidence that oral contraceptive (OCP) use increases risk. A large population-based case-control study \ 
    examining the risk of breast cancer among women who previously used or were currently using OCPs included over 9,000 women aged 35 to 64 \ 
    (half of whom had breast cancer) (4). The reported relative risk was 1.0 (95% CI, 0.8 to 1.3) among women currently using OCPs and 0.9 \ 
    (95% CI, 0.8 to 1.0) among prior users. In addition, neither race nor family history was associated with a greater risk of breast cancer among OCP users." 
    words = txt 
    corpus = " ".join(words).lower() 
    sentences1 = sent_tokenize(corpus) 
    a = [" ".join([sentences1[i-1],j]) for i,j in enumerate(sentences1) if [item in List1] in word_tokenize(j)] 


    for i in a: 
     print i,'\n','\n' 

foo() 

The problem is that Python IDLE doesn't print anything. I am probably doing something wrong. It runs the code, and all I get is this:

>>>
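Even with `item` defined, the filter in the list comprehension can never be true. `[item in List1]` is a one-element list (`[True]` or `[False]`), and `x in word_tokenize(j)` is true only if some token equals that list; the tokens are strings, so the condition is always false and `a` stays empty. As posted, `item` and `word_tokenize` are also never defined or imported. Two further issues: `" ".join(words)` on a string puts a space between every character, and `sentences1[i-1]` for the first sentence wraps around to the last one. Below is a minimal corrected sketch of the same idea. It swaps in `re` for NLTK's `sent_tokenize`/`word_tokenize` so it runs without NLTK data, assumes a sentence ends at `.`, `!` or `?` followed by whitespace, and uses a hypothetical helper name, `find_sentences`.

```python
import re


def find_sentences(keywords, txt):
    """Return each sentence containing a keyword, prefixed by the sentence before it."""
    # Compare in lower case so 'OCP' and 'ocp' count as the same keyword
    wanted = {k.lower() for k in keywords}
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace
    # (a stand-in for nltk's sent_tokenize)
    sentences = re.split(r'(?<=[.!?])\s+', txt.strip())
    results = []
    for i, sentence in enumerate(sentences):
        # \w+ pulls out the words (a stand-in for word_tokenize)
        words = {w.lower() for w in re.findall(r'\w+', sentence)}
        if wanted & words:  # at least one keyword is a word of this sentence
            # Only prepend a previous sentence if there is one (avoids sentences[-1])
            previous = sentences[i - 1] + " " if i > 0 else ""
            results.append(previous + sentence)
    return results


if __name__ == "__main__":
    List1 = ['risk', 'cancer', 'ocp', 'hormone', 'OCP']
    txt = ("Risk factors for breast cancer have been well characterized. "
           "Breast cancer is 100 times more frequent in women than in men.")
    for s in find_sentences(List1, txt):
        print(s)
        print()
```

Matching is on whole words, so a plural such as "OCPs" does not count as "ocp"; the same is true of comparing against `word_tokenize` output.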

Answer


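Your question isn't entirely clear to me, so please correct me if I've got this wrong. Are you trying to match a list of keywords (in List1) against the text (in txt)? That is: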

  • for each keyword in List1,
  • match it against each sentence in txt,
  • and print the sentences that match?
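The three steps above can be sketched as a loop over every keyword rather than only `List1[0]`. This is a minimal sketch, not the answerer's code: the helper name `sentences_by_keyword` is hypothetical, it assumes case-insensitive whole-word matching is wanted, and it assumes a sentence ends at `.`, `!` or `?` followed by whitespace, so decimals such as `0.8` are not split.

```python
import re


def sentences_by_keyword(keywords, txt):
    """Map each keyword to the sentences of txt that contain it as a whole word."""
    # Assumption: split on '.', '!' or '?' followed by whitespace, so numbers
    # such as "0.8" stay inside their sentence
    sentences = re.split(r'(?<=[.!?])\s+', txt.strip())
    matches = {}
    for keyword in keywords:  # for each keyword in List1
        # \b...\b requires a whole word; IGNORECASE lets 'risk' match 'Risk'
        pattern = re.compile(r'\b' + re.escape(keyword) + r'\b', re.IGNORECASE)
        # match it against each sentence and keep the ones that hit
        matches[keyword] = [s for s in sentences if pattern.search(s)]
    return matches


if __name__ == "__main__":
    List1 = ['risk', 'cancer', 'ocp', 'hormone', 'OCP']
    txt = ("The reported relative risk was 1.0 (95% CI, 0.8 to 1.3) among women "
           "currently using OCPs and 0.9 (95% CI, 0.8 to 1.0) among prior users. "
           "Unlike hormone replacement therapy, there is no evidence that oral "
           "contraceptive (OCP) use increases risk.")
    for keyword, found in sentences_by_keyword(List1, txt).items():
        print("Sentences matching the word (" + keyword + "):")
        for s in found:
            print(s)
```

Because matching ignores case, `'ocp'` and `'OCP'` in List1 return the same sentences; one of them could be dropped.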

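Rather than writing one complex regular expression to solve your problem, I have broken it down into two parts.

First I split the whole text into a list of sentences. Then I use a simple regular expression and loop over each sentence. The drawback of this approach is that it isn't very efficient, but hey, it solves your problem.

Hopefully this small piece of code helps point you toward the real solution.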

import re

def foo():
    List1 = ['risk','cancer','ocp','hormone','OCP',]
    txt = "blah blah blah - truncated"

    matches = []
    # Split the text into sentences at every period (this also splits numbers such as 0.8)
    sentences = re.split(r'\.', txt)
    keyword = List1[0]
    # re.compile returns a pattern object; keep it and search with it
    pattern = re.compile(keyword)

    for sentence in sentences:
        if pattern.search(sentence):
            matches.append(sentence)

    print("Sentence matching the word (" + keyword + "):")
    for match in matches:
        print(match)

foo()
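The answer deliberately avoids a single complex regular expression, but joining the keywords into one alternation stays fairly readable and searches each sentence once instead of once per keyword, which speaks to the efficiency concern. A minimal sketch; `sentences_with_any` is a hypothetical name, and unlike the code above it assumes case-insensitive, whole-word matching and splits sentences only at `.`, `!` or `?` followed by whitespace.

```python
import re


def sentences_with_any(keywords, txt):
    """Return the sentences of txt that contain at least one of the keywords."""
    # One alternation such as \b(?:risk|cancer|ocp)\b; re.escape guards against
    # keywords that contain regex metacharacters
    alternation = '|'.join(re.escape(k) for k in keywords)
    pattern = re.compile(r'\b(?:' + alternation + r')\b', re.IGNORECASE)
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', txt.strip())
    return [s for s in sentences if pattern.search(s)]


if __name__ == "__main__":
    List1 = ['risk', 'cancer', 'ocp', 'hormone', 'OCP']
    txt = ("Breast cancer is 100 times more frequent in women than in men. "
           "Factors associated with an increased exposure to estrogen have also "
           "been elucidated including early menarche, late menopause, later age "
           "at first pregnancy, or nulliparity.")
    for s in sentences_with_any(List1, txt):
        print(s)
```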

---------Generate a random number-----

from random import randint 

List1 = ['risk','cancer','ocp','hormone','OCP',] 
print(randint(0, len(List1) - 1)) # gives you a random index; use it to access List1
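Python's standard `random` module also provides `random.choice`, which returns a random element directly and does the same job as `List1[randint(0, len(List1) - 1)]`. If every keyword should instead be tried once in a random order, `random.sample` with the full length returns a shuffled copy. A short sketch:

```python
import random

List1 = ['risk', 'cancer', 'ocp', 'hormone', 'OCP']

# Pick one keyword at random, without computing an index
keyword = random.choice(List1)
print(keyword)

# Visit every keyword exactly once, in random order (List1 itself is unchanged)
order = random.sample(List1, len(List1))
for kw in order:
    print(kw)
```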

You've done me a big favor! Thanks!! It works. However, I'd like it to pick an item at random rather than a fixed one such as List1[0] or List1[3] – wakamdr


Maybe try: from random import randint. I've updated the solution to include sample code. –


Great. It works with keyword = List1[(randint(0,len(List1) - 1))] .... and it also works in a while loop – wakamdr
