2016-12-31 63 views

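Set a variable equal to the line where the keyword was found

I am unable to print the correct link, the one in which the keywords were found, with the code below: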

import time
from random import randint
from time import sleep

import requests
from bs4 import BeautifulSoup

a = requests.get('http://properlbc.com/sitemap.xml') 
#time.sleep(1) 
scrape = BeautifulSoup(a.text, 'lxml') 
linkz = scrape.find_all('loc') 
for linke in linkz:
    if "products" in linke.text:
        sitemap = str(linke.text)
        break



while True:
    # sleep(randint(4,6))
    keyword1 = "properlbc"
    keyword2 = "products"
    keyword3 = "bb1296"
    r = requests.get(sitemap)
    # time.sleep(1)
    soup = BeautifulSoup(r.text, 'lxml')
    links = soup.find_all('loc')
    for link in links:
        while (keyword1 in link.text and keyword2 in link.text and keyword3 in link.text):
            continue
        print("LINK SCRAPED")
        print(str(link.text) + "link scraped")
        break

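The code loops successfully until the link with the keywords is found, but it does not print the specific link that contains the keywords. It prints something other than "https://properlbc.com/collections/new-arrival/products/bb1296".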


Answers


You have to do:

for link in links:
    if keyword1 in link.text and keyword2 in link.text and keyword3 in link.text:
        print("LINK SCRAPED")
        print(str(link.text) + "link scraped")

or even, reading `link.text` into a variable first:

for link in links:
    text = link.text
    if keyword1 in text and keyword2 in text and keyword3 in text:
        print("LINK SCRAPED")
        print(text, "link scraped")
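
The three-way `and` test above can also be written with `all()`, which keeps working if the number of keywords changes. The following is only a minimal sketch: the helper names `contains_all` and `matching_links` are hypothetical, and the sample list stands in for the `link.text` values of the `loc` elements (the first URL is made up for illustration, the second is the one from the question).

```python
def contains_all(text, keywords):
    """Return True only when every keyword occurs somewhere in text."""
    return all(keyword in text for keyword in keywords)


def matching_links(link_texts, keywords):
    """Keep the links whose text contains all of the keywords."""
    return [text for text in link_texts if contains_all(text, keywords)]


keywords = ["properlbc", "products", "bb1296"]

# Stand-in for [link.text for link in links]; the first entry is invented.
sample_links = [
    "https://properlbc.com/collections/new-arrival/products/aa0000",
    "https://properlbc.com/collections/new-arrival/products/bb1296",
]

for text in matching_links(sample_links, keywords):
    print("LINK SCRAPED")
    print(text, "link scraped")
```

Because the check is an `if`-style filter rather than a `while` with `continue`, non-matching links are simply skipped and matching ones are printed.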

EDIT: to leave the loop when the link is found:

keyword1 = "properlbc" 
keyword2 = "products" 
keyword3 = "bb1296" 

found = False 

while not found: 
    #sleep(randint(4,6)) 
    r = requests.get(sitemap) 
    soup = BeautifulSoup(r.text, 'lxml') 
    links = soup.find_all('loc') 
    for link in links:
        text = link.text
        if keyword1 in text and keyword2 in text and keyword3 in text:
            print("LINK SCRAPED")
            print(text, "link scraped")
            found = True # to leave `while` loop
            break # to leave `for` loop

You use `while` and `continue`, which skips the link that contains the keywords, so it never gets printed.

Yes, but will it loop until the link is added to the site? – ColeWorld


To check whether the link has been added to the site, you have to read the page again. Looping over the links alone is useless. – furas


You can use `found = False` and `while not found:` instead of `while True` to exit the loop when the link is found. Then set `found = True` inside `if keyword1 ...`. – furas
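
The point made in these comments, that the page has to be re-read on every pass and the loop left once a match appears, could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: `poll_until_found`, `fetch_links`, `delay` and `max_attempts` are hypothetical names, and `fetch_links` is assumed to be a callable that returns a fresh list of URL strings each time it is called.

```python
import time


def poll_until_found(fetch_links, keywords, delay=5.0, max_attempts=None):
    """Re-fetch the links until one of them contains every keyword.

    fetch_links  -- hypothetical callable returning a fresh list of URL strings
    keywords     -- substrings that must all occur in the same link
    delay        -- seconds to wait between two fetches
    max_attempts -- optional cap on fetches; None means keep trying forever

    Returns the first matching link, or None if max_attempts is exhausted.
    """
    attempt = 0
    while True:
        # The page is read again on every pass, so newly added links are seen.
        for link in fetch_links():
            if all(keyword in link for keyword in keywords):
                return link  # leaves both the for loop and the while loop
        attempt += 1
        if max_attempts is not None and attempt >= max_attempts:
            return None
        time.sleep(delay)
```

With the code from the question, `fetch_links` could be a small function that calls `requests.get(sitemap)`, parses the response with `BeautifulSoup(r.text, 'lxml')` and returns the `.text` of every element from `soup.find_all('loc')`. Returning from the function plays the role of the `found` flag plus `break`, and passing the fetch step in as a callable makes the loop easy to try out without touching the network.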
