Set a variable equal to the line where the keywords are found

I can't get the code below to print the link in which the keywords are found:

import urllib2
import requests
from bs4 import BeautifulSoup
from random import randint
import time
from lxml import etree
from time import sleep

a = requests.get('http://properlbc.com/sitemap.xml')
#time.sleep(1)
scrape = BeautifulSoup(a.text, 'lxml')
linkz = scrape.find_all('loc')
for linke in linkz:
    if "products" in linke.text:
        sitemap = str(linke.text)
        break



while True:
    # sleep(randint(4,6))
    keyword1 = "properlbc"
    keyword2 = "products"
    keyword3 = "bb1296"
    r = requests.get(sitemap)
    # time.sleep(1)
    soup = BeautifulSoup(r.text, 'lxml')
    links = soup.find_all('loc')
    for link in links:
        while (keyword1 in link.text and keyword2 in link.text and keyword3 in link.text):
            continue
        print("LINK SCRAPED")
        print(str(link.text) + "link scraped")
        break

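The code successfully loops until the link with the keywords is found, but it does not print the specific link that contains the keywords. It prints something other than "https://properlbc.com/collections/new-arrival/products/bb1296".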

Source

2016-12-31 ColeWorld


You need to do this:

for link in links:
    if keyword1 in link.text and keyword2 in link.text and keyword3 in link.text:
        print("LINK SCRAPED")
        print(str(link.text) + "link scraped")

or even, reading `link.text` into a variable first:

for link in links:
    text = link.text
    if keyword1 in text and keyword2 in text and keyword3 in text:
        print("LINK SCRAPED")
        print(text, "link scraped")
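Since the title asks about setting a variable to the matching line, here is a minimal sketch of one way to do that without the network. The helper name `find_link` and the `aa0001` URL are made up for illustration; in the real script the list would come from `[link.text for link in links]`.

```python
def find_link(urls, keywords):
    """Return the first URL that contains every keyword, or None."""
    return next((u for u in urls if all(k in u for k in keywords)), None)


# Hypothetical sample data standing in for the <loc> entries of the sitemap.
urls = [
    "https://properlbc.com/collections/new-arrival/products/aa0001",
    "https://properlbc.com/collections/new-arrival/products/bb1296",
]

found_link = find_link(urls, ["properlbc", "products", "bb1296"])
print(found_link)
```

`found_link` holds the URL itself, so it can be reused later instead of only being printed. If no link matches, it is `None`.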

EDIT: to leave the loop once the link is found

keyword1 = "properlbc"
keyword2 = "products"
keyword3 = "bb1296"

found = False

while not found:
    #sleep(randint(4,6))
    r = requests.get(sitemap)
    soup = BeautifulSoup(r.text, 'lxml')
    links = soup.find_all('loc')
    for link in links:
        text = link.text
        if keyword1 in text and keyword2 in text and keyword3 in text:
            print("LINK SCRAPED")
            print(text, "link scraped")
            found = True  # to leave `while` loop
            break  # to leave `for` loop

You used `while` and `continue`, which skip over the link with the keywords, so it is never printed.
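To make the difference concrete, here is a small sketch that mirrors the control flow of the question's inner loop next to the `if` version, using a made-up list in place of the sitemap. The function names and the `aa0001` URL are assumptions for illustration only. As far as I can tell, the question's `print` calls are reached only for a link that fails the keyword test, so the first non-matching link is what gets printed.

```python
KEYWORDS = ("properlbc", "products", "bb1296")


def matches(text):
    """True when every keyword occurs in the text."""
    return all(k in text for k in KEYWORDS)


def first_printed_original(urls):
    """Mirror the question's loop; the return stands in for print + break."""
    for text in urls:
        while matches(text):
            continue  # a matching link never gets past this inner loop
        return text  # reached only when the keyword test is False
    return None


def first_printed_fixed(urls):
    """The `if` version: report the first link that passes the test."""
    for text in urls:
        if matches(text):
            return text
    return None


# Hypothetical data: the first entry does not match, the second one does.
urls = [
    "https://properlbc.com/collections/new-arrival/products/aa0001",
    "https://properlbc.com/collections/new-arrival/products/bb1296",
]

print(first_printed_original(urls))
print(first_printed_fixed(urls))
```

Note that `first_printed_original` would spin forever if the matching link came first, which is another reason to prefer `if`.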

Source

2016-12-31 07:07:08 furas

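Yes, but will it keep looping until the link is added to the site? – ColeWorld

To check whether the link has been added, you have to read the page again. Looping over the links alone is useless. – furas

You can use `found = False` and `while not found:` instead of `while True` to exit the loop when the link is found. Then set `found = True` inside `if keyword1 ...` – furas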
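The point made in these comments (re-read the page on every pass, and leave the loop as soon as the link shows up) can be sketched as a small polling function. This is only an illustration under stated assumptions: `wait_for_link`, `fetch_links`, `max_attempts` and the fake snapshots are hypothetical names and data. In the real script `fetch_links` would wrap `requests.get(sitemap)` and `soup.find_all('loc')`.

```python
import time


def wait_for_link(fetch_links, keywords, delay=5.0, max_attempts=None):
    """Re-fetch the links until one contains every keyword.

    fetch_links is any callable returning a fresh list of URL strings.
    Returns the matching URL, or None if max_attempts runs out.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        for text in fetch_links():  # the page is read again on each pass
            if all(k in text for k in keywords):
                return text  # leaves both loops at once
        time.sleep(delay)  # wait before polling again
    return None


# Fake "site" for the demo: the product link appears on the second read.
snapshots = iter([
    ["https://properlbc.com/collections/new-arrival/products/aa0001"],
    ["https://properlbc.com/collections/new-arrival/products/aa0001",
     "https://properlbc.com/collections/new-arrival/products/bb1296"],
])


def fake_fetch():
    return next(snapshots)


link = wait_for_link(fake_fetch, ["properlbc", "products", "bb1296"],
                     delay=0, max_attempts=5)
print(link)
```

Returning from the function plays the role of the `found` flag, and `max_attempts` is an optional safety limit so the loop cannot run forever.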
