-2
我创建了一个网站刮板将刮去黄页所有信息(用于教育目的)为什么它跳过整个循环?
def actual_yellow_pages_scrape(link,no,dir,gui,sel,ypfind,terminal,user,password,port,type):
print(link,no,dir,gui,sel,ypfind,terminal,user,password,port,type)
r = requests.get(link,headers=REQUEST_HEADERS)
soup = BeautifulSoup(r.content,"html.parser")
workbook = xlwt.Workbook()
sheet = workbook.add_sheet(str(ypfind))
count = 0
for i in soup.find_all(class_="business-name"):
sheet.write(count,0,str(i.text))
sheet.write(count,1,str("http://www.yellowpages.com"+i.get("href")))
r1 = requests.get("http://www.yellowpages.com"+i.get("href"))
soup1 = BeautifulSoup(r1.content,"html.parser")
website = soup1.find("a",class_="custom-link")
try:
print("Acquiring Website")
sheet.write(count,2,str(website.get("href")))
except:
sheet.write(count,2,str("None"))
email = soup1.find("a",class_="email-business")
try:
print(email.get("href"))
EMAIL = re.sub("mailto:","",str(email.get("href")))
sheet.write(count,3,str(EMAIL))
except:
sheet.write(count,3,str("None"))
phonetemp = soup1.find("div",class_="contact")
try:
phone = phonetemp.find("p")
print(phone.text)
sheet.write(count,4,str(phone.text))
except:
sheet.write(count,4,str("None"))
reviews = soup1.find(class_="count")
try:
print(reviews.text)
sheet.write(count,5,str(reviews.text))
except:
sheet.write(count,5,str("None"))
count+=1
save = dir+"\\"+ypfind+str(no)+".xls"
workbook.save(save)
no+=1
for i in soup.find_all("a",class_="next ajax-page"):
print(i.get("href"))
actual_yellow_pages_scrape("http://www.yellowpages.com"+str(i.get("href")),no,dir,gui,sel,ypfind,terminal,user,password,port,type)
的代码是我的刮板的上面部分。我在汤和for循环中创建了断点,甚至没有执行for循环的单行。没有错误抛出。我试着打印1-10的数字,但它不工作,为什么?
谢谢
可能因为'find_all'的结果是空的?你有没有检查过它? – Julien
因为你迭代的内容可能是空的。 –
使用'print()'来查看变量中的含义。 – furas