2017-09-05 75 views
0

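Search term not found in a string parsed by BeautifulSoup in Python

In Python 3, when I want to return only the strings that contain a term I am interested in, I can do this: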

phrases = ["1. The cat was sleeping",
           "2. The dog jumped over the cat",
           "3. The cat was startled"]

for phrase in phrases:
    if "dog" in phrase:
        print(phrase)

This of course prints "2. The dog jumped over the cat".

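Now I am trying to make the same idea work with strings parsed by BeautifulSoup. For example, Craigslist has a ton of A tags, but only the A tags that also have "hdrlnk" in them are useful to us. So I wrote: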

import requests 
from bs4 import BeautifulSoup 

url = "https://chicago.craigslist.org/search/apa" 
r = requests.get(url) 

soup = BeautifulSoup(r.content, "html.parser") 
links = soup.find_all("a") 

for link in links:
    if "hdrlnk" in link:
        print(link)

The problem is that instead of printing all the A tags with "hdrlnk" inside, Python prints nothing. I'm not sure what is going wrong.

+0

I visited the link but couldn't find any link whose text contains "hdrlink".

Answers

4

"hdrlnk" is a class attribute of the link. Since you say you are only interested in those links, just find them by class, like this:

import requests 
from bs4 import BeautifulSoup 

url = "https://chicago.craigslist.org/search/apa" 
r = requests.get(url) 

soup = BeautifulSoup(r.content, "html.parser") 
links = soup.find_all("a", {"class": "hdrlnk"}) 

for link in links: 
    print(link) 

Output:

<a class="result-title hdrlnk" data-id="6293679332" href="/chc/apa/d/high-rise-2-bedroom-heated/6293679332.html">High-Rise 2 Bedroom Heated Pool Indoor Parking Fire Pit Pet Friendly!</a> 
<a class="result-title hdrlnk" data-id="6285069993" href="/chc/apa/d/new-beautiful-studio-in/6285069993.html">NEW-Beautiful Studio in Uptown/free heat</a> 
<a class="result-title hdrlnk" data-id="6293694090" href="/chc/apa/d/albany-park-2-bed-1-bath/6293694090.html">Albany Park 2 Bed 1 Bath Dishwasher W/D &amp; Heat + Parking Incl Pets ok</a> 
<a class="result-title hdrlnk" data-id="6282289498" href="/chc/apa/d/north-center-2-bed-1-bath/6282289498.html">NORTH CENTER: 2 BED 1 BATH HDWD AC UNITS PROVIDE W/D ON SITE PRK INCLU</a> 
<a class="result-title hdrlnk" data-id="6266583119" href="/chc/apa/d/beautiful-2bed-1bath-in-the/6266583119.html">Beautiful 2bed/1bath in the heart of Wrigleyville</a> 
<a class="result-title hdrlnk" data-id="6286352598" href="/chc/apa/d/newly-rehabbed-2-bedroom-unit/6286352598.html">Newly Rehabbed 2 Bedroom Unit! Section 8 OK! Pets OK! (NHQ)</a> 

To get the link's href or text, use:

print(link["href"]) 
print(link.text) 
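
To see why the original loop printed nothing, here is a minimal offline sketch. The markup is hypothetical, modeled on the output above rather than fetched from Craigslist. It relies on the fact that `x in tag` tests membership in `tag.contents` (the tag's direct children), not in its attributes or raw HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the output above; nothing is downloaded.
html = """
<a class="result-title hdrlnk" href="/chc/apa/d/example/1.html">Studio in Uptown</a>
<a class="nav" href="/help">help</a>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")

# `"hdrlnk" in a` looks through a.contents (the child nodes), so the
# class attribute is never consulted and no link matches.
by_in = [a["href"] for a in links if "hdrlnk" in a]

# Filtering on the class attribute finds the listing link.
by_class = [a["href"] for a in soup.find_all("a", class_="hdrlnk")]

print(by_in)
print(by_class)
```

`class_="hdrlnk"` and `{"class": "hdrlnk"}` should behave the same here; both match a tag when any one of its classes equals `hdrlnk`.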
0

Try:

for link in links:
    # .get avoids a KeyError on <a> tags that have no href attribute
    if "hdrlnk" in link.get("href", ""):
        print(link)
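
In the sample output shown earlier, `hdrlnk` appears in the `class` attribute rather than in `href`, so a per-link test in the same loop style might check the class list instead. This is only a sketch on hypothetical markup; `has_class` is a helper name made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the third anchor has neither class nor href.
html = (
    '<a class="result-title hdrlnk" href="/a.html">A</a>'
    '<a href="/b.html">B</a>'
    '<a name="top">C</a>'
)
soup = BeautifulSoup(html, "html.parser")


def has_class(tag, name):
    """Return True if `name` is one of the tag's CSS classes."""
    # BeautifulSoup stores class as a list; default to [] when it is absent.
    return name in tag.get("class", [])


matches = [a.get("href") for a in soup.find_all("a") if has_class(a, "hdrlnk")]
print(matches)
```

Using `tag.get(...)` instead of `tag[...]` matters because indexing a missing attribute raises `KeyError`.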
0

Just search for the term in the link's contents; otherwise your code seems fine.

import requests 
from bs4 import BeautifulSoup 

url = "https://chicago.craigslist.org/search/apa" 
r = requests.get(url) 

soup = BeautifulSoup(r.content, "html.parser") 
links = soup.find_all("a") 

for link in links:
    # guard against empty <a> tags, where contents[0] would raise IndexError
    if link.contents and "hdrlnk" in link.contents[0]:
        print(link)

Or, if you want to search inside the href or title, use link['href'] or link['title'].

0

To get the desired links, you can use a CSS selector in your script, which keeps the scraper robust and concise.

import requests 
from bs4 import BeautifulSoup 

base_link = "https://chicago.craigslist.org" 
res = requests.get("https://chicago.craigslist.org/search/apa").text 
soup = BeautifulSoup(res, "lxml") 
for link in soup.select(".hdrlnk"): 
    print(base_link + link.get("href"))
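
Concatenating `base_link` with the href works as long as every href is root-relative, as in the output shown in the first answer. A slightly more defensive sketch could use `urllib.parse.urljoin` from the standard library, which also leaves absolute URLs untouched. The second href below is a made-up example, not something seen on Craigslist.

```python
from urllib.parse import urljoin

base_link = "https://chicago.craigslist.org"

hrefs = [
    # root-relative href taken from the output in the first answer
    "/chc/apa/d/high-rise-2-bedroom-heated/6293679332.html",
    # hypothetical absolute href, to show it passes through unchanged
    "https://example.org/listing/1.html",
]

# urljoin resolves relative hrefs against the base and keeps absolute ones.
full_links = [urljoin(base_link, href) for href in hrefs]

for url in full_links:
    print(url)
```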