I have been trying to collect data from Zillow without success. What is the best way to scrape data from Zillow?
Example:
url = https://www.zillow.com/homes/for_sale/Los-Angeles-CA_rb/?fromHomePage=true&shouldFireSellPageImplicitClaimGA=false&fromHomePageTab=buy
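I want to pull information such as the address, price, and Zestimate for every home listed in Los Angeles.
I have tried HTML scraping with packages like BeautifulSoup. I have also tried working with JSON. I am almost certain Zillow's API will not help: as I understand it, the API is best suited to collecting information about a specific property.
I have been able to get information from other sites, but Zillow seems to use dynamic IDs (they change on every refresh), which makes the information harder to access.
UPDATE: I tried the code below, but I still get no results.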
import requests
from bs4 import BeautifulSoup

url = 'https://www.zillow.com/homes/for_sale/Los-Angeles-CA_rb/?fromHomePage=true&shouldFireSellPageImplicitClaimGA=false&fromHomePageTab=buy'
page = requests.get(url)
data = page.content
soup = BeautifulSoup(data, 'html.parser')

for li in soup.find_all('div', {'class': 'zsg-photo-card-caption'}):
    try:
        # There are sponsored links in the list; you might need to take
        # care of those.
        # find() returns None when a span is missing, which is not
        # checked here, so .text raises AttributeError in that case.
        print(li.find('span', {'class': 'zsg-photo-card-price'}).text)
        print(li.find('span', {'class': 'zsg-photo-card-info'}).text)
        print(li.find('span', {'class': 'zsg-photo-card-address'}).text)
        print(li.find('span', {'class': 'zsg-photo-card-broker-name'}).text)
    except AttributeError:
        print('An error occurred')
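If `find_all` returns an empty list, the loop body never runs, so the script prints nothing at all, not even the error message. One possible cause (an assumption here, not something the question confirms) is that the HTML returned to `requests` does not contain the `zsg-photo-card-*` classes, for example because the server sent a block page or because the listings are rendered by JavaScript. The sketch below is a hypothetical helper, `diagnose_response`, that inspects a status code and an HTML string and reports what is missing. The captcha/robot keyword check is a rough heuristic, not a documented Zillow behavior.

```python
import re


def diagnose_response(status_code, html, expected_classes):
    """Return a list of reasons why a scrape may have matched nothing.

    Hypothetical helper: works on an already-fetched status code and HTML
    string, so it can be tried without any network access.
    """
    findings = []

    if status_code != 200:
        findings.append(f"HTTP status {status_code}: the request may have been refused")

    # Rough heuristic (an assumption): block pages often mention a captcha
    # or a robot check somewhere in their text.
    if re.search(r"captcha|robot", html, re.IGNORECASE):
        findings.append("page text mentions a captcha/robot check")

    for name in expected_classes:
        # Match the class name as a whole token inside a double-quoted
        # class="..." attribute (single-quoted attributes are not handled).
        pattern = r'class="[^"]*(?<![\w-])' + re.escape(name) + r'(?![\w-])[^"]*"'
        if not re.search(pattern, html):
            findings.append(f"class '{name}' not found in the returned HTML")

    return findings


if __name__ == "__main__":
    # Made-up sample standing in for a block page.
    sample = "<html><body><h1>Please verify you are not a robot</h1></body></html>"
    for line in diagnose_response(403, sample, ["zsg-photo-card-caption"]):
        print(line)
```

With the code from the question, this could be called as `diagnose_response(page.status_code, page.text, ['zsg-photo-card-caption'])` right after `requests.get(url)`. An empty list would suggest the markup is present and the problem lies elsewhere.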
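https://www.zillow.com/howto/api/APIOverview.htm –
I have already checked out the API, and it does not quite give me what I need. –
You may find that is because Zillow's API terms of use (and the site's) specifically prohibit scraping. – toonarmycaptain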