如果你想在列表中有印有“码”,只需使用一个"list comprehension":
def parse(self, response):
sourceHtml = BeautifulSoup(response.body)
soup = sourceHtml.find("dl", {"id": "resultList"})
return [link.get('code') for link in soup.find_all('dd')]
您还可以改善你的定位元素的方式,并使用CSS selector:
def parse(self, response):
soup = BeautifulSoup(response.body)
return [link.get('code') for link in soup.select('dl#resultList dd')]
这也是一个好主意,provide an underlying parser explicitly:
soup = BeautifulSoup(response.body, "html.parser")
# or soup = BeautifulSoup(response.body, "html5lib")
# or soup = BeautifulSoup(response.body, "lxml")