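How to search for specific links with a regular expression when scraping a website in Python
I'm using Python 3.5.1 and BeautifulSoup, and I want to use a regular expression to find specific links while scraping a website. My code: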
from bs4 import BeautifulSoup
import urllib.request
import re
# Fetch the category page and parse it
r = urllib.request.urlopen('http://i.cantonfair.org.cn/en/expexhibitorlist.aspx?categoryno=404').read()
soup = BeautifulSoup(r, "html.parser")
# Find every <a> whose href contains the category-list pattern
links = soup.find_all("a", href=re.compile(r"ExpExhibitorList\.aspx\?categoryno=[0-9]+"))
linksfromcategories = [link["href"] for link in links]
print(linksfromcategories)
I get all the matching links:
['/cn/ExpExhibitorList.aspx?categoryno=432', 'ExpExhibitorList.aspx?categoryno=432003']
But I don't want '/cn/ExpExhibitorList.aspx?categoryno=432' to be retrieved.
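A likely explanation: when a compiled regex is passed as `href=`, BeautifulSoup tests it with the pattern's `search()` method, which matches anywhere in the attribute value, so the '/cn/...' href matches too. The sketch below assumes the link to keep is the one with no leading path (the bare relative href) and shows that anchoring the pattern with `^` and `$` drops the '/cn/' link. It runs `re` directly on the two hrefs from the output above rather than fetching the page.

```python
import re

# The two hrefs printed in the question's output.
hrefs = [
    '/cn/ExpExhibitorList.aspx?categoryno=432',
    'ExpExhibitorList.aspx?categoryno=432003',
]

# Unanchored: search() finds the pattern anywhere in the string,
# so the '/cn/...' href matches as well.
loose = re.compile(r"ExpExhibitorList\.aspx\?categoryno=[0-9]+")

# Anchored: the href must start with "ExpExhibitorList" and end after the digits,
# so a path prefix such as '/cn/' makes the match fail.
strict = re.compile(r"^ExpExhibitorList\.aspx\?categoryno=[0-9]+$")

loose_hits = [h for h in hrefs if loose.search(h)]
strict_hits = [h for h in hrefs if strict.search(h)]

print(loose_hits)   # both hrefs
print(strict_hits)  # ['ExpExhibitorList.aspx?categoryno=432003']
```

Under that assumption, the same anchored pattern could be passed straight to the original call, e.g. `soup.find_all("a", href=strict)`.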
Why don't you want that link? It matches your regular expression, so you get it. Please explain more.
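If the real criterion is not the '/cn/' prefix but the category level, a different pattern is needed. Judging only from the two sample values (432 versus 432003), one hypothetical rule is "keep subcategory links, i.e. category numbers with more than three digits". This is an assumption about the site, not something the question confirms:

```python
import re

hrefs = [
    '/cn/ExpExhibitorList.aspx?categoryno=432',
    'ExpExhibitorList.aspx?categoryno=432003',
]

# Hypothetical rule: subcategory numbers have 4 or more digits, and the
# digits must run to the end of the href ($) so '432' cannot qualify.
sub_only = re.compile(r"ExpExhibitorList\.aspx\?categoryno=[0-9]{4,}$")

result = [h for h in hrefs if sub_only.search(h)]
print(result)  # ['ExpExhibitorList.aspx?categoryno=432003']
```

Unlike the anchored-prefix approach, this rule ignores the path prefix, so a '/cn/' link with a six-digit category number would still be kept.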