BeautifulSoup returns an empty list when using find_all("span", text=re.compile("T"))
The HTML file can be downloaded from here.
soup = BeautifulSoup(open(r"test.html"),from_encoding="ascii")
In [43]:soup.find_all("span")
Out[43]:
[<span style="position:absolute; border: gray 1px solid; left:0px; top:50px; width:648px; height:783px;"></span>,
<span style="font-family: LJOGFN+HelveticaNeueLTStd-Bd; font-size:7px">S
<br/></span>,
<span style="font-family: LJOGFN+HelveticaNeueLTStd-Bd; font-size:7px">T
<br/></span>,
<span style="font-family: LJOGFN+HelveticaNeueLTStd-Bd; font-size:8px">N
<br/></span>,
<span style="font-family: LJOGFN+HelveticaNeueLTStd-Bd; font-size:7px">E
<br/></span>,
<span style="font-family: LJOGFN+HelveticaNeueLTStd-Bd; font-size:7px">T
<br/></span>,
<span style="font-family: LJOGFN+HelveticaNeueLTStd-Bd; font-size:8px">N
<br/></span>]
In [44]:soup.find_all("span", text = re.compile("T"))
Out[44]:[]
Why does it return an empty list? Is this related to the encoding?
Update: the code below works:
In [87]:
def aa(tag):
    return tag.name == "span" and re.match("T", tag.text)
In [88]:soup.find_all(aa)[0]
Why does it work this way?
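A likely explanation, sketched on a small stand-in document rather than the real test.html: the text= filter (named string= in newer BeautifulSoup releases) is compared against tag.string, and tag.string is None whenever a tag has more than one child. Each span above contains a text node plus a <br/>, so nothing matches. tag.text, by contrast, joins all the text inside the tag, which is why the function-based filter succeeds. The sample markup and the names html, by_string, by_function and span_with_t are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: two spans shaped like the ones in the question
# (text followed by <br/>) and one span that holds only text.
html = "<span>S<br/></span><span>T<br/></span><span>T</span>"
soup = BeautifulSoup(html, "html.parser")

first_t = soup.find_all("span")[1]
print(first_t.string)      # None, because the span has two children
print(repr(first_t.text))  # the concatenated text of the span

# string= is the newer name for text=; it is matched against tag.string,
# so the span that also contains <br/> is skipped.
by_string = soup.find_all("span", string=re.compile("T"))
print(by_string)

# A function filter can look at tag.text instead.
def span_with_t(tag):
    return tag.name == "span" and re.match("T", tag.text) is not None

by_function = soup.find_all(span_with_t)
print(by_function)
```

If this is indeed the cause, the encoding is not involved, and filtering on tag.text (as in the update) is a reasonable workaround.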
You could do something like '[i for i in soup("span") if i.text == 'N']' – 2015-03-25 07:56:30
@AvinashRaj It doesn't work; I just updated the question. – Sean 2015-03-25 15:16:20