BeautifulSoup functions not working properly in a specific scenario

I am trying to read in the following URL using urllib2: http://frcwest.com/ and then search the data to find the meta redirect.

It reads in the following data:

<!--?xml version="1.0" encoding="UTF-8"?--><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
    <html xmlns="http://www.w3.org/1999/xhtml"><head><title></title><meta content="0;url= Home.html" http-equiv="refresh"/></head><body></body></html>

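Reading it into BeautifulSoup works fine. However, for some reason none of the functions work in this specific scenario, and I don't understand why. BeautifulSoup has worked great for me in every other case. But simply trying: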

soup.findAll('meta')

yields no results.

My ultimate goal is to run:

soup.find("meta",attrs={"http-equiv":"refresh"})

But if:

soup.findAll('meta')

doesn't even work, then I'm stuck. Any insight into this mystery would be appreciated, thanks!


2013-04-21 bmiskie

What version of BeautifulSoup are you using? Using `import requests; from bs4 import BeautifulSoup; BeautifulSoup(requests.get(your_url).text).find_all('meta')` works fine for me.. – 2013-04-21 18:23:52

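Hey Jon, thanks for the quick reply. I am using bs4, but to fetch and parse the data I'm using httplib2 and html5lib. Based on your response and Martijn's, it looks like that is the source of the error. It seems you are using the requests library (from python-requests.org) to get it working. I wasn't aware of these resources; I'll keep playing around with them, thanks! – bmiskie 2013-04-21 18:40:41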

It's the comment and doctype at the start that throw off the parser here, and subsequently BeautifulSoup.

Even the html tag seems 'gone':

>>> soup.find('html') is None 
True

But it is still there when you iterate over .contents. You can find things again with:

for elem in soup:
    if getattr(elem, 'name', None) == u'html':
        soup = elem
        break

soup.find_all('meta')

Demo:

>>> for elem in soup:
...     if getattr(elem, 'name', None) == u'html':
...         soup = elem
...         break
...
>>> soup.find_all('meta')
[<meta content="0;url= Home.html" http-equiv="refresh"/>]
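For anyone who wants to try this without fetching the page, here is a minimal self-contained sketch of the same workaround. It assumes bs4 is installed and uses the stdlib html.parser backend, which is my choice for the sketch (the question used html5lib). The helper name find_meta_refresh is hypothetical. Whether the direct lookup fails depends on the parser and bs4 version, so the helper tries it first and only then falls back to walking the top-level nodes as shown above.

```python
from bs4 import BeautifulSoup

# The markup quoted in the question, leading comment and doctype included.
HTML = (
    '<!--?xml version="1.0" encoding="UTF-8"?-->'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title></title>'
    '<meta content="0;url= Home.html" http-equiv="refresh"/></head>'
    '<body></body></html>'
)


def find_meta_refresh(soup):
    """Hypothetical helper: return the meta refresh tag, or None."""
    attrs = {"http-equiv": "refresh"}

    # Direct lookup; this is the call that returned nothing in the question.
    tag = soup.find("meta", attrs=attrs)
    if tag is not None:
        return tag

    # Fallback: walk the top-level nodes, skip the comment and doctype,
    # and search inside the first node that is actually the html tag.
    for elem in soup:
        if getattr(elem, "name", None) == "html":
            return elem.find("meta", attrs=attrs)
    return None


# Parser choice is an assumption of this sketch, not taken from the question.
soup = BeautifulSoup(HTML, "html.parser")
tag = find_meta_refresh(soup)
if tag is not None:
    print(tag["content"])
```

Passing the parser name explicitly also makes it easier to compare how different backends treat the same markup.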


2013-04-21 18:25:08

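Thanks for the insight and comments, mystery solved! I appreciate the clear and quick response; I had been banging my head against this problem for days. – bmiskie 2013-04-21 18:41:38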
