Retrieving the first Urban Dictionary result for a term in Python

I have written some very simple code to get the first result for any term on urbandictionary.com. I started with something basic, just to see how their markup is formatted.

import urllib

def parseudtest(searchurl):
    # Fetch the definition page for the search term and print it line by line
    url = 'http://www.urbandictionary.com/define.php?term=%s' % searchurl
    url_info = urllib.urlopen(url)
    for line in url_info:
        print line

As a test, I searched for 'cats' by passing it as the searchurl variable. The output is of course a huge page, but here is the part I care about:

<meta content='He set us up the bomb. Also took all our base.' name='Description' /> 

<meta content='He set us up the bomb. Also took all our base.' property='og:description' /> 

<meta content='cats' property='og:title' /> 

<meta content="http://static3.urbandictionary.com/rel-1e0b481/images/og_image.png" property="og:image" /> 

<meta content='Urban Dictionary' property='og:site_name' />

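As you can see, the first "meta content" element that appears on the page holds the first definition of the search term. So I wrote the following code to retrieve it: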

import urllib
from xml.dom import minidom

def parseud(searchurl):
    url = 'http://www.urbandictionary.com/define.php?term=%s' % searchurl
    url_info = urllib.urlopen(url)
    if (url_info):
        xmldoc = minidom.parse(url_info)
    if (xmldoc):
        definition = xmldoc.getElementsByTagName('meta content')[0].firstChild.data
        print definition

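For some reason the parsing does not work, and I get an error every time. This is especially confusing because the site appears to use essentially the same format as other sites from which I have successfully retrieved specific data. If anyone can help me figure out what I am getting wrong here, it would be greatly appreciated.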

2012-02-13 Jordan

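Since you have not provided a traceback for the error, it is hard to be specific, but I suspect that although the site claims to be XHTML, it is not actually valid XML. You would be better off using Beautiful Soup, which is designed for parsing HTML and handles broken markup correctly.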
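To illustrate the difference between a strict XML parser and a lenient HTML parser, here is a minimal Python 3 sketch. It uses the standard library's `html.parser` as a stand-in for Beautiful Soup so that it runs without third-party packages; it is not the Beautiful Soup API. `SAMPLE_HTML`, `DescriptionFinder`, `strict_parse_ok` and `first_description` are hypothetical names, and the sample markup is an assumption modeled on the tags quoted in the question, with an unclosed `<br>` added to make it invalid XML.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: ordinary HTML that is not well-formed XML,
# because the <br> tag is never closed.
SAMPLE_HTML = """<html><head>
<meta content='He set us up the bomb. Also took all our base.' name='Description' />
<meta content='cats' property='og:title' />
</head><body><br>cats</body></html>"""


class DescriptionFinder(HTMLParser):
    """Remember the content of the first <meta name="Description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name == "description" and self.description is None:
            self.description = attributes.get("content")


def strict_parse_ok(markup):
    """Return True if minidom accepts the markup as XML, False otherwise."""
    try:
        minidom.parseString(markup)
        return True
    except ExpatError:
        return False


def first_description(markup):
    """Return the description meta content, or None if there is none."""
    finder = DescriptionFinder()
    finder.feed(markup)
    return finder.description


print(strict_parse_ok(SAMPLE_HTML))
print(first_description(SAMPLE_HTML))
```

With this sample, the strict parse is rejected while the lenient parser still finds the description. Whether the real page fails for the same reason is an assumption; the traceback would confirm it.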

2012-02-13 09:38:57

I have never used the minidom parser, but I think the problem is your call to:

xmldoc.getElementsByTagName('meta content')

The tag name is meta; content is just its first attribute (as the syntax highlighting of your HTML shows nicely).

Try replacing that bit with:

xmldoc.getElementsByTagName('meta')
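As a rough Python 3 sketch of that suggestion, assuming the markup is well-formed XML (which may not hold for the real page): look the element up by its tag name, then read the value with `getAttribute`. A `<meta />` element has no child nodes, so `firstChild` is `None` and `.firstChild.data` would fail even with the right tag name. `SAMPLE_XML` and `first_meta_content` are hypothetical names used only for illustration.

```python
from xml.dom import minidom

# Hypothetical, well-formed snippet modeled on the tags quoted in the question.
SAMPLE_XML = """<head>
<meta content='He set us up the bomb. Also took all our base.' name='Description' />
<meta content='cats' property='og:title' />
</head>"""


def first_meta_content(markup):
    """Return the content attribute of the first <meta> element, or None."""
    doc = minidom.parseString(markup)
    # getElementsByTagName matches tag names only, so 'meta content'
    # would match nothing; the attribute is read separately below.
    metas = doc.getElementsByTagName("meta")
    if not metas:
        return None
    return metas[0].getAttribute("content")


print(first_meta_content(SAMPLE_XML))
```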

2012-02-13 09:41:18

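Your answer is absolutely correct, but it does not work even when I use the right tag name. The problem is that the page is not valid XML, so I downloaded Beautiful Soup and used it instead, and it now does what I want. – Jordan 2012-02-13 11:45:54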

@Jordan: Using BeautifulSoup is a good choice :) – 2012-02-13 11:53:58
