2012-02-09

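Getting a single line from Python output

So, in my last question I asked how to parse the links out of the XML in an RSS feed. Combining the help I received here with some extra research, I was able to write this: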

import urllib
from xml.dom import minidom

def GetRSS(RSSurl):
    # Python 2: urllib.urlopen returns a file-like object for the feed
    url_info = urllib.urlopen(RSSurl)
    if (url_info):
        xmldoc = minidom.parse(url_info)
    if (xmldoc):
        channel = xmldoc.getElementsByTagName('channel')
        for node in channel:
            item = xmldoc.getElementsByTagName('item')
            for node in item:
                alist = xmldoc.getElementsByTagName('link')
                for a in alist:
                    linktext = a.firstChild.data
                    print linktext

As I mentioned in my other question, I wrote this to get the links from the RSS feed on Redlettermedia.com. The code works fine, and the output I receive is:

http://redlettermedia.com 
http://redlettermedia.com/half-in-the-bag-b-fest-2012/ 
http://redlettermedia.com/an-update-from-red-letter-media/ 
http://redlettermedia.com/half-in-the-bag-red-tails/ 
http://redlettermedia.com/half-in-the-bag-the-devil-inside-and-flyin-ryan/ 
http://redlettermedia.com/newly-found-episode-iii-review-behind-the-scenes-footage/ 
http://redlettermedia.com/half-in-the-bag-the-girl-with-the-dragon-tattoo-and-2011-re-cap/ 
http://redlettermedia.com/mr-plinetts-indiana-jones-and-the-kingdom-of-the-crystal-skull-review/ 
http://redlettermedia.com/new-mr-plinkett-review-trailer/ 
http://redlettermedia.com/plinkett-fest/ 
http://redlettermedia.com/update/ 
http://redlettermedia.com 
http://redlettermedia.com/half-in-the-bag-b-fest-2012/ 
http://redlettermedia.com/an-update-from-red-letter-media/ 
http://redlettermedia.com/half-in-the-bag-red-tails/ 
http://redlettermedia.com/half-in-the-bag-the-devil-inside-and-flyin-ryan/ 
http://redlettermedia.com/newly-found-episode-iii-review-behind-the-scenes-footage/ 

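And so on. What I want to do now is print only the link to the latest update (the second line of the output, in this case "http://redlettermedia.com/half-in-the-bag-b-fest-2012/"). How would I print only that line?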

Are you able to install non-stdlib modules? And how do you define the 'latest update link'? – Daenyth 2012-02-09 05:29:09

Answers

If it is always the second item in the list, you could try

url = xmldoc.getElementsByTagName('link')[1].firstChild.data 
print url 
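
A slightly more defensive variant of the same idea, offered only as a sketch: instead of relying on the wanted link sitting at index 1 among all `<link>` elements, look up the first `<item>` and read its own `<link>`. This assumes Python 3 (the thread's code is Python 2) and that the feed lists its newest entry first. `SAMPLE_FEED` and `latest_link` are hypothetical names used for illustration, and the sample feed is a trimmed stand-in, not the real one.

```python
from xml.dom import minidom

# Hypothetical stand-in for the real feed, trimmed to two entries.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <link>http://redlettermedia.com</link>
    <item>
      <title>First</title>
      <link>http://redlettermedia.com/half-in-the-bag-b-fest-2012/</link>
    </item>
    <item>
      <title>Second</title>
      <link>http://redlettermedia.com/an-update-from-red-letter-media/</link>
    </item>
  </channel>
</rss>"""


def latest_link(xml_text):
    """Return the <link> text of the first <item>, or None if there is none."""
    xmldoc = minidom.parseString(xml_text)
    items = xmldoc.getElementsByTagName('item')
    if not items:
        return None
    # Search only inside the first item, so the channel-level link is ignored.
    links = items[0].getElementsByTagName('link')
    if not links or links[0].firstChild is None:
        return None
    return links[0].firstChild.data.strip()


print(latest_link(SAMPLE_FEED))
```

Because the search is scoped to the first `<item>`, the result does not depend on how many `<link>` elements appear before it at channel level.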

This works perfectly, except that I get ten repeated lines of the URL I'm trying to get. What can I do so that I receive the URL I want only once? – Jordan 2012-02-09 05:41:03

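That's because you are printing it for every item in the list. You could most likely replace everything after 'for node in item:' with my suggestion, but I can't test it at the moment... – timc 2012-02-09 05:44:31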

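Hmm, I figured that's what I should do, actually. I replaced everything after 'for node in item:' with exactly what you suggested, but for some reason I still seem to get ten lines. – Jordan 2012-02-09 06:05:56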
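
A small sketch of why the line would still repeat, assuming Python 3 and a made-up three-item feed: any statement left inside `for node in item:` runs once per `<item>`, no matter what it looks up, so a ten-item feed gives ten copies. Doing the lookup outside the loops gives a single result. The names `FEED`, `inside_loop`, and `outside_loop` and the example.com URLs are hypothetical.

```python
from xml.dom import minidom

# Made-up feed with one channel-level link and three items.
FEED = """<rss><channel>
<link>http://example.com</link>
<item><link>http://example.com/newest/</link></item>
<item><link>http://example.com/older/</link></item>
<item><link>http://example.com/oldest/</link></item>
</channel></rss>"""

xmldoc = minidom.parseString(FEED)


def inside_loop(doc):
    """Run the suggested lookup inside the item loop: one result per item."""
    results = []
    for node in doc.getElementsByTagName('item'):
        # The lookup ignores `node`, so every iteration yields the same URL.
        results.append(doc.getElementsByTagName('link')[1].firstChild.data)
    return results


def outside_loop(doc):
    """Run the same lookup once, with no enclosing loop."""
    return [doc.getElementsByTagName('link')[1].firstChild.data]


print(inside_loop(xmldoc))   # three copies of the same URL
print(outside_loop(xmldoc))  # a single URL
```

In the original `GetRSS`, that would mean dropping the nested `for` loops altogether and keeping only the single lookup and print.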