RSS feed aggregator using Google App Engine - Python

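I am trying to build a GAE application that processes RSS feeds and stores all the data from each feed in the Google datastore. I use Minidom to extract content from the RSS feed. I also tried Feedparser and BeautifulSoup, but they did not work for me.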

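On my local machine, my application currently takes about 25 seconds to parse the feed and save it to the Google datastore. I uploaded the application, and when I try to use it I get a "DeadlineExceededError".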

Is there any way to speed up this process? Over time, the number of feeds I use will grow to more than 100.

Source

2010-02-08 A_iyer

It shouldn't take anywhere near that long. Here is how you can use Universal Feed Parser.

# easy_install feedparser

And an example of using it:

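import feedparser

# Feed URL from the example; feedparser.parse() fetches and parses it.
feed = 'http://stackoverflow.com/feeds/tag?tagnames=python&sort=newest'
d = feedparser.parse(feed)

# Print the title of every entry in the parsed feed.
for entry in d['entries']:
    print(entry.title)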

The documentation shows how to pull other things out of a feed. If you have a specific problem, please post the details.

Source

2010-02-08 20:43:42 DisplacedAussie

Thanks for the reply, DisplacedAussie. One problem I ran into with Feedparser is that I could not get the attributes of tags. Can you tell me how to do that? – 2010-02-08 21:43:43

Which attributes can't you access? http://www.feedparser.org/docs/index.html – DisplacedAussie 2010-02-08 22:12:17

I found a way to work around this problem, but I'm not sure whether it is the best solution.

Instead of Minidom, I now use cElementTree to parse the RSS feed. I process each "item" tag and its children in a separate task and add those tasks to the task queue.
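The approach described above can be sketched roughly as follows. This is a minimal sketch, not the code actually used in the application: it uses the standard-library xml.etree.ElementTree in place of cElementTree, the sample feed is made up, and `enqueue` is a hypothetical callable standing in for whatever adds a task to the App Engine task queue.

```python
import json
import xml.etree.ElementTree as ET  # stand-in for cElementTree (assumption)

# Made-up sample feed, used only for illustration.
SAMPLE_RSS = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def item_payloads(rss_text):
    """Yield one JSON string per <item>, mapping child tag names to text."""
    root = ET.fromstring(rss_text.encode("utf-8"))
    for item in root.iter("item"):
        fields = dict((child.tag, (child.text or "").strip()) for child in item)
        yield json.dumps(fields, sort_keys=True)


def enqueue_items(rss_text, enqueue):
    """Pass each item payload to `enqueue` (hypothetical task-queue hook)."""
    count = 0
    for payload in item_payloads(rss_text):
        enqueue(payload)
        count += 1
    return count


# Here a plain list collects the payloads instead of a real task queue.
queued = []
total = enqueue_items(SAMPLE_RSS, queued.append)
print(total)
```

On App Engine, `enqueue` would wrap the task queue call, and a separate handler would write each payload to the datastore. Both are left out here because they depend on the application.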

This helped me avoid the DeadlineExceededError. However, I now get the "This resource uses a lot of CPU resources" warning.

Any ideas on how to avoid the warning?

A_iyer

Source

2010-02-12 17:17:40

I have a demo/prototype RSS reader on GAE that works with Feedparser - http://deliciourss.appspot.com/. Here is some of the code:

Fetch your feed.

data = urlfetch.fetch(feedUrl)

Parse it with Feedparser.

parsedData = feedparser.parse(data.content)

Modify the feed.

# Set the main content to the description (summary) if it is empty.
for entry in parsedData.entries:
    if hasattr(entry, 'content'):
        # Keep the content if at least one part has a non-empty value.
        has_value = any(item.value for item in entry.content)
        if not has_value:
            entry.content[0].value = entry.summary
    else:
        # No content at all: fall back to the summary.
        entry.content = [{'value': entry.summary}]

Part of the template, if you are using Django/webapp:

<?xml version="1.0" encoding="utf-8"?> 
<channel> 
<title>{{parsedData.channel.title}}</title> 
<url>{{feedUrl}}</url> 
<id>{{parsedData.channel.id}}</id> 
<updated>{{parsedData.channel.updated}}</updated> 
{% for entry in parsedData.entries %} 
<item> 
     <id>{{entry.id}}</id> 
     <title>{{entry.title}}</title> 
     <link> 
     {% for link in entry.links %} 
       {% ifequal link.rel "alternate" %} 
         {{link.href|escape}} 
       {% endifequal %} 
     {% endfor %} 
     </link> 
     <author>{{entry.author_detail.name}}</author> 
     <pubDate>{{entry.published}}</pubDate> 
     <description>{{entry.summary|escape}}</description> 
     {% for item in entry.content %} 
      {% if item.value %} 
       <content>{{item.value|escape}}</content> 
      {% endif %} 
     {% endfor %} 
</item>{% endfor %} 
</channel>

Source

2013-12-27 02:37:11 Randall
