Index error when running a basic web scraper in Python

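I'm using Python 2.7. When I try to run this code, the problem occurs when the function hits print findPatTitle[i], and Python returns "IndexError: list index out of range". I got this code from the 13th Python tutorial on YouTube, and I'm pretty sure the code is identical, so I don't understand why I'm having a range problem. Any ideas?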

from urllib import urlopen 
from BeautifulSoup import BeautifulSoup 
import re 

webpage = urlopen('http://feeds.huffingtonpost.com/huffingtonpost/LatestNews').read() 

patFinderTitle = re.compile('<title>(.*)<title>') 

patFinderLink = re.compile('<link rel.*href="(.*)" />') 

findPatTitle = re.findall(patFinderTitle,webpage) 
findPatLink = re.findall(patFinderLink,webpage) 

listIterator = [] 
listIterator[:] = range(2,16) 

for i in listIterator: 
    print findPatTitle[i] 
    print findPatLink[i] 
    print "\n"

Source

2011-09-06 Burton Guster

Why are you using regular expressions to parse HTML when you have BeautifulSoup? o.O You shouldn't parse HTML with regular expressions ... http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not – naeg
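As a rough illustration of the parser-based approach this comment recommends, here is a minimal sketch. It swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup so that it runs without third-party packages, and it works on a made-up feed string (SAMPLE_FEED and the example.com URLs are assumptions, not the real Huffington Post feed).

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too
import xml.etree.ElementTree as ET

# Hypothetical, simplified feed used only for illustration.
SAMPLE_FEED = """<feed>
  <title>Example Feed</title>
  <entry>
    <title>First story</title>
    <link rel="alternate" href="http://example.com/1" />
  </entry>
  <entry>
    <title>Second story</title>
    <link rel="alternate" href="http://example.com/2" />
  </entry>
</feed>"""

root = ET.fromstring(SAMPLE_FEED)

entries = []
for entry in root.iter("entry"):
    # findtext returns None when the child element is missing
    title = entry.findtext("title")
    link = entry.find("link")
    href = link.get("href") if link is not None else None
    entries.append((title, href))

for title, href in entries:
    print(title)
    print(href)
    print()
```

Because each title and link is read from the same entry element, the two can never get out of step the way two separate findall lists can. A real Atom feed normally declares an XML namespace, so the tag names passed to iter and find would need the namespace prefix there.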

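If your regular expressions do manage to find the title and link tags, findall returns a list of the matched strings. In that case you can iterate over the lists directly and print each item.

For example: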

for title in findPatTitle: 
    print title 

for link in findPatLink: 
    print link
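Whether findall returns anything at all depends on the pattern. The sketch below runs the title pattern from the question against a small made-up feed (sample_feed is an assumption, not the real feed) and compares it with a variant whose closing tag includes the slash. The variant is a guess at what was intended, not something the tutorial is confirmed to use.

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too
import re

# Hypothetical feed snippet used only for illustration.
sample_feed = """<feed>
<title>Example Feed</title>
<entry><title>First story</title></entry>
<entry><title>Second story</title></entry>
</feed>"""

# Pattern exactly as written in the question: it expects a second
# opening <title> tag on the same line, because "." does not match newlines.
question_pattern = re.compile('<title>(.*)<title>')

# Assumed variant with a proper closing tag and a non-greedy group.
closed_pattern = re.compile('<title>(.*?)</title>')

question_matches = question_pattern.findall(sample_feed)
closed_matches = closed_pattern.findall(sample_feed)

print("question pattern:", len(question_matches), "matches")
print("closed pattern:  ", len(closed_matches), "matches")
```

On this sample the question's pattern matches nothing, so any index into its result list would raise IndexError, while the closed variant returns one string per title.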

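The index error you are getting is because you are trying to access list elements at indices 2 through 15 (range(2, 16) stops before 16), and the title list or the link list has fewer than 16 elements.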
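If you still want to skip the first two entries and stop at index 15, one way to avoid the error is to slice the lists instead of indexing them, since a slice simply stops at the end of a short list. A minimal sketch, where pair_slice and the sample lists are hypothetical names made up for this example:

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too


def pair_slice(titles, links, start=2, stop=16):
    """Pair up titles and links from index start up to (not including) stop.

    Slicing never raises IndexError, and zip stops at the shorter list,
    so short or empty inputs simply produce fewer pairs.
    """
    return list(zip(titles[start:stop], links[start:stop]))


# Hypothetical match lists that are shorter than 16 items.
titles = ["t0", "t1", "t2", "t3"]
links = ["l0", "l1", "l2"]

pairs = pair_slice(titles, links)
for title, link in pairs:
    print(title)
    print(link)
    print()
```

With these inputs only one pair is produced, because the link list has a single element from index 2 onward. Printing len(titles) and len(links) first is also a quick way to confirm how many matches the regular expressions actually found.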

Note that listIterator[:] = range(2,16) is not a good way to write this. You can loop over the range directly:

for i in range(2, 16):
    # use i
    pass

Source

2011-09-06 03:35:25

Thanks for the tips. There was a problem in my code: findPatTitle should be (.*). Sorry about that. –
