Need help with string matching in web scraping (Python)

I'm trying to extract some data from a web page. First, I used BeautifulSoup to extract a div named "scores", which contains several similar images like this one:

<img class="sprite-rating_s_fill rating_s_fill s45" src="http://e2.tacdn.com/img2/x.gif" alt="4.5 of 5 stars">

The score I want to extract is inside this image tag; in this case it is "4.5". So I tried it this way:

pattern = re.compile('<img.*?alt="(.*?) of 5 stars">', re.S) 
items = re.findall(pattern, scores)

But it doesn't work. I'm new to web scraping, so can anyone help me?

2015-04-05 dec

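BeautifulSoup actually makes it easy to extract information from tags like this. Assuming scores is a BeautifulSoup Tag object for the img element (you can read about Tag objects in the BeautifulSoup documentation), what you want is the tag's alt attribute, since that is where the rating text lives (the src attribute only holds the image URL). If scores is the enclosing div rather than the img itself, select the image first, for example with scores.find('img').

alt = scores['alt']

For the example you gave, alt should be u'4.5 of 5 stars'. Now you just need to strip off ' of 5 stars':

removeIndex = alt.index(' of 5 stars')
score = alt[:removeIndex]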

And you'll be left with a score of '4.5'. (If you want to work with it as a number, you'll have to do score = float(score).)
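The stripping and conversion steps can be wrapped in a small helper. This is only a sketch: the name parse_rating and its suffix parameter are hypothetical, and it assumes the alt text always has the form shown in the img tag from the question ("4.5 of 5 stars").

```python
def parse_rating(alt_text, suffix=" of 5 stars"):
    """Return the numeric rating from alt text such as '4.5 of 5 stars'.

    Hypothetical helper: raises ValueError if the text does not end with
    the expected suffix, instead of silently returning a wrong value.
    """
    if not alt_text.endswith(suffix):
        raise ValueError("unexpected alt text: %r" % alt_text)
    # Drop the suffix and convert what is left to a number.
    return float(alt_text[:-len(suffix)])


# In the scraper, alt_text would come from the img tag's alt attribute.
print(parse_rating("4.5 of 5 stars"))  # 4.5
```

Checking the suffix explicitly means a page layout change shows up as an error rather than as a garbled score.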

2015-04-05 02:40:59

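It works, thank you very much. Could you also give me some advice on the way I was matching the string? I'm still trying to figure out why it was wrong. – dec 2015-04-06 17:22:39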
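On the regex question, two causes seem likely, assuming scores is a BeautifulSoup Tag. First, re.findall expects a string, so the Tag would need converting with str(scores). Second, BeautifulSoup usually prints an img tag in self-closing form, ending in "/>", so a pattern that requires '">' right after the alt text no longer matches. The sketch below uses only the re module; the serialized string is written by hand to imitate that output, and the relaxed pattern is one possible alternative, not the only one.

```python
import re

# The img markup as it appears in the page source.
raw_html = ('<img class="sprite-rating_s_fill rating_s_fill s45" '
            'src="http://e2.tacdn.com/img2/x.gif" alt="4.5 of 5 stars">')

# Hand-written approximation of the self-closing form ("/>" at the end).
serialized = raw_html[:-1] + '/>'

# The pattern from the question: requires '">' right after the alt text.
original = re.compile('<img.*?alt="(.*?) of 5 stars">', re.S)

# A relaxed pattern (an assumption, one option among many): it stops at the
# closing quote of the alt attribute and ignores how the tag ends.
relaxed = re.compile(r'<img[^>]*?alt="([\d.]+) of 5 stars"')

print(original.findall(raw_html))    # ['4.5']
print(original.findall(serialized))  # []
print(relaxed.findall(serialized))   # ['4.5']
```

Reading the alt attribute directly from the Tag avoids both problems, which is why that approach is usually less fragile than matching the markup with a regex.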
