python re.compile Beautiful Soup

import re

# link_source is the page HTML fetched earlier (not shown)
desc = re.compile('<ul class="descShort bullet">(.*)</ul>', re.DOTALL)
findDesc = re.findall(desc, link_source)

for i in findDesc:
    print(i)


''' 
<ul class="descShort bullet"> 

     Sleek and distinctive, these eye-catching ornaments will be the star of your holiday decor. These unique glass icicle ornaments are individually handcrafted by artisans in India. 

    </ul> 
'''

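I am trying to extract the description between the opening <ul class="descShort bullet"> tag and the closing </ul>. I am looking for a solution using regex, as well as one using BeautifulSoup.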

2011-11-27 phales15

I had hoped you would at least *try* an HTML parser... but unfortunately you are still using regular expressions to parse HTML. –

I am new to this site, how do I go about doing that? Thanks! – phales15

Look at [your list of questions](http://stackoverflow.com/users/1018129/aaron-phalen?tab=questions); if any of them has a good answer, click the check mark outline next to it. – egor83

First of all, parsing HTML/XML with regular expressions is generally considered a bad idea, so using a parser such as BeautifulSoup really is the better choice.
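To make that concrete, here is a small sketch of two ways the pattern from the question can go wrong. The two input strings are made up for illustration and are not taken from the asker's page:

```python
import re

# The pattern from the question, unchanged.
desc = re.compile('<ul class="descShort bullet">(.*)</ul>', re.DOTALL)

# Hypothetical page with two matching lists (not from the question).
two_lists = (
    '<ul class="descShort bullet">first</ul>\n'
    '<p>unrelated</p>\n'
    '<ul class="descShort bullet">second</ul>\n'
)
# Equivalent markup for a browser, but written with single quotes.
single_quoted = "<ul class='descShort bullet'>first</ul>"

greedy_result = desc.findall(two_lists)
quoted_result = desc.findall(single_quoted)

print(len(greedy_result))  # the greedy (.*) swallows both lists in one match
print(greedy_result[0])
print(quoted_result)       # the literal double quotes no longer match
```

With the greedy `(.*)` and `re.DOTALL`, the single match runs from the first opening tag to the last `</ul>`, so the unrelated markup in between ends up in the result. A parser does not care about either variation.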

What you want can be done as follows:

from BeautifulSoup import BeautifulSoup 

text = """ 
<ul class="descShort bullet">text1</ul> 
<a href="example.com">test</a> 
<ul class="descShort bullet">one more</ul> 
<ul class="other">text2</ul> 
""" 

soup = BeautifulSoup(text) 

# to get the contents of all <ul> tags:
for tag in soup.findAll('ul'):
    print(tag.contents[0])

# to get the contents of <ul> tags with attribute class="descShort bullet":
for tag in soup.findAll('ul', {'class': 'descShort bullet'}):
    print(tag.contents[0])
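
The question also asks for a regex version. Below is a minimal sketch, assuming the markup always looks exactly like the sample (same attribute, double quotes, no nested lists). The names `DESC_RE` and `extract_descriptions` are made up here, and the sample text is shortened from the output shown in the question:

```python
import re

# Non-greedy (.*?) stops at the first closing </ul> instead of the last one;
# re.DOTALL lets "." match the newlines inside the list.
DESC_RE = re.compile(r'<ul class="descShort bullet">(.*?)</ul>', re.DOTALL)

def extract_descriptions(html):
    """Return the whitespace-stripped text of each matching <ul> block."""
    return [m.strip() for m in DESC_RE.findall(html)]

# Sample modeled on the question's output (text shortened for the example).
link_source = '''
<ul class="descShort bullet">

     Sleek and distinctive, these eye-catching ornaments will be the star of your holiday decor.

    </ul>
'''

descriptions = extract_descriptions(link_source)
for d in descriptions:
    print(d)
```

This is still brittle: any change in quoting, attribute order, or nesting will break it, which is why the BeautifulSoup version above is the one to prefer.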

2011-11-27 21:21:48 egor83
