2015-09-04 152 views
0

Find a specific sibling of an element

I have an XML page with the following structure:

<address> 
<city>Anaheim</city> 
<state>California</state> 
<zip>92801</zip> 
<country>United States</country> 
</address> 

<address> 
<city>Berkley</city> 
<state>California</state> 
<zip>94705</zip> 
<country>United States</country> 
</address> 

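I want to get the city tag only where the value of the zip tag satisfies a condition. For example, I need the names of the cities where zip = 92801.

Is there a simple way to do this in Python?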

+0

I would be especially interested in a BeautifulSoup solution, since I am parsing the rest of the site with it.

Answers

0

my_string = ''' 
<root> 
    <address> 
    <city>Anaheim</city> 
    <state>California</state> 
    <zip>92801</zip> 
    <country>United States</country> 
    </address> 
    <address> 
    <city>Berkley</city> 
    <state>California</state> 
    <zip>94705</zip> 
    <country>United States</country> 
    </address> 
</root> 
''' 

from bs4 import BeautifulSoup 

soup = BeautifulSoup(my_string, 'html.parser') 
# Find every <zip> tag whose text is exactly "92801"
desired_zips = soup.find_all('zip', string="92801") 
cities = [] 
for zip_tag in desired_zips: 
    # <city> comes before <zip> inside the same <address>
    cities.append(zip_tag.find_previous_sibling('city')) 

print(cities) 

Output:

[<city>Anaheim</city>] 

Note: you could write this for loop as a list comprehension, but it looks clunky and hard to read.
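
For comparison, here is a rough sketch of what the list-comprehension form mentioned in the note might look like. The shortened sample XML and the names `xml` and `city_names` are assumptions made for illustration; it also collects the city text rather than the tag objects.

```python
from bs4 import BeautifulSoup

# Shortened sample data (assumed for this sketch)
xml = """
<root>
  <address><city>Anaheim</city><zip>92801</zip></address>
  <address><city>Berkley</city><zip>94705</zip></address>
</root>
"""

soup = BeautifulSoup(xml, "html.parser")

# For each <zip> whose text is exactly "92801", take the text of the
# preceding <city> sibling in the same <address>.
city_names = [
    zip_tag.find_previous_sibling("city").get_text()
    for zip_tag in soup.find_all("zip", string="92801")
]
print(city_names)
```

Whether this reads better than the explicit loop is a matter of taste; the loop is easier to extend if a `<zip>` might have no preceding `<city>`.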

+0

Thanks, that worked out great.

2

This will achieve the desired result:

my_string = ''' 
    <root> 
    <address> 
     <city>Anaheim</city> 
     <state>California</state> 
     <zip>92801</zip> 
     <country>United States</country> 
    </address> 
    <address> 
     <city>Berkley</city> 
     <state>California</state> 
     <zip>94705</zip> 
     <country>United States</country> 
    </address> 
    </root> 
''' 

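from lxml import etree 

root = etree.fromstring(my_string) 
# Select each <city> that precedes a <zip> whose text is "92801"
cities = root.xpath('.//zip[text()="92801"]/preceding-sibling::city') 
print([city.text for city in cities])  # ['Anaheim']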

1

How about using ElementTree, if you would rather use that instead of Beautiful Soup:

import xml.etree.ElementTree as ET 
tree = ET.parse('country_data.xml') 
root = tree.getroot() 

filtered_addresses = [] 
for address in root.findall('address'): 
    # <zip> is a child element, not an attribute, so read its text with findtext()
    if address.findtext('zip') == '92801': 
        filtered_addresses.append(address)
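
As a self-contained variant of the ElementTree approach, the sketch below parses the sample XML from a string instead of a file and returns the city names directly. It relies on the limited XPath support in `xml.etree.ElementTree`, where a `[tag='text']` predicate selects elements that have a child with the given text; the variable names `xml` and `city_names` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in a single <root> element
xml = """
<root>
  <address>
    <city>Anaheim</city>
    <state>California</state>
    <zip>92801</zip>
    <country>United States</country>
  </address>
  <address>
    <city>Berkley</city>
    <state>California</state>
    <zip>94705</zip>
    <country>United States</country>
  </address>
</root>
"""

root = ET.fromstring(xml)

# Keep only <address> elements that have a <zip> child with text "92801",
# then step down to their <city> child.
cities = root.findall("./address[zip='92801']/city")
city_names = [city.text for city in cities]
print(city_names)
```

This needs nothing outside the standard library, but it only matches the zip text exactly; if the values could carry surrounding whitespace, an explicit loop with `findtext('zip').strip()` would be the safer choice.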