2016-10-01 41 views

Using BeautifulSoup with multiple tags of the same name

I have the following HTML:

<g class="1581 sqw_sv5" style="cursor: pointer;"> 
<path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#ffffff" style="stroke-width: 3.6; stroke-opacity: 0.5; stroke-linecap: round; fill-opacity: 0;"> 
</path> 
<path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#f95a0b" style="stroke-width: 1.2; stroke-linecap: round; fill-opacity: 0;"> 
</path> 

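I need to get the value of the "stroke" attribute from the second path. My current code only extracts the value from the first path.

I am currently using: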

shots = soup.find_all('g') 
for shot in shots: 
    print(shot.path['stroke']) 

This returns #ffffff. I need it to return #f95a0b.

Is it always the second path? –

Answers

1

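Here is my solution to your problem. The caveat is that it may be too specific: it only works if the value of style is always "stroke-width: 1.2; stroke-linecap: round; fill-opacity: 0;" and only one such path element exists in the whole document.

The idea behind this solution is to narrow down the candidates quickly by looking for a feature that is unique to the element carrying the attribute you want.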

` 
from bs4 import BeautifulSoup 

html = """<g class="1581 sqw_sv5" style="cursor: pointer;"> 
<path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#ffffff" style="stroke-width: 3.6; stroke-opacity: 0.5; stroke-linecap: round; fill-opacity: 0;"> 
</path> 
<path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#f95a0b" style="stroke-width: 1.2; stroke-linecap: round; fill-opacity: 0;"> 
</path>""" 

soup = BeautifulSoup(html, "html.parser") 
# get the desired 'path' element using the 'style' that identifies it 
desired_element = soup.find("path", {"style" : "stroke-width: 1.2; stroke-linecap: round; fill-opacity: 0;"}) 
# get the attribute value from the extracted element 
desired_attribute = desired_element["stroke"] 
print(desired_attribute) 
# prints #f95a0b 
` 

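If this approach is a no-go, then you may need to use BeautifulSoup's next_sibling attribute or its findNext method. Basically, look up the first path element, which your code already does, and then "jump" from there to the next path element, which contains what you need.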

findNext: Beautifulsoup - nextSibling

next_sibling: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#next-sibling-and-previous-sibling
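
The "jump to the next path" idea could be sketched roughly as follows. This is only an illustration under assumptions: the `d` values are shortened placeholders and a closing `</g>` has been added to keep the sample self-contained. Note that plain `.next_sibling` on the first path would give the whitespace text node between the two tags, so the sketch uses `find_next_sibling("path")` to skip straight to the next `<path>` element.

```python
from bs4 import BeautifulSoup

# Shortened sample markup (assumption: "d" values are placeholders, </g> added).
html = """<g class="1581 sqw_sv5" style="cursor: pointer;">
<path d="M0,0" stroke="#ffffff"></path>
<path d="M0,0" stroke="#f95a0b"></path>
</g>"""

soup = BeautifulSoup(html, "html.parser")

strokes = []
for shot in soup.find_all("g"):
    first_path = shot.find("path")  # same element that shot.path returns
    # Skip over the whitespace text node and land on the next <path> sibling.
    second_path = first_path.find_next_sibling("path")
    if second_path is not None:
        strokes.append(second_path["stroke"])

print(strokes)
```

The `is not None` check is there because a group with only one path has no following `<path>` sibling.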

Thanks. I couldn't use your first suggestion, but next_sibling worked perfectly. – bgrantham

You're welcome. Glad it worked for you. –

2

You need to use find_all to find all of the paths first and then take the last one:

from bs4 import BeautifulSoup

h = """<g class="1581 sqw_sv5" style="cursor: pointer;"> 
<path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#ffffff" style="stroke-width: 3.6; stroke-opacity: 0.5; stroke-linecap: round; fill-opacity: 0;"> 
</path> 
<path d="M397.696,126.554C397.696,126.554,404.57504,140.2417375,404.57504,140.2417375" stroke="#f95a0b" style="stroke-width: 1.2; stroke-linecap: round; fill-opacity: 0;"> 
</path>""" 
soup = BeautifulSoup(h, "html.parser") 
shots = soup.find_all('g') 
for shot in shots: 
    # find_all returns every <path> that has a stroke attribute; [-1] picks the last one
    print(shot.find_all("path", stroke=True)[-1]["stroke"])

Using shot.path['stroke'] is equivalent to using shot.find("path")['stroke'], and both only ever return the first path.
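
A small check of that equivalence, as a sketch on simplified markup (the attributes other than `stroke` are omitted here as an assumption for brevity):

```python
from bs4 import BeautifulSoup

# Simplified markup: only the stroke attributes are kept.
html = """<g><path stroke="#ffffff"></path>
<path stroke="#f95a0b"></path></g>"""
soup = BeautifulSoup(html, "html.parser")
shot = soup.find("g")

first_via_attribute = shot.path        # attribute-style access
first_via_find = shot.find("path")     # explicit find()
all_strokes = [p["stroke"] for p in shot.find_all("path")]

print(first_via_attribute is first_via_find)
print(all_strokes)
```

Both lookups hand back the first `<path>`, while `find_all` exposes every match so that an index such as `[-1]` or `[1]` can pick the one you want.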

Alternatively, nth-of-type may also work, depending on the structure of the HTML:

soup = BeautifulSoup(h, "html.parser") 
shots = soup.find_all('g') 
for shot in shots: 
    # select_one returns the first element matching the CSS selector
    print(shot.select_one("path:nth-of-type(2)")["stroke"])
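
Because this depends on the structure, a group that happens to contain fewer than two paths makes `select_one` return `None`, and indexing that raises a `TypeError`. A guarded variant might look like the sketch below; the second, single-path group and the wrapping `<svg>` are hypothetical additions made purely to show the failure case.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <g> has only one <path>.
html = """<svg>
<g><path stroke="#ffffff"></path><path stroke="#f95a0b"></path></g>
<g><path stroke="#000000"></path></g>
</svg>"""
soup = BeautifulSoup(html, "html.parser")

results = []
for shot in soup.find_all("g"):
    second = shot.select_one("path:nth-of-type(2)")
    # Record None instead of failing when there is no second path.
    results.append(second["stroke"] if second is not None else None)

print(results)
```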