Find the text of a sibling element where the original element matches a specific string

I want to scrape some prices from a bunch of HTML tables. The tables contain a variety of prices, and of course the table data tags carry no useful attributes to select on.

<div id="item-price-data"> 
    <table> 
    <tbody> 
     <tr> 
     <td class="some-class">Normal Price:</td> 
     <td class="another-class">$100.00</td> 
     </tr> 
     <tr> 
     <td class="some-class">Member Price:</td> 
     <td class="another-class">$90.00</td> 
     </tr> 
     <tr> 
     <td class="some-class">Sale Price:</td> 
     <td class="another-class">$80.00</td> 
     </tr> 
     <tr> 
     <td class="some-class">You save:</td> 
     <td class="another-class">$20.00</td> 
     </tr> 
    </tbody> 
    </table> 
</div>

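The only price I care about is the one paired with the element that has "Normal Price:" as its text.

What I'd like to do is scan the table's descendants, find the <td> tag that contains that text, then pull the text from its sibling.

The problem I'm having is that the descendants attribute in BeautifulSoup gives me NavigableString objects rather than Tag objects.

So if I do this: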

from bs4 import BeautifulSoup 
from urllib import request 

html = request.urlopen(url) 
soup = BeautifulSoup(html, 'lxml') 

div = soup.find('div', {'id': 'item-price-data'}) 
table_data = div.find_all('td') 

for element in table_data: 
    if element.get_text() == 'Normal Price:': 
        price = element.next_sibling

print(price)

I get nothing. Is there an easy way to get the string value?


2016-02-12 Gree Tree Python

I just ran this and got '$100.00'; am I missing something?

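Yes. There was something I wasn't getting either. I found that the Tag is there, but it isn't the next sibling. The next sibling is a carriage return.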
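To illustrate the point in the comment above, here is a minimal sketch of why next_sibling comes back "empty" on this markup. It assumes a trimmed-down copy of the table and uses the built-in "html.parser" instead of lxml (my substitution, only to avoid the extra dependency). The variable names are made up for the example.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Trimmed-down copy of the question's markup (assumption: one row is enough
# to show the behaviour).
html = """<table>
<tr>
<td class="some-class">Normal Price:</td>
<td class="another-class">$100.00</td>
</tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
label = soup.find("td", class_="some-class")

# next_sibling returns whatever node comes next in the tree. Here that is
# the whitespace between the two <td> tags, not the second <td> itself.
raw_sibling = label.next_sibling

# find_next_sibling("td") skips text nodes and returns the next <td> Tag.
tag_sibling = label.find_next_sibling("td")

print(repr(raw_sibling))                 # a whitespace-only string
print(type(raw_sibling).__name__)        # NavigableString
print(tag_sibling.get_text(strip=True))  # the price text
```

Printing a whitespace-only NavigableString looks like printing nothing, which would explain the behaviour described in the question.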

You can use the find_next() method, possibly together with a little regular expression:

Demo:

>>> import re 
>>> from bs4 import BeautifulSoup 
>>> html = """<div id="item-price-data"> 
... <table> 
...  <tbody> 
...  <tr> 
...   <td class="some-class">Normal Price:</td> 
...   <td class="another-class">$100.00</td> 
...  </tr> 
...  <tr> 
...   <td class="some-class">Member Price:</td> 
...   <td class="another-class">$90.00</td> 
...  </tr> 
...  <tr> 
...   <td class="some-class">Sale Price:</td> 
...   <td class="another-class">$80.00</td> 
...  </tr> 
...  <tr> 
...   <td class="some-class">You save:</td> 
...   <td class="another-class">$20.00</td> 
...  </tr> 
...  </tbody> 
... </table> 
... </div>""" 
>>> soup = BeautifulSoup(html, 'lxml') 
>>> div = soup.find('div', {'id': 'item-price-data'}) 
>>> for element in div.find_all('td', text=re.compile('Normal Price')): 
...  price = element.find_next('td') 
...  print(price) 
... 
<td class="another-class">$100.00</td>

If you'd rather not use a regular expression, the following will work for you.

>>> table_data = div.find_all('td') 
>>> for element in table_data: 
...  if 'Normal Price' in element.get_text(): 
...   price = element.find_next('td') 
...   print(price) 
... 
<td class="another-class">$100.00</td>
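Both demos print the whole <td> tag, while the question asks for the string value. As a rough sketch (not part of the original answer), the lookup could be wrapped in a small helper that returns just the text. The function name price_for_label is hypothetical, the parser is the built-in "html.parser" rather than lxml, and the markup is a shortened copy of the table from the question.

```python
from bs4 import BeautifulSoup


def price_for_label(soup, label):
    """Return the text of the <td> that follows the <td> whose text equals
    `label`, or None if no such pair exists. Hypothetical helper."""
    for cell in soup.find_all("td"):
        if cell.get_text(strip=True) == label:
            # Only look at siblings in the same row, skipping whitespace nodes.
            sibling = cell.find_next_sibling("td")
            if sibling is not None:
                return sibling.get_text(strip=True)
    return None


# Shortened copy of the question's markup (assumption for the example).
html = """<div id="item-price-data">
<table>
<tbody>
<tr>
<td class="some-class">Normal Price:</td>
<td class="another-class">$100.00</td>
</tr>
<tr>
<td class="some-class">Sale Price:</td>
<td class="another-class">$80.00</td>
</tr>
</tbody>
</table>
</div>"""

soup = BeautifulSoup(html, "html.parser")
normal_price = price_for_label(soup, "Normal Price:")
missing = price_for_label(soup, "Clearance Price:")  # label not in the table

print(normal_price)
print(missing)
```

The sketch uses find_next_sibling('td') rather than find_next('td') so that the match stays inside the same row; find_next() walks forward through the rest of the document, which gives the same result here but could pick up a cell from a later row if a label cell had no partner.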


2016-02-12 11:50:12 styvane
