Parsing the HTML element that follows an element with given content

I am trying to extract the content of the HTML element that comes right after an element whose content is "ID".

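For example, in the data-tip attribute shown below, I want to extract 1886G, the content of the element that follows the ID header, and I want to do this for every matching element.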

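I am using beautifulsoup4 in Python and parsing in two passes: the first finds the elements by their id, and the second parses each data-tip string back into HTML. I tried to grab the ID with findNextSibling() like this: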

import os 
import re 
from bs4 import BeautifulSoup 


html_file = BeautifulSoup(open("data_sample.html"), "html.parser") 

for tag in html_file.findAll(id = re.compile("^content.*")): 
    dataTip = BeautifulSoup(tag["data-tip"], "html.parser") 
    print("find ID:") 
    print(dataTip.findNextSibling("tr", attrs = {"th" : "ID"}))

The output is:

find ID: 
None

Here is an example element:

<div id="content_placement_o_89879879789" style="z-index: 77; position: absolute; width: 25px; height: 43px; left: 124.0px; top: 344.0px;" data-tip="<table width='200'> 
<tr> 
<th>Name</th> 
<td>Generic Phone Name</td> 
</tr> 
<tr> 
<th>ID</th> 
<td>1886G</td> 
</tr> 
<tr> 
<th>Status</th> 
<td>Same</td> 
</tr> 
</table> 
"> 
<img alt="Image" class="same_mark_10987024 same_mark_highlighted" height="43" id="s_o_848483938748" src="https://website/picture.gif" style="position: absolute" width="25"> 
</div>

Clearly I am missing something about how this function works. Does anyone know what I can change to make this work?

Source

2017-03-16 Sledge

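You need to call findNextSibling on the th tag whose text is ID, not on the tr that contains it. To spell out the relationships: th and td are both children of tr, and th and td are siblings of each other: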

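import re 
# Continues from the question's code: html_file is the parsed document. 
for tag in html_file.findAll(id = re.compile("^content.*")): 
    dataTip = BeautifulSoup(tag["data-tip"], "html.parser") 
    # Find the <th> whose text is "ID", then step to the next sibling tag (the <td>). 
    id_value = dataTip.find("th", text = "ID").findNextSibling().text 
    print(id_value) 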

# 1886G
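
For anyone who wants to try this without the data_sample.html file, here is a minimal self-contained sketch of the same idea. The inline SAMPLE_HTML string is a trimmed copy of the element from the question, and the helper name extract_tip_value is hypothetical, introduced only for illustration. It uses the snake_case method names and the string= argument, assuming a reasonably recent beautifulsoup4 (4.4.0 or later). It also returns None when the header is missing instead of raising. The original call returned None for two reasons: dataTip is the top-level parsed document, which has no siblings, and attrs = {"th" : "ID"} looks for an attribute named th rather than a child th element.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical inline sample, trimmed from the element shown in the question.
SAMPLE_HTML = """
<div id="content_placement_o_89879879789" data-tip="<table width='200'>
<tr>
<th>Name</th>
<td>Generic Phone Name</td>
</tr>
<tr>
<th>ID</th>
<td>1886G</td>
</tr>
<tr>
<th>Status</th>
<td>Same</td>
</tr>
</table>
">
<img alt="Image" src="https://website/picture.gif">
</div>
"""


def extract_tip_value(tip_html, label):
    """Return the text of the <td> that follows the <th> whose text is `label`.

    Returns None if no such <th> exists or it has no following <td>.
    """
    tip = BeautifulSoup(tip_html, "html.parser")
    header = tip.find("th", string=label)
    if header is None:
        return None
    # <th> and <td> share the same <tr> parent, so the <td> is a sibling.
    cell = header.find_next_sibling("td")
    return cell.get_text(strip=True) if cell is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
ids = [
    extract_tip_value(tag["data-tip"], "ID")
    for tag in soup.find_all(id=re.compile(r"^content"))
]
print(ids)
```

Passing "td" to find_next_sibling makes the intent explicit and skips the whitespace text node between the two cells. The same helper can be called with "Name" or "Status" to read the other rows.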

Source

2017-03-16 19:41:16 Psidom

I see now. This is exactly what I was looking for. – Sledge
