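Using XPath to get multiple values from HTML

I want to extract multiple values from some HTML, and I think XPath could be the ideal way to do it.

What I want to do is loop through each tr with the class data, and within each iteration get the data I need: the route_number, the text inside the a (which also appears in its title attribute), and the via text.

The HTML is below: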
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood"><span class="route_number small_curvy">1</span></a></th>
<td class="main_and_via">
<a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a>
<span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span>
</td>
</tr><tr class="data"><th class="route_number"><a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock/Maypole"><span class="route_number small_curvy">2</span></a></th>
<td class="main_and_via">
<a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock/Maypole">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock/Maypole</a>
<span class="via"><strong>via</strong> Yardley Wood Road</span>
</td>
</tr>
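Is looping through each tr and then running separate queries for the route number, anchor text, and via text the ideal approach, or could this be done with one single XPath query?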
It wouldn't really be any different from just using getAttribute() and getElementsByClassName – runspired
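
The loop described in the question (select each tr with class data, then run short relative queries inside it) can be sketched as follows. The question does not say which language is in use, so this is only an illustration in Python with the standard library's xml.etree.ElementTree, which supports a limited XPath subset. It assumes the fragment is well-formed enough to be parsed as XML, which happens to be true of the sample above once it is wrapped in a table element; real-world HTML would usually need an HTML-aware parser. The function name extract_routes and the dictionary keys are made up for the example.

```python
import xml.etree.ElementTree as ET

# The two rows from the question, wrapped in <table> so the fragment has one root.
# Assumption: the markup is well-formed XML (true for this sample only).
HTML = """<table>
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood"><span class="route_number small_curvy">1</span></a></th>
<td class="main_and_via">
<a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a>
<span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span>
</td>
</tr><tr class="data"><th class="route_number"><a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock/Maypole"><span class="route_number small_curvy">2</span></a></th>
<td class="main_and_via">
<a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock/Maypole">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock/Maypole</a>
<span class="via"><strong>via</strong> Yardley Wood Road</span>
</td>
</tr>
</table>"""


def extract_routes(markup):
    """Hypothetical helper: return one dict per <tr class="data"> row."""
    root = ET.fromstring(markup)
    routes = []
    # Outer query: every row whose class attribute is exactly "data".
    for row in root.findall(".//tr[@class='data']"):
        # Inner queries are relative to the current row.
        number = row.find("th[@class='route_number']/a/span")
        link = row.find("td[@class='main_and_via']/a")
        via = row.find("td[@class='main_and_via']/span[@class='via']")
        routes.append({
            "route_number": "".join(number.itertext()).strip(),
            "title": link.get("title"),
            "anchor_text": (link.text or "").strip(),
            # itertext() joins the <strong> text with the text that follows it.
            "via": "".join(via.itertext()).strip(),
        })
    return routes


routes = extract_routes(HTML)
for route in routes:
    print(route)
```

Keeping the inner queries relative to each row means the three values stay grouped per route, which a single flat query returning all matching nodes would not guarantee by itself. Note that the [@class='data'] predicate is an exact string match, so it would miss rows carrying additional class names.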