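XPath: select all following nodes until a certain node
I'm trying to parse the structure below into sets of journey options, so that I can work out all the possible ways of getting from Pontypridd to Llangollen and back.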
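Using XPath, I can use //div[@class='JourneyOptions']
to select all the rows that actually contain journey information. Outside of XPath, I can then iterate over each row to decide whether it should be added to the current set of journeys or whether it is the first row of a new set.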
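To make the row-by-row approach concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It assumes the markup is well-formed XML (real-world HTML would need an HTML-aware parser), and the `SAMPLE` string and the `group_journeys` helper are hypothetical names made up for illustration; the sample is a trimmed-down stand-in for the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample. It is written as well-formed XML
# (self-closed <input/>) so the standard-library parser accepts it.
SAMPLE = """
<div class='TableHolder'>
  <div class='JourneyOptions'><div class='Journey'>
    <div class='ColumnOne'><input type='radio' name='out' value='1'/></div>
  </div></div>
  <div class='JourneyOptions'><div class='Journey'>
    <div class='ColumnOne'/>
  </div></div>
  <div class='JourneyOptions'><div class='Journey'>
    <div class='ColumnOne'><input type='radio' name='out' value='2'/></div>
  </div></div>
  <div class='JourneyOptions'><div class='Journey'>
    <div class='ColumnOne'/>
  </div></div>
  <div class='JourneyOptions'><div class='Journey'>
    <div class='ColumnOne'><input type='radio' name='in' value='1'/></div>
  </div></div>
</div>
"""


def group_journeys(root):
    """Group JourneyOptions rows into sets keyed by (radio name, radio value)."""
    sets = []
    # findall returns matches in document order.
    for row in root.findall(".//div[@class='JourneyOptions']"):
        radio = row.find(".//input[@type='radio']")
        if radio is not None:
            # A radio input marks the first journey of a new set.
            sets.append(((radio.get("name"), radio.get("value")), [row]))
        elif sets:
            # No radio input: a connecting journey in the current set.
            sets[-1][1].append(row)
    return sets


journey_sets = group_journeys(ET.fromstring(SAMPLE))
for key, rows in journey_sets:
    print(key, len(rows))
```

Because each key carries the radio input's `name`, outbound and inbound sets stay separate even though their values both start at 1.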
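In the HTML example below, every set of journeys contains two journeys, but a set can contain just one journey (a "direct" journey) or more than two (more than one "connection").
Is there an XPath expression that selects all the journeys in the first outbound set, all the journeys in the second outbound set, and so on?
The first journey in each set has a radio input with an integer value. I could generate these values dynamically to label each set, but I would need to know when to stop generating them (or just wait for the XPath to fail).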
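One way to know when to stop, sketched here under assumptions rather than as a definitive answer, is to read the radio values that actually exist for a direction instead of generating integers until a lookup fails. The snippet uses Python's standard-library `xml.etree.ElementTree` and assumes well-formed markup; `RADIO_SAMPLE` and `set_labels` are hypothetical names invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample containing only the parts relevant to the radio inputs.
RADIO_SAMPLE = """
<div class='TableHolder'>
  <div class='JourneyOptions'>
    <input type='radio' checked='checked' name='out' value='1'/>
  </div>
  <div class='JourneyOptions'/>
  <div class='JourneyOptions'>
    <input type='radio' name='out' value='2'/>
  </div>
  <div class='JourneyOptions'>
    <input type='radio' checked='checked' name='in' value='1'/>
  </div>
</div>
"""


def set_labels(root, direction):
    """Return the radio values present for one direction ('out' or 'in')."""
    path = f".//input[@type='radio'][@name='{direction}']"
    return [radio.get("value") for radio in root.findall(path)]


radio_root = ET.fromstring(RADIO_SAMPLE)
print(set_labels(radio_root, "out"))
print(set_labels(radio_root, "in"))
```

The length of each returned list is the number of sets for that direction, so any per-set expression can be generated for exactly those values.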
<div class='TableHolder'>
<p>...</p>
<h2 id='DirectionHeader'>Outbound Options</h2>
<p>Pontypridd to Llangollen, 30/11/1910</p>
<!-- first part of the first journey from Pontypridd to Llangollen -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ColumnOne'>
<input type='radio' checked='checked' name='out' value='1'>
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- second part of the first journey from Pontypridd to Llangollen -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ConnectingJournies'>
<p>...</p>
</div>
<div class='ColumnOne'>
... doesn't contain a radio input ...
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- first part of the second journey from Pontypridd to Llangollen -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ColumnOne'>
<input type='radio' name='out' value='2'>
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- second part of the second journey from Pontypridd to Llangollen -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ConnectingJournies'>
<p>...</p>
</div>
<div class='ColumnOne'>
... doesn't contain a radio input ...
</div>
... some more divs of parseable journey info ...
</div>
</div>
... some more outbound journey options ...
<p>...</p>
<h2 id='DirectionHeader'>Inbound Options</h2>
<p>Llangollen to Pontypridd, 07/11/1910</p>
<!-- first part of the first journey from Llangollen to Pontypridd -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ColumnOne'>
<input type='radio' checked='checked' name='in' value='1'>
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- second part of the first journey from Llangollen to Pontypridd -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ConnectingJournies'>
<p>...</p>
</div>
<div class='ColumnOne'>
... doesn't contain a radio input ...
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- first part of the second journey from Llangollen to Pontypridd -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ColumnOne'>
<input type='radio' name='in' value='2'>
</div>
... some more divs of parseable journey info ...
</div>
</div>
<!-- second part of the second journey from Llangollen to Pontypridd -->
<div class='JourneyOptions'>
<div class='Journey'>
<div class='ConnectingJournies'>
<p>...</p>
</div>
<div class='ColumnOne'>
... doesn't contain a radio input ...
</div>
... some more divs of parseable journey info ...
</div>
</div>
... some more inbound journey options ...
</div>
Sorry for the large example, but I think it is as small as I can make it while still representing my problem.