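Parsing elements using jsoup in Java

I'm using jsoup in my Java application to parse HTML, and now I need to parse table data. For each <tr>, I want the value of the first <td> element. If that first cell contains the word "Outdated", the row should be skipped. If it is not outdated, the code should move on to the third cell and take the value ending in ".rpm". I can't get this to work. I've tried many approaches without success, so I'm trying my luck here in case someone has experience with this.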
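The rule described above can be written down as plain Java before any HTML parsing is involved. This is a minimal sketch only: `RowRule` and `rpmsFromRow` are hypothetical names, and it assumes each row has already been reduced to a list of cell texts, with "Outdated" matched case-sensitively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowRule {
    // Hypothetical helper: applies the question's rule to the texts of one row's cells.
    // cells.get(0) is the first <td>, cells.get(2) is the third <td>.
    static List<String> rpmsFromRow(List<String> cells) {
        List<String> rpms = new ArrayList<>();
        if (cells.size() < 3) {
            return rpms; // not enough cells: nothing to extract
        }
        if (cells.get(0).contains("Outdated")) {
            return rpms; // outdated advisory: skip the whole row
        }
        // The third cell may list several packages separated by whitespace
        for (String token : cells.get(2).trim().split("\\s+")) {
            if (token.endsWith(".rpm")) {
                rpms.add(token);
            }
        }
        return rpms;
    }

    public static void main(String[] args) {
        // Prints [procmail-3.22-17.1.2.x86_64.rpm]
        System.out.println(rpmsFromRow(Arrays.asList(
                "RHSA-2014:1172", "procmail description", "procmail-3.22-17.1.2.x86_64.rpm")));
        // Prints [] because the first cell mentions "Outdated"
        System.out.println(rpmsFromRow(Arrays.asList(
                "Outdated RHSA-2014:1166", "httpclient description",
                "jakarta-commons-httpclient-3.0-7jpp.4.el5_10.x86_64.rpm")));
    }
}
```

Keeping the rule separate from the parsing makes it easy to check the skip logic on its own, independently of how the cells are selected.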
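import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class rpms {
    // Parses the HTML and tries to print the .rpm values of the table rows
    public static void getTdSibling(String sourceTd) throws FileNotFoundException, UnsupportedEncodingException {
        String fragment = sourceTd;
        Document doc = Jsoup.parseBodyFragment(fragment);
        // getElementsByClass() is meant for a single class name; depending on the jsoup
        // version, "confluenceTable tablesorter" may match nothing, and first() then returns null
        Elements myElements = doc.getElementsByClass("confluenceTable tablesorter").first().getElementsByTag("tr");
        for (Element element : myElements) {
            // Elements is a List<Element>, so contains("Outdated") compares a String with
            // Element objects and is always false; it would also select, not skip, Outdated rows
            if (element.select("td").contains("Outdated")) {
                // ownText() of a <tr> holds no cell text; the values live in its <td> children
                String rpms = element.ownText();
                System.out.println(rpms);
            }
        }
    }

    public static void main(String[] args) {
        // URLget, sendGetRequest() and URL are the asker's own helpers (not shown)
        URLget rpms = new URLget();
        try {
            getTdSibling(sendGetRequest(URL).toString());
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}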
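One reason the loop above prints nothing is the `contains` call. `Elements` is a `List<Element>`, so `contains("Outdated")` asks whether the list holds an object equal to that String, which never happens. The sketch below compares that check with a check on the first cell's text; `ContainsCheck` and `compareChecks` are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ContainsCheck {
    // Returns {listCheck, textCheck} for the first <tr> of the given HTML
    static boolean[] compareChecks(String html) {
        Document doc = Jsoup.parse(html);
        Element row = doc.select("tr").first();
        Elements cells = row.select("td");
        // List.contains(Object) looks for an equal element; a String never
        // equals an Element, so this is always false
        boolean listCheck = cells.contains("Outdated");
        // Looking at the first cell's text is what the question actually needs
        boolean textCheck = !cells.isEmpty() && cells.first().text().contains("Outdated");
        return new boolean[] {listCheck, textCheck};
    }

    public static void main(String[] args) {
        boolean[] r = compareChecks(
                "<table><tr><td>Outdated RHSA-2014:1166</td><td>...</td></tr></table>");
        System.out.println("list: " + r[0] + ", text: " + r[1]); // list: false, text: true
    }
}
```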
Here is the HTML of the table I am parsing:
<table class="confluenceTable tablesorter">
<tbody class="">
<tr>
<td colspan="1" class="confluenceTd">RHSA-2014:1172</td>
<td colspan="1" class="confluenceTd">
<p>The procmail program is used for local mail delivery. In addition to just
<br>delivering mail, procmail can be used for automatic filtering, presorting,
<br>and other mail handling jobs.</p>
<p>A heap-based buffer overflow flaw was found in procmail's formail utility.
<br>A remote attacker could send an email with specially crafted headers that,
<br>when processed by formail, could cause procmail to crash or, possibly,
<br>execute arbitrary code as the user running formail. (CVE-2014-3618)
</p>
</td>
<td colspan="1" class="confluenceTd">procmail-3.22-17.1.2.x86_64.rpm</td>
<td colspan="1" class="confluenceTd">
<img class="emoticon emoticon-tick" src="/s/en_GB-1988229788/4733/f235dd088df5682b0560ab6fc66ed22c9124c0be.57/_/images/icons/emoticons/check.png" data-emoticon-name="tick" alt="(tick)">
</td>
</tr>
<tr>
<td colspan="1" class="confluenceTd">Outdated RHSA-2014:1166</td>
<td colspan="1" class="confluenceTd">
<p>Jakarta Commons HTTPClient implements the client side of HTTP standards.</p>
<p>It was discovered that the HTTPClient incorrectly extracted host name from
<br>an X.509 certificate subject's Common Name (CN) field. A man-in-the-middle
<br>attacker could use this flaw to spoof an SSL server using a specially
<br>crafted X.509 certificate. (CVE-2014-3577)</p>
</td>
<td colspan="1" class="confluenceTd">
<p>jakarta-commons-httpclient-3.0-7jpp.4.el5_10.x86_64.rpm</p>
<p>jakarta-commons-httpclient-demo-3.0-7jpp.4.el5_10.x86_64.rpm</p>
<p>jakarta-commons-httpclient-javadoc-3.0-7jpp.4.el5_10.x86_64.rpm</p>
<p>jakarta-commons-httpclient-manual-3.0-7jpp.4.el5_10.x86_64.rpm</p>
</td>
</tr>
<tr>
<td colspan="1" class="confluenceTd">RHSA-2014:1148-1</td>
<td colspan="1" class="confluenceTd">
<p>A flaw was found in the way Squid handled malformed HTTP Range headers.
<br>A remote attacker able to send HTTP requests to the Squid proxy could use
<br>this flaw to crash Squid. (CVE-2014-3609)
</p>
<p>A buffer overflow flaw was found in Squid's DNS lookup module. A remote
<br>attacker able to send HTTP requests to the Squid proxy could use this flaw
<br>to crash Squid. (CVE-2013-4115)</p>
</td>
<td colspan="1" class="confluenceTd"><span>squid-2.6.STABLE21-7.el5_10.x86_64.rpm</span>
</td>
<td colspan="1" class="confluenceTd"></td>
</tr>
</tbody>
</table>
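Putting the pieces together for a table like the one above, one possible jsoup approach selects the rows with a compound class selector, skips rows whose first cell mentions "Outdated", and keeps the whitespace-separated tokens of the third cell that end in ".rpm". This is a minimal sketch, not the accepted answer: `RpmExtractor` and `extractRpms` are hypothetical names, and the embedded HTML is a trimmed copy of the table with the descriptions shortened.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class RpmExtractor {
    // Collects the .rpm names from the third cell of every row whose first cell
    // does not mention "Outdated"
    static List<String> extractRpms(String html) {
        Document doc = Jsoup.parse(html);
        List<String> rpms = new ArrayList<>();
        // Compound class selector: the table must carry both classes
        for (Element row : doc.select("table.confluenceTable.tablesorter tr")) {
            Elements cells = row.select("td");
            if (cells.size() < 3) {
                continue; // header or malformed row
            }
            if (cells.get(0).text().contains("Outdated")) {
                continue; // skip outdated advisories
            }
            // text() flattens <p>, <span> and line breaks, so split it into tokens
            for (String token : cells.get(2).text().split("\\s+")) {
                if (token.endsWith(".rpm")) {
                    rpms.add(token);
                }
            }
        }
        return rpms;
    }

    public static void main(String[] args) {
        // Trimmed copy of the table from the question
        String html = "<table class=\"confluenceTable tablesorter\"><tbody>"
                + "<tr><td>RHSA-2014:1172</td><td><p>procmail ...</p></td>"
                + "<td>procmail-3.22-17.1.2.x86_64.rpm</td><td></td></tr>"
                + "<tr><td>Outdated RHSA-2014:1166</td><td><p>HTTPClient ...</p></td>"
                + "<td><p>jakarta-commons-httpclient-3.0-7jpp.4.el5_10.x86_64.rpm</p>\n"
                + "<p>jakarta-commons-httpclient-demo-3.0-7jpp.4.el5_10.x86_64.rpm</p></td></tr>"
                + "<tr><td>RHSA-2014:1148-1</td><td><p>Squid ...</p></td>"
                + "<td><span>squid-2.6.STABLE21-7.el5_10.x86_64.rpm</span></td><td></td></tr>"
                + "</tbody></table>";
        // Prints procmail-3.22-17.1.2.x86_64.rpm and squid-2.6.STABLE21-7.el5_10.x86_64.rpm
        extractRpms(html).forEach(System.out::println);
    }
}
```

Splitting the cell text on whitespace handles both the single-value cells and the multi-<p> cell of the second row, at the cost of also picking up any stray ".rpm" words that might appear in free text.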
I need your help. I've tried many times and read posts here, but I still can't get it to work. Thanks.
Could you fix this line: 'tds:element.getElementsByTag("td");'? It is wrong. – user3278908 2014-09-24 03:40:37
My typo, sorry. There was also a missing ';'. – yunandtidus 2014-09-24 07:37:19
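The comments concern a line from an answer that is not reproduced here. Assuming the intent was to collect a row's cells, the corrected statement would presumably be a declaration such as `Elements tds = element.getElementsByTag("td");`, since `:` is only valid inside an enhanced `for` loop. A minimal sketch showing both forms side by side; `TdsDeclaration` and `firstCells` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TdsDeclaration {
    // Returns the text of the first <td> in every row that has at least one cell
    static List<String> firstCells(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // ':' is valid here because this is an enhanced for loop
        for (Element element : doc.select("tr")) {
            // A plain declaration uses '=' and ends with ';'
            Elements tds = element.getElementsByTag("td");
            if (!tds.isEmpty()) {
                result.add(tds.first().text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>RHSA-2014:1172</td><td>a</td></tr>"
                + "<tr><td>Outdated RHSA-2014:1166</td><td>b</td></tr></table>";
        System.out.println(firstCells(html)); // [RHSA-2014:1172, Outdated RHSA-2014:1166]
    }
}
```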