Retrieve the content of an HTML tag using XPath
I have the following HTML code:
<div id="ipsLayout_contentArea">
<div class="preContentPadding">
<div id="ipsLayout_contentWrapper">
<div id="ipsLayout_mainArea">
<a id="elContent"></a>
<div class="cWidgetContainer " data-widgetarea="header" data-orientation="horizontal" data-role="widgetReceiver" data-controller="core.front.widgets.area">
<div class="ipsPageHeader ipsClearfix">
<div class="ipsClearfix">
<div class="cTopic ipsClear ipsSpacer_top" data-feedid="topic-100269" data-lastpage="" data-baseurl="https://forum.com/forum/topic/100269-topic/" data-autopoll="" data-controller="core.front.core.commentFeed,forums.front.topic.view">
<div class="" data-controller="core.front.core.moderation" data-role="commentFeed">
<form data-role="moderationTools" data-ipspageaction="" method="post" action="https://forum.com/forum/topic/100269-topic/?csrfKey=b092dccccee08fdbc06c26d350bf3c2b&do=multimodComment">
<a id="comment-626016"></a>
<article id="elComment_626016" class="cPost ipsBox ipsComment ipsComment_parent ipsClearfix ipsClear ipsColumns ipsColumns_noSpacing ipsColumns_collapsePhone " itemtype="http://schema.org/Comment" itemscope="">
<aside class="ipsComment_author cAuthorPane ipsColumn ipsColumn_medium">
<div class="ipsColumn ipsColumn_fluid">
<div id="comment-626016_wrap" class="ipsComment_content ipsType_medium ipsFaded_withHover" data-quotedata="{"userid":3859,"username":"Admin","timestamp":1453221383,"contentapp":"forums","contenttype":"forums","contentid":100269,"contentclass":"forums_Topic","contentcommentid":626016}" data-commentid="626016" data-commenttype="forums" data-commentapp="forums" data-controller="core.front.core.comment">
<div class="ipsComment_meta ipsType_light">
<div class="cPost_contentWrap ipsPad">
<div class="ipsType_normal ipsType_richText ipsContained" data-controller="core.front.core.lightboxedImages" itemprop="text" data-role="commentContent">
<p> Hi, </p>
<p> </p>
<p> This is a post with multiple </p>
<p> lines of text </p>
and I am trying to get the content of the post as plain text. I am currently using the XPath:
//div[@id='ipsLayout_contentArea']/div[2]/div/div[4]/div/form/article/div/div/div[2]/div//text()
to retrieve each line of each post (as delimited by <p></p>). How can I get the entire content of the post (inside:
<div class="ipsType_normal ipsType_richText ipsContained" data-controller="core.front.core.lightboxedImages" itemprop="text" data-role="commentContent"> Post content </div>),
as plain text (so that <p></p> is treated as text, along with any other tags the post may include)?
Edit:
I am using the following XPath:
//div[@id='ipsLayout_contentArea']/div[2]/div/div[4]/div/form/article/div/div/div[2]/div
to retrieve the div containing the post.
// forumTemplate.getXpathElements().get(forumTemplate.XPATH_GET_THREAD_POSTS) = //div[@id='ipsLayout_contentArea']/div[2]/div/div[4]/div/form/article/div/div/div[2]/div
List<DomNode> posts = (List<DomNode>) firstPage.getByXPath(forumTemplate.getXpathElements().get(forumTemplate.XPATH_GET_THREAD_POSTS));
for (DomNode post : posts) {
// Retrieve the contents of the post as a string
String postContentStr = post.getNodeValue();
}
The variable postContentStr is always empty. Why?
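A likely explanation: in the W3C DOM, the node value of an element node is defined as null, because the text lives in the element's child text nodes. The sketch below illustrates this with the JDK's built-in DOM and XPath classes instead of HtmlUnit (an assumption made so the example is self-contained). HtmlUnit's DomNode implements org.w3c.dom.Node, so the same behavior should apply there, though that is not verified here. The class name, the SAMPLE markup, and the shortened XPath are hypothetical stand-ins for the real page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeValueDemo {
    // Hypothetical, well-formed stand-in for the post markup in the question.
    static final String SAMPLE =
        "<div data-role=\"commentContent\">"
        + "<p> Hi, </p><p> This is a post with multiple </p><p> lines of text </p>"
        + "</div>";

    // Parses the markup and selects the post div with XPath.
    static Node selectPost(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate("//div[@data-role='commentContent']", doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the sample", e);
        }
    }

    public static void main(String[] args) {
        Node post = selectPost(SAMPLE);
        // An element node has no value of its own, so this prints "null".
        System.out.println(post.getNodeValue());
        // getTextContent() concatenates the text of all descendant nodes.
        System.out.println(post.getTextContent());
    }
}
```

If only the plain text is needed, getTextContent() on the selected div may be enough. It does not preserve the <p> tags.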
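This can't be done in XPath. Have your XPath select the 'div' and get the content of the 'div' as text from Java (I can't help with the Java part, though). – har07
I can get the div as a DOM node, but I can't retrieve its value (with all of its tags). – Sebi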
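One way to get the div's content with all of its tags, following har07's suggestion to do this step in Java, is to serialize the node's children back to markup. The sketch below uses the JDK's identity Transformer on a standard org.w3c.dom.Node. The class name, helper names, and sample markup are hypothetical. HtmlUnit's DomNode also has an asXml() method which, as far as I know, returns the node serialized with its tags, but that is not tested here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class InnerMarkupDemo {
    // Hypothetical, well-formed stand-in for the post markup in the question.
    static final String SAMPLE =
        "<div><p> Hi, </p><p> This is a post with multiple </p><p> lines of text </p></div>";

    // Parses the markup and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample", e);
        }
    }

    // Serializes every child of the element, tags included, into one string.
    static String innerMarkup(Node element) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Suppress the <?xml ...?> header that would otherwise precede each child.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            NodeList children = element.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize the node", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(innerMarkup(parseRoot(SAMPLE)));
    }
}
```

Serializing the children rather than the div itself keeps the wrapper element out of the result, which matches the "content inside the div" wording of the question.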