Retrieving the content of an HTML tag using XPath

I have the following HTML code:

<div id="ipsLayout_contentArea"> 
<div class="preContentPadding"> 
<div id="ipsLayout_contentWrapper"> 
<div id="ipsLayout_mainArea"> 
<a id="elContent"></a> 
<div class="cWidgetContainer " data-widgetarea="header" data-orientation="horizontal" data-role="widgetReceiver" data-controller="core.front.widgets.area"> 
<div class="ipsPageHeader ipsClearfix"> 
<div class="ipsClearfix"> 
<div class="cTopic ipsClear ipsSpacer_top" data-feedid="topic-100269" data-lastpage="" data-baseurl="https://forum.com/forum/topic/100269-topic/" data-autopoll="" data-controller="core.front.core.commentFeed,forums.front.topic.view"> 
<div class="" data-controller="core.front.core.moderation" data-role="commentFeed"> 
<form data-role="moderationTools" data-ipspageaction="" method="post" action="https://forum.com/forum/topic/100269-topic/?csrfKey=b092dccccee08fdbc06c26d350bf3c2b&do=multimodComment"> 
<a id="comment-626016"></a> 
<article id="elComment_626016" class="cPost ipsBox ipsComment ipsComment_parent ipsClearfix ipsClear ipsColumns ipsColumns_noSpacing ipsColumns_collapsePhone " itemtype="http://schema.org/Comment" itemscope=""> 
<aside class="ipsComment_author cAuthorPane ipsColumn ipsColumn_medium"> 
<div class="ipsColumn ipsColumn_fluid"> 
<div id="comment-626016_wrap" class="ipsComment_content ipsType_medium ipsFaded_withHover" data-quotedata="{"userid":3859,"username":"Admin","timestamp":1453221383,"contentapp":"forums","contenttype":"forums","contentid":100269,"contentclass":"forums_Topic","contentcommentid":626016}" data-commentid="626016" data-commenttype="forums" data-commentapp="forums" data-controller="core.front.core.comment"> 
<div class="ipsComment_meta ipsType_light"> 
<div class="cPost_contentWrap ipsPad"> 
<div class="ipsType_normal ipsType_richText ipsContained" data-controller="core.front.core.lightboxedImages" itemprop="text" data-role="commentContent"> 
<p> Hi, </p> 
<p> </p> 
<p> This is a post with multiple </p> 
<p> lines of text </p>

and I am trying to get the content of the post as plain text. I am currently using the XPath:

//div[@id='ipsLayout_contentArea']/div[2]/div/div[4]/div/form/article/div/div/div[2]/div//text()

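which retrieves every line of every post as a separate piece of text (delimited by the markup between the lines). How can I get the entire content of the post (inside: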

<div class="ipsType_normal ipsType_richText ipsContained" data-controller="core.front.core.lightboxedImages" itemprop="text" data-role="commentContent"> Post content </div>),

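as plain text, so that the markup itself is treated as text (along with any other tags a message might include)?

Edit:

I am using the following XPath: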

//div[@id='ipsLayout_contentArea']/div[2]/div/div[4]/div/form/article/div/div/div[2]/div

to retrieve the div containing the post.

// forumTemplate.getXpathElements().get(forumTemplate.XPATH_GET_THREAD_POSTS) =
//   //div[@id='ipsLayout_contentArea']/div[2]/div/div[4]/div/form/article/div/div/div[2]/div
List<DomNode> posts = (List<DomNode>) firstPage.getByXPath(
        forumTemplate.getXpathElements().get(forumTemplate.XPATH_GET_THREAD_POSTS));
for (DomNode post : posts) {
    // Retrieve the contents of the post as a string
    String postContentStr = post.getNodeValue();
}

The variable postContentStr is always empty. Why?

Source

2016-02-05 Sebi

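This can't be done in XPath alone. Have your XPath select the 'div', then get the content of the 'div' as text from Java (I can't help with the Java part, though). – har07

I can get the div as a DOM node, but I can't retrieve its value (including all of its tags). – Sebi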
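A likely reason for the empty value: in the W3C DOM, getNodeValue() is defined to return null for element nodes, and only nodes such as text nodes and attributes carry a value of their own. The sketch below illustrates this with the JDK's built-in DOM and XPath classes rather than HtmlUnit, so it is only an approximation of the setup in the question. The class name NodeValueDemo and the MARKUP string are hypothetical, and the markup has been made well-formed because the JDK parser requires that.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeValueDemo {
    // Hypothetical, well-formed stand-in for the post markup in the question.
    static final String MARKUP =
        "<div data-role=\"commentContent\"><p>Hi,</p><p>lines of text</p></div>";

    /** Parses MARKUP and returns the first node matching the XPath expression. */
    static Node select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(MARKUP)));
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** The node's own value: null for elements, the text for text nodes. */
    static String nodeValue(String expression) {
        return select(expression).getNodeValue();
    }

    /** The concatenated text of the node and all of its descendants. */
    static String textContent(String expression) {
        return select(expression).getTextContent();
    }

    public static void main(String[] args) {
        String div = "//div[@data-role='commentContent']";
        System.out.println(nodeValue(div));               // null: an element has no node value
        System.out.println(textContent(div));             // Hi,lines of text
        System.out.println(nodeValue("//p[1]/text()"));   // Hi,
    }
}
```

HtmlUnit's DomNode implements the same org.w3c.dom.Node interface, so getTextContent() should be available on the nodes returned by getByXPath() as well. Note that getTextContent() drops the tags; it does not return them as text.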

You specified //text(), which recursively selects every text node under the given path. Depending on your use case, this may work better:

//div[@data-role='commentContent']

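This matches the comment node you are trying to get. If you are evaluating the expression from code, you can work from that node. Don't match text(), though, because it will not match any tags.

Source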
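To make the difference concrete, here is a minimal sketch that compares the two expressions using the JDK's javax.xml.xpath API. It assumes a simplified, well-formed version of one post; the class name CommentXPathDemo, the MARKUP string and the "meta" div are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CommentXPathDemo {
    // Hypothetical, well-formed stand-in for one post from the question.
    static final String MARKUP =
        "<article><div class=\"meta\">Posted by Admin</div>"
        + "<div data-role=\"commentContent\">"
        + "<p>Hi,</p><p>This is a post with multiple</p><p>lines of text</p>"
        + "</div></article>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(MARKUP)));
    }

    /** Number of nodes selected by the expression. */
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, parse(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** XPath string value of the first selected node (text only, no tags). */
    static String stringValue(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate("string(" + expression + ")", parse());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String div = "//div[@data-role='commentContent']";
        System.out.println(count(div));               // 1: the element holding the post
        System.out.println(count(div + "//text()"));  // 3: one text node per paragraph
        System.out.println(stringValue(div));         // Hi,This is a post with multiplelines of text
    }
}
```

Selecting on the data-role attribute also avoids the long positional path, which tends to break as soon as the page layout changes.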

2016-02-05 08:46:49

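I don't want to render it; I just want its plain-text content (with any tags it may contain read as text, as a String in Java). The document is an HTML page, not XML. – Sebi

It is HTML, but it is also XML in the sense that you are processing it with XPath and building a DOM tree. As far as I can tell, you are building a DOM tree from the HTML and then matching a specific node in that DOM. Now you are trying to render a DOM subtree back to HTML. The point is that XPath does not work at the "text" level, although I understand that this is what you ultimately want.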
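Turning a DOM subtree back into markup is a serialization step that happens after the XPath match. One way to sketch it with only the JDK is an identity Transformer, as below. The class SubtreeSerializer and its method names are hypothetical, and the input must be well-formed for the JDK parser, which real forum HTML often is not. With HtmlUnit, DomNode.asXml() appears to be the closest built-in equivalent.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SubtreeSerializer {
    /** Serializes a node and all of its descendants back to markup. */
    static String outerMarkup(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // Without this, the output would start with an <?xml ...?> declaration.
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Markup of the children only, i.e. the post body without its wrapping div. */
    static String innerMarkup(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(outerMarkup(children.item(i)));
        }
        return sb.toString();
    }

    /** Parses well-formed markup and returns the body of the first comment div. */
    static String postMarkup(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            Node post = (Node) XPathFactory.newInstance().newXPath().evaluate(
                    "//div[@data-role='commentContent']", doc, XPathConstants.NODE);
            return innerMarkup(post);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String markup = "<article><div data-role=\"commentContent\">"
                + "<p>Hi,</p><p>lines of text</p></div></article>";
        // Prints the paragraphs with their tags preserved as text.
        System.out.println(postMarkup(markup));
    }
}
```

The returned String keeps the tags as literal text, which is what the question asks for, at the cost of one extra step after the XPath evaluation.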