2012-03-22 96 views
3

Wrap plain HTML text in tags

I have a structure like this in my HTML document:

<p> 
"<em>You</em> began the evening well, Charlotte," said Mrs.&nbsp;Bennet with civil   self–command to Miss Lucas. "<em>You</em> were Mr.&nbsp;Bingley's first choice." 
</p> 

But I need my "plain text" to be wrapped in tags so that I can process it :)

<p> 
    <text>"</text> 
    <em>You</em> 
    <text> began the evening well, Charlotte," said Mrs.&nbsp;Bennet with civil self–command to Miss Lucas. "</text> 
    <em>You</em> 
    <text> were Mr.&nbsp;Bingley's first choice."</text> 
</p> 

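Any ideas on how to do this? I have looked at tagsoup and jsoup, but I can't find an easy way to solve it with either. Maybe with some fancy regular expression.

Thanks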

Answers

5

Here's a suggestion:

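import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Tag;

// Builds a <text> element whose only content is the given string.
public static Node toTextElement(String str) {
    Element e = new Element(Tag.valueOf("text"), "");
    e.appendText(str);
    return e;
}

// Recursively replaces every TextNode at or below root with a <text> element.
// Note that this also wraps text nested inside inline elements such as <em>.
public static void replaceTextNodes(Node root) {
    if (root instanceof TextNode)
        root.replaceWith(toTextElement(((TextNode) root).text()));
    else
        for (Node child : root.childNodes())
            replaceTextNodes(child);
}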

Test code:

String html = "<p>\"<em>You</em> began the evening well, Charlotte,\" " + 
     "said Mrs.&nbsp;Bennet with civil self–command to Miss Lucas." + 
     " \"<em>You</em> were Mr.&nbsp;Bingley's first choice.\"</p>"; 

Document doc = Jsoup.parse(html); 

for (Node n : doc.body().children()) 
    replaceTextNodes(n); 

System.out.println(doc); 

Output:

<html> 
<head></head> 
<body> 
    <p> 
    <text> 
    &quot; 
    </text><em> 
    <text> 
    You 
    </text></em> 
    <text> 
    began the evening well, Charlotte,&quot; said Mrs.&nbsp;Bennet with civil self–command to Miss Lucas. &quot; 
    </text><em> 
    <text> 
    You 
    </text></em> 
    <text> 
    were Mr.&nbsp;Bingley's first choice.&quot; 
    </text></p> 
</body> 
</html> 
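
The suggestion above wraps every text node, including the one inside <em>, whereas the markup asked for in the question keeps <em>You</em> as it is. As a rough sketch of wrapping only the direct text children of an element, here is the same idea written against the JDK's built-in org.w3c.dom API rather than jsoup. That swap is an assumption made purely so the example runs without extra dependencies. It also assumes the input is well-formed XML, so an entity like &nbsp; would have to be written as &#160;. The class and method names (WrapDirectText, wrapDirectTextNodes, wrap) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of jsoup or of the original answer.
class WrapDirectText {

    // Wraps each non-blank text node that is a DIRECT child of parent in a
    // <text> element. Element children such as <em> are left untouched.
    static void wrapDirectTextNodes(Element parent) {
        // NodeList is live, so take a snapshot before modifying the tree.
        NodeList live = parent.getChildNodes();
        List<Node> children = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            children.add(live.item(i));
        }

        Document doc = parent.getOwnerDocument();
        for (Node child : children) {
            boolean isText = child.getNodeType() == Node.TEXT_NODE;
            if (isText && !child.getNodeValue().trim().isEmpty()) {
                Element wrapper = doc.createElement("text");
                parent.replaceChild(wrapper, child); // wrapper takes the text node's place
                wrapper.appendChild(child);          // then the text node moves inside it
            }
        }
    }

    // Parses a well-formed XML fragment, wraps the root element's direct
    // text children, and serializes the result without an XML declaration.
    static String wrap(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            wrapDirectTextNodes(doc.getDocumentElement());

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>\"<em>You</em> began the evening well, Charlotte,\" said "
                + "Mrs.&#160;Bennet. \"<em>You</em> were Mr.&#160;Bingley's first choice.\"</p>";
        System.out.println(wrap(xml));
    }
}
```

With jsoup itself, a comparable effect could probably be had by calling replaceWith only on the nodes returned by Element.textNodes(), which lists an element's direct text children, instead of recursing into every descendant.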

Works perfectly! Thanks! I'm actually trying to use this to render HTML on a canvas using paint and draw text methods. Is this a good start? :) – Richard 2012-03-23 00:31:46