2011-02-14

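Remove all HTML tags from text except <br>?

Hi all. I have a Java string, and I want to:

1. Remove all HTML tags from it except the line-break tags <br></br>, keeping the text inside the tags if there is any.
2. Separate the extracted text fragments. At the moment the results run together, like text1andtext2, with no space between them, and I want to fix that as well.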

Here is what I am doing:

String html = "<div dir=\"ltr\">hello my friend<span>ECHO</span><br>how are you ?<br><br><div class=\"gmail_quote\">On Mon, Feb 14, 2011 at 10:45 AM, My Friend <span dir=\"ltr\">&lt;<a href=\"mailto:[email protected]\">[email protected]</a>&gt;</span> wrote:<br> " 
      + "<blockquote class=\"gmail_quote\" style=\"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;\"> "; 
    String parsedText = html.replaceAll("\\<.*?\\>", ""); 
    System.out.println(parsedText); 

Current output:

hello my friendECHOhow are you ?On Mon, Feb 14, 2011 at 10:45 AM, My Friend &lt;[email protected]&gt; wrote: 

Desired output:

hello my friend ECHO <br> how are you ? <br> <br> On Mon, Feb 14, 2011 at 10:45 AM, My Friend &lt;[email protected]&gt; wrote:
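
Possible duplicate: http://stackoverflow.com/questions/240546/removing-html-from-a-java-string – Simon 2011-02-14 09:07:17

No, I don't want to remove all HTML tags, which is what the code there actually does. I want to remove all HTML tags except the line-break tag. – 2011-02-14 09:13:09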

Answers

4

You can do it like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String html =
    "<div dir=\"ltr\">hello my friend<span>ECHO</span><br>how are you ?" +
    "<br><br><div class=\"gmail_quote\">On Mon, Feb 14, 2011 at 10:45 AM," +
    " My Friend <span dir=\"ltr\">&lt;<a href=\"mailto:[email protected]" +
    "main.com\">[email protected]</a>&gt;</span> wrote:<br><bloc" +
    "kquote class=\"gmail_quote\" style=\"margin: 0pt 0pt 0pt 0.8ex; bord" +
    "er-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;\"> ";
// Match opening and closing tags; group 1 captures the tag name.
final Pattern tagPattern = Pattern.compile("</?([^\\s>/]+)[^>]*>");
final Matcher matcher = tagPattern.matcher(html);
final StringBuffer sb = new StringBuffer(html.length());
while (matcher.find()) {
    // Keep <br> tags verbatim; replace every other tag with a single space.
    matcher.appendReplacement(sb, matcher.group(1).equalsIgnoreCase("br")
        ? Matcher.quoteReplacement(matcher.group())
        : " ");
}
matcher.appendTail(sb);

final String parsedText = sb.toString();
System.out.println(parsedText);

Output (a single line; the leading and trailing spaces are not shown):

hello my friend ECHO <br>how are you ?<br><br> On Mon, Feb 14, 2011 at 10:45 AM, My Friend  &lt; [email protected] &gt;  wrote:<br>

But I hope you know that Cthulhu is calling if you do this. Don't parse HTML/XML with regular expressions!
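
To illustrate that warning, here is a minimal sketch of the same task done with an actual HTML parser instead of a regex. It assumes the lenient HTML parser that ships with the JDK (javax.swing.text.html) is an acceptable stand-in; the class name ParserBrKeeper and the method keepOnlyBr are hypothetical names chosen for this example. The parser decodes entities such as &lt;, so the sketch re-escapes text before appending it. Whitespace handling in this old parser can differ from what a browser does, so treat the result as approximate; a dedicated HTML library would likely be a better fit in real code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: names are illustrative, not from the original thread.
public class ParserBrKeeper {

    // Re-escape characters that the parser has already decoded from entities.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Returns the text content of the HTML, with <br> kept and all other tags dropped.
    public static String keepOnlyBr(String html) {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Each text chunk is followed by a space so fragments do not run together.
                out.append(escape(new String(data))).append(' ');
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <br> is reported as a simple (empty) tag; every other tag is ignored.
                if (tag == HTML.Tag.BR) {
                    out.append("<br> ");
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(keepOnlyBr("<div>hello<span>ECHO</span><br>how are you ?</div>"));
    }
}
```

Because the parser reports tags as events, there is no pattern that has to anticipate every way a tag can be written, which is the main weakness of the regex approach.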

2

I would:

  • Replace every <br /> with a newline or some other special character.
  • Remove all tags.
  • Replace the special character with <br />.
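
The three steps above could be sketched as follows. This is only an illustration under stated assumptions: the class name BrPlaceholder, the method stripTagsExceptBr, and the marker character are hypothetical, and the marker (a private-use Unicode character) is assumed never to occur in the input. Every variant of the line-break tag is normalized to <br>, and each removed tag is replaced by a space so that neighbouring text fragments stay separated.

```java
import java.util.regex.Pattern;

// Hypothetical helper: names are illustrative, not from the original thread.
public class BrPlaceholder {

    // Assumption: this private-use character never appears in the input text.
    private static final String MARKER = "\uE000";

    // Matches <br>, <BR>, <br/>, <br />, </br>, and <br ...attributes...>.
    private static final Pattern BR_TAG = Pattern.compile("(?i)</?br\\b[^>]*>");

    // Matches any remaining tag.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    public static String stripTagsExceptBr(String html) {
        // Step 1: protect line breaks by swapping them for the marker.
        String marked = BR_TAG.matcher(html).replaceAll(MARKER);
        // Step 2: remove all other tags, leaving a space in their place.
        String stripped = ANY_TAG.matcher(marked).replaceAll(" ");
        // Step 3: turn the marker back into a line-break tag.
        return stripped.replace(MARKER, "<br>");
    }

    public static void main(String[] args) {
        System.out.println(stripTagsExceptBr("a<b>b</b><br>c")); // prints "a b <br>c"
    }
}
```

This still relies on regular expressions to find tags, so it shares their limits on malformed or unusual markup; the placeholder only makes the "keep <br>" rule easier to express.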