java: regular expression

I have an HTML string that contains many image tags, and I need to find those tags and change them. For example:

import android.util.Log;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String imageRegex = "(<img.+(src=\".+\").+/>){1}"; 
String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />hello world<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />"; 
Matcher matcher = Pattern.compile(imageRegex, Pattern.CASE_INSENSITIVE).matcher(str); 
int i = 0; 
while (matcher.find()) { 
    i++; 
    Log.i("TAG", matcher.group()); 
}

The result is:

<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" />hello world<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />

But that is not what I want. The result I want is:

<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" /> 
<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />

What is wrong with my regex?


2012-07-10 Mejonzhan

May I point you to this answer: http://stackoverflow.com/a/1732454/83109 – 2012-07-10 13:14:25

Is there anything wrong with regexing out just the tags, though? – 2012-07-10 13:20:32

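Yes, there is. The problem is that HTML is not a regular language, so it is not well suited to being parsed with regular expressions. Sometimes you can make it work in a pinch (this may be one of those cases), but it is a bit like hammering a nail with an old shoe: it may get the job done, but it is not really the right tool. – 2012-07-10 13:23:50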
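To make the "not a regular language" point concrete, here is a minimal Java sketch that runs the reluctant pattern `(<img)(.*?)(/>)` (suggested in one of the answers) against two edge cases. The class name and both sample inputs are hypothetical: an `alt` value that itself contains `/>`, and an `<img>` tag written without the self-closing slash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFragilityDemo {
    // Collect every match of the reluctant <img ... /> pattern.
    static List<String> findImgTags(String html) {
        Pattern p = Pattern.compile("(<img)(.*?)(/>)", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        List<String> tags = new ArrayList<>();
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical input: the alt attribute contains "/>", so the match stops too early.
        String quirky = "<img src=\"a.gif\" alt=\"x/>y\" />";
        System.out.println(findImgTags(quirky)); // [<img src="a.gif" alt="x/>]

        // Hypothetical input: valid HTML without "/>", so nothing matches at all.
        String unclosed = "<img src=\"b.gif\">";
        System.out.println(findImgTags(unclosed)); // []
    }
}
```

A real HTML parser handles both inputs correctly because it tokenizes attribute values instead of scanning for the first `/>`.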

Try `(<img)(.*?)(/>)`; that should do the trick. That said, yes, you shouldn't use regular expressions to parse HTML, as people will tell you again and again.

I don't have Eclipse installed, but I have VS2010, and this works for me (C#):

String imageRegex = "(<img)(.*?)(/>)"; 
String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />hello world<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />"; 
System.Text.RegularExpressions.MatchCollection match = System.Text.RegularExpressions.Regex.Matches(str, imageRegex, System.Text.RegularExpressions.RegexOptions.IgnoreCase); 
StringBuilder sb = new StringBuilder(); 
foreach (System.Text.RegularExpressions.Match m in match) 
{ 
    sb.AppendLine(m.Value); 
} 
System.Windows.MessageBox.Show(sb.ToString());

Result:

<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" /> 
<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />
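The answer above is written in C#. For the asker's Java code, the same reluctant pattern can be used with `java.util.regex` roughly as follows. This is a sketch; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgTagFinder {
    // Reluctant "(.*?)" stops at the first "/>" after each "<img".
    static final Pattern IMAGE = Pattern.compile("(<img)(.*?)(/>)", Pattern.CASE_INSENSITIVE);

    // Return each <img ... /> tag as a separate string.
    static List<String> findImages(String html) {
        Matcher matcher = IMAGE.matcher(html);
        List<String> tags = new ArrayList<>();
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />hello world<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />";
        // Prints one tag per line.
        for (String tag : findImages(str)) {
            System.out.println(tag);
        }
    }
}
```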


2012-07-10 13:19:38 GrayFox374

Yes, it works; I have updated my regex. – Mejonzhan 2012-07-10 13:33:04
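The question also asks how to change the tags once they are found. One way to do that with the standard library is `Matcher.appendReplacement` plus `appendTail`, which rebuild the string while rewriting each match. The sketch below is illustrative only: the rewrite rule (prefixing every `src` with a base URL) and `http://example.com/` are hypothetical placeholders.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgTagRewriter {
    // Hypothetical base URL, used only to illustrate rewriting each tag's src.
    static final String BASE = "http://example.com/";

    // Reluctant groups: 1 = text before src, 2 = src value, 3 = text after src up to the first "/>".
    static final Pattern IMG =
            Pattern.compile("<img(.*?)src=\"(.*?)\"(.*?)/>", Pattern.CASE_INSENSITIVE);

    static String rewriteSrc(String html) {
        Matcher m = IMG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = "<img" + m.group(1) + "src=\"" + BASE + m.group(2) + "\"" + m.group(3) + "/>";
            // quoteReplacement keeps '$' or '\' in the tag from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        // Copy whatever follows the last match (here: nothing, or trailing text).
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "<img src=\"9.gif\" alt=\"\" />hello world<img src=\"7.gif\" alt=\"\" />";
        System.out.println(rewriteSrc(str));
    }
}
```

Like any regex over HTML, this assumes every `<img` has a double-quoted `src` and ends in `/>`; a tag without `src` could let group 1 run on into the next tag.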

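David M is right: you really shouldn't try to do this. Your specific problem, though, is that the `+` quantifier in your regex is greedy, so it matches the longest substring it possibly can.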
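A small Java sketch of the difference (class name and sample input are hypothetical): it counts matches for the asker's original pattern, a simplified greedy pattern, and the same pattern with the reluctant quantifier `+?`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Count how many non-overlapping matches a pattern finds in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String html = "<img src=\"a.gif\" />hello world<img src=\"b.gif\" />";

        // The asker's pattern: the greedy ".+" parts stretch the single match across both tags.
        System.out.println(countMatches("(<img.+(src=\".+\").+/>){1}", html)); // 1

        // Greedy ".+" first consumes to the end, then backtracks only to the LAST "/>".
        System.out.println(countMatches("<img.+/>", html)); // 1

        // Reluctant ".+?" stops at the FIRST "/>" that completes a match: one match per tag.
        System.out.println(countMatches("<img.+?/>", html)); // 2
    }
}
```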

For more details on quantifiers, see The regex tutorial.


2012-07-10 13:21:46

Thank you very much; your answer is the one that solved my problem. – Mejonzhan 2012-07-10 13:37:14

I don't recommend using regular expressions to parse HTML. Consider JSoup or a similar solution:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get(); 
Elements images = doc.select("img");

Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp.


2012-07-10 13:38:27 Anton
