我只需要匹配第一次出现的html链接与'data- {someData}'属性。我写的正则表达式如下图所示:正则表达式首次出现html链接
\<a\s+(.+)\s+data-\s*(.+)\s*>(.+)<\/a>
和它的作品对HTML的PICE与像只有一个HTML链接:
SOME TEXT/HTML
<a href="~/link.aspx?_id=B0B5056BD5984878BEB5C92AF6B74DB3&_z=z"
data-dms="{6782B150-F6FA-49E6-A2FF-6D6014470373}"
data-targetid="{B0B5056B-D598-4878-BEB5-C92AF6B74DB3}"
data-dms-event="Content button">Link1
</a>
SOME TEXT/HTML
,但问题是当HTML中包含更多的联系。然后正则表达式匹配,直到最后一次出现</a>
。所以,从下面的HTML:
SOME TEXT/HTML
<a href="~/link.aspx?_id=B0B5056BD5984878BEB5C92AF6B74DB3&_z=z"
data-dms="{6782B150-F6FA-49E6-A2FF-6D6014470373}"
data-targetid="{B0B5056B-D598-4878-BEB5-C92AF6B74DB3}"
data-dms-event="Content button">Link1
</a>
SOME TEXT/HTML
<a href="~/link.aspx?_id=1256272320C4429DAB8A1F40D429C841&_z=z"
data-dms="{6782B150-F6FA-49E6-A2FF-6D6014470373}"
data-targetid="{12562723-20C4-429D-AB8A-1F40D429C841}"
data-dms-event="Content button">Link2
</a>
SOME TEXT/HTML
我需要修复我的正则表达式来只匹配:
<a href="~/link.aspx?_id=B0B5056BD5984878BEB5C92AF6B74DB3&_z=z"
data-dms="{6782B150-F6FA-49E6-A2FF-6D6014470373}"
data-targetid="{B0B5056B-D598-4878-BEB5-C92AF6B74DB3}"
data-dms-event="Content button">Link1
</a>
为什么你不使用DOM解析器来解析HTML? –