preg_match problem

I am trying to grab the numeric value (i.e. 105) from the page. My HTML looks like this...

<p> 
       External Backlinks 
      </p> 
      <p style="font-size: 150%;"> 
       <b>105</b> 
      </p>

and I have used the following regular expression...

$url = 'http://www.example.com/test.html'; 

preg_match('#<p>External Backlinks</p><p style="font-size: 150%;"><b>([0-9\.]+)#', file_get_contents($url), $matches); 

echo $matches[1];

But it does not return the correct value. Please help me fix the regular expression above. Thanks.

Source

2012-02-21 seoppc

http://stackoverflow.com/a/1732454/1163867 – MarcinJuraszek 2012-02-21 21:32:39

For HTML, don't use *regex*, use *xpath*. XPath is the "regular" expression language for HTML/XML, e.g. '//p[@style="font-size: 150%;"]/b'. – hakre 2012-02-21 22:04:35

I do not recommend using regular expressions to parse HTML. Use a DOM parser instead. Read this rant for more information about why :)

To answer your question, here is a working regex for your example:
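For what it's worth, here is a minimal sketch of the DOM parser route using PHP's bundled DOMDocument and DOMXPath classes. The helper name findExternalBacklinks and the inline $html sample are assumptions made for illustration; in practice the markup would come from file_get_contents($url), and the XPath query may need adjusting to the real page.

```php
<?php
// Sample markup copied from the question (an assumption: the real page
// would be fetched with file_get_contents($url) and contain more than this).
$html = <<<'HTML'
<p>
       External Backlinks
      </p>
      <p style="font-size: 150%;">
       <b>105</b>
      </p>
HTML;

/**
 * Hypothetical helper, not part of the original post: returns the text of the
 * <b> inside the first <p> that follows the "External Backlinks" paragraph,
 * or null when nothing matches.
 */
function findExternalBacklinks(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is often invalid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // normalize-space() trims the surrounding whitespace that breaks a strict regex.
    $query = '//p[normalize-space(.) = "External Backlinks"]/following-sibling::p[1]/b';
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}

$value = findExternalBacklinks($html);
echo $value, "\n"; // prints 105 for the sample above
```

Matching on the label text rather than on the style attribute is a design choice: the query keeps working if the font size changes, as long as the value stays in the paragraph right after the label.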

<p>[^E]*External Backlinks[^<]*<\/p>[^<]*<p style="font-size: ?150%;">[^<]*<b>(\d+)<\/b>[^<]*<\/p>

It's ugly, but it works... Don't use it.

preg_match('#<p>[^E]*External Backlinks[^<]*<\/p>[^<]*<p style="font-size: ?150%;">[^<]*<b>(\d+)<\/b>[^<]*<\/p>#', file_get_contents($url), $matches); 

echo $matches[1];

Output (for the HTML in the question): 105

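The problem with your regex is that it does not account for the whitespace in the HTML source. The unescaped slashes are harmless, since your pattern uses # rather than / as its delimiter, so escaping them is optional.

If the source looked like this: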
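As a rough illustration of the whitespace point, the pattern can also be written with \s* between the tags and the text. This is a variant sketched here, not the pattern from the answer, and the inline $html sample is assumed to mirror the page in the question.

```php
<?php
// Sample markup copied from the question, including its line breaks and indentation.
$html = <<<'HTML'
<p>
       External Backlinks
      </p>
      <p style="font-size: 150%;">
       <b>105</b>
      </p>
HTML;

// \s* absorbs any spaces or newlines between tags and text.
// With # as the delimiter, the slashes in </p> and </b> need no escaping.
$pattern = '#<p>\s*External Backlinks\s*</p>\s*<p style="font-size: ?150%;">\s*<b>(\d+)</b>#';

if (preg_match($pattern, $html, $matches) === 1) {
    echo $matches[1], "\n"; // prints 105 for the sample above
} else {
    echo "no match\n";
}
```

This is still tied to the exact tag order and attribute text, so it shares the fragility of any regex-based approach to HTML.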

<p>External Backlinks</p><p style="font-size: 150%;"><b>105</b></p>

Yours would work, but it would not be very robust. (I suppose using regex to parse HTML is never very robust.)

Source
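A small check of that claim, as a sketch: the pattern from the question is run unchanged against a single-line source and against a multi-line one. The variable names and the shortened multi-line sample are assumptions for illustration.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '#<p>External Backlinks</p><p style="font-size: 150%;"><b>([0-9\.]+)#';

// Single-line source, as shown in the answer.
$singleLine = '<p>External Backlinks</p><p style="font-size: 150%;"><b>105</b></p>';

// Multi-line source, roughly as it appears in the question.
$multiLine = "<p>\n External Backlinks\n</p>\n<p style=\"font-size: 150%;\">\n <b>105</b>\n</p>";

$matchedSingle = preg_match($pattern, $singleLine, $singleMatches);
$matchedMulti = preg_match($pattern, $multiLine, $multiMatches);

var_dump($matchedSingle); // int(1): the strict pattern matches when there is no whitespace
var_dump($matchedMulti);  // int(0): newlines and indentation break the match
```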

2012-02-21 21:45:56 ohaal
