PHP web scraping

I'm doing web scraping with PHP, and I want to get Sunday's price (3.65) from the following HTML code:

 <tr class="odd"> 
     <td > 
      <b>Sunday</b> Info 
      <div class="test">test</div> 
     </td> 
     <td> 
     &euro; 3.65 * 

     </td> 
    </tr>

But I can't find the right regular expression for this... I'm using this PHP code:

<?php 
     $data = file_get_contents('http://www.test.com/'); 

     preg_match('/<tr class="odd"><td ><b>Sunday</b> Info<div class="test">test<\/div><\/td><td>&euro; (.*) *<\/td><\/tr>/i', $data, $matches); 
     $result = $matches[1]; 
    ?>

But there is no result... What's wrong with the regular expression? (I think it's because of the newlines/whitespace?)

Source

2012-08-06 francisMi

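Yes, you're right. – Napolux 2012-08-06 11:29:05

The regex would be "&euro; ([0-9.]*)" rather than that, to get the price. If there are others, you could split() first. Watch out for special regex characters, like the obvious * after the price! – Waygood 2012-08-06 11:55:21

But I also need to use "Sunday", because there are other days too... – francisMi 2012-08-06 11:58:12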

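The problem is the whitespace between the tags. There is a newline, tabs and/or spaces.

Your regular expression doesn't match them.

You also need to set your preg_match up for multiple lines!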
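As a rough sketch of the whitespace point above, one option is to allow optional whitespace (`\s*`) between the tags in the pattern. The helper name `priceForDay` and the inline sample markup are assumptions for illustration, and the code assumes PHP 7.1 or later. Note that the `m` modifier only changes how `^` and `$` behave, so this sketch relies on `\s*` plus the `s` modifier (which lets `.` cross line breaks) instead.

```php
<?php
// Sample markup copied from the question; in practice this would come from file_get_contents().
$data = <<<'HTML'
 <tr class="odd">
     <td >
      <b>Sunday</b> Info
      <div class="test">test</div>
     </td>
     <td>
     &euro; 3.65 *

     </td>
    </tr>
HTML;

// Hypothetical helper: returns the price string for the given day, or null if no row matches.
function priceForDay(string $html, string $day): ?string
{
    // \s* tolerates newlines, tabs and spaces between tags; \* matches the literal asterisk.
    $pattern = '~<tr class="odd">\s*<td\s*>\s*<b>' . preg_quote($day, '~') . '</b>'
             . '.*?</td>\s*<td>\s*&euro;\s*([0-9.]+)\s*\*\s*</td>\s*</tr>~is';

    return preg_match($pattern, $html, $matches) === 1 ? $matches[1] : null;
}

$result = priceForDay($data, 'Sunday');
```

Passing the day name as a parameter keeps "Sunday" in the match, which matters when the page lists other days as well. The lazy `.*?` is a simplification and could over-match on pages with a very different row layout.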

I think it's easier to scrape this using XPath.
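A minimal sketch of the XPath route, using PHP's built-in DOMDocument and DOMXPath classes. The function name `priceCellForDay` is hypothetical, the sample row is wrapped in a `<table>` as an assumption about the surrounding page, and the day name is assumed not to contain double quotes because it is placed directly into the query.

```php
<?php
// The row from the question, wrapped in a <table> (an assumption about the real page).
$data = <<<'HTML'
<table>
 <tr class="odd">
     <td >
      <b>Sunday</b> Info
      <div class="test">test</div>
     </td>
     <td>
     &euro; 3.65 *
     </td>
 </tr>
</table>
HTML;

// Hypothetical helper: returns the trimmed text of the price cell for a day, or null.
function priceCellForDay(string $html, string $day): ?string
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Second <td> of the row whose first <td> contains a <b> holding the day name.
    $query = sprintf('//tr[td[1]/b[normalize-space(.) = "%s"]]/td[2]', $day);
    $cell = $xpath->query($query)->item(0);

    return $cell === null ? null : trim($cell->textContent);
}

$cellText = priceCellForDay($data, 'Sunday');
```

The parser decodes `&euro;` into the euro character, so the returned text still contains the currency sign and the trailing asterisk; only the whitespace problem goes away.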

Source

2012-08-06 11:31:29

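Don't use regular expressions; HTML is not regular.

Instead, use a DOM tree parser such as DOMDocument. This documentation may help you.

The /s switch should help you with your original regular expression, though I haven't tried it.

Source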

2012-08-06 11:30:59 Martin

Just add '$dom = new DOMDocument(); $dom->loadHTML($data);'? That doesn't work... – francisMi 2012-08-06 12:54:37

Try replacing the newlines with '' and then run the regular expression again.

Source
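One way this could look, as a sketch: strip every run of whitespace around the tags (newlines, `\r`, tabs and spaces alike), then apply a pattern close to the one from the question. The sample markup is copied from the question, and the adjusted pattern is an illustration rather than a tested fix for the real page. Two changes to the pattern are needed: "Info" now sits directly against the neighbouring tags, and the asterisk is escaped as `\*`.

```php
<?php
// Sample markup copied from the question; in practice this would come from file_get_contents().
$data = <<<'HTML'
 <tr class="odd">
     <td >
      <b>Sunday</b> Info
      <div class="test">test</div>
     </td>
     <td>
     &euro; 3.65 *

     </td>
    </tr>
HTML;

// Drop all whitespace directly before and after each tag.
$flat = preg_replace('/\s*(<[^>]*>)\s*/', '$1', $data);

// With the whitespace gone, a literal pattern can match the row again.
$pattern = '~<tr class="odd"><td ><b>Sunday</b>Info<div class="test">test</div></td>'
         . '<td>&euro; ([0-9.]+) \*</td></tr>~i';

$result = preg_match($pattern, $flat, $matches) === 1 ? $matches[1] : null;
```

Whitespace inside the text itself, such as the space between `&euro;` and the number, is left alone, so the pattern still has to include it.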

2012-08-06 11:33:36 matteomattei

And the others, like \t and \r – Waygood 2012-08-06 11:57:24

Try it this way:

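$uri = 'http://www.test.com/';
$get = file_get_contents($uri);

// Search for short markers rather than the whole row: the real HTML has
// newlines and spaces between the tags, so a long literal would never match.
$pos1 = strpos($get, "<b>Sunday</b>");
if ($pos1 !== false) {
    $pos1 = strpos($get, "&euro;", $pos1) + strlen("&euro;"); // price starts after the euro entity
    $pos2 = strpos($get, "*", $pos1);                          // and ends at the asterisk
    $text = substr($get, $pos1, $pos2 - $pos1);
    $text1 = trim(strip_tags($text));                          // "3.65" for the HTML in the question
}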

Source

2017-03-23 10:44:51 Stefano

Use the PHP DOMDocument object. We are going to parse the HTML DOM data from the web page.

libxml_use_internal_errors(true); // real-world HTML is rarely valid, so keep parser warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($data); // $data holds the page HTML, e.g. from file_get_contents()

$trs = $dom->getElementsByTagName('tr'); // this gives us all the tr elements on the webpage

// loop through all the tr tags
foreach ($trs as $tr) {
    $b = $tr->getElementsByTagName('b')->item(0); // null when the row has no b tag
    // until we get one with the class 'odd' whose b tag has the value 'Sunday'
    if ($tr->getAttribute('class') == 'odd' && $b !== null && trim($b->nodeValue) == 'Sunday') {
        // now set the price to the node value of the second td tag, e.g. "€ 3.65 *"
        $price = trim($tr->getElementsByTagName('td')->item(1)->nodeValue);
        break;
    }
}
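The cell text obtained this way still carries the currency sign and the asterisk. A small follow-up sketch for reducing it to a number; the helper name `extractPrice` is hypothetical, and it assumes the price uses a dot as the decimal separator, as in the question.

```php
<?php
// Hypothetical helper: pull the first decimal number out of a cell's text.
function extractPrice(string $cellText): ?float
{
    // Digits, optionally followed by a dot and more digits.
    if (preg_match('/\d+(?:\.\d+)?/', $cellText, $m) !== 1) {
        return null; // no number found in the text
    }

    return (float) $m[0];
}

// "\u{20AC}" is the euro sign, which is what the DOM parser produces for &euro;.
$price = extractPrice("\u{20AC} 3.65 *");
```

Returning null instead of 0.0 for text without a number makes a failed scrape easier to tell apart from a genuinely free item.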

Instead of using DOMDocument for web scraping, which is a bit tedious, you can get your hands on SimpleHtmlDomParser, which is open source.

Source

2017-09-15 04:22:53 eosobande
