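cURL does not show the correct source that I see via "View Page Source" in the browser
I want to learn web scraping, and I chose https://www.betfair.com as an example. I have successfully fetched data from many pages, but when I request https://www.betfair.com/sport/horse-racing I do not get the data. However, if I view the page source in the browser, the data is there, so this does not appear to be a case of content being generated by JavaScript or something similar. Here is my code: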
$url = 'https://www.betfair.com/sport/horse-racing';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$page = curl_exec($ch);
curl_close($ch);
echo $page;
If you view the source in the browser, you will find something like this:
<a href="/sport/horse-racing?action=loadRacingSpecials&tab=SPECIALS&modules=multipick-horse-racing" class="ui-nav link ui-clickselect ui-ga-click" data-dimension3="sports-header" data-dimension4="Specials" data-dimension5="Horse Racing" data-gacategory="Interface" data-gaaction="Clicked Horse Racing Header" data-galabel="Specials"
data-loader=".multipick-content-container > div, .antepost-content-container > div, .future-racing-content-container > div, .bet-finder-content-container > div, .racing-specials-content-container > div, .future-racing-market-content-container > div"
>
Specials</a>
But cURL does not return these elements.
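It is there. Save the $page result to a file and you will see the result: http://prntscr.com/edcdny – Faxsy
@Faxsy When I echo this in my local web page and view the source, it is not there. Can you tell me how it is showing up for you? – Codester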
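One way to check the suggestion in the comments is to look at the raw response without a browser in between. The following is only a minimal sketch: `fetchPage` and `containsMarker` are hypothetical helper names that do not appear in the original post, and the user-agent string is left to the caller. It returns the body together with the HTTP status and final URL, so a redirect or an error page is easier to spot.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL with cURL
// and return the body together with some basic diagnostics.
function fetchPage(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow HTTP redirects
        CURLOPT_ENCODING       => '',         // accept and decode any supported compression
        CURLOPT_USERAGENT      => $userAgent, // assumption: caller supplies a browser-like string
    ]);

    $body = curl_exec($ch);
    $result = [
        'body'      => $body === false ? '' : $body,
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'error'     => curl_error($ch),
    ];
    curl_close($ch);

    return $result;
}

// Hypothetical helper: true if the marker string occurs in the raw HTML.
function containsMarker(string $html, string $marker): bool
{
    return strpos($html, $marker) !== false;
}
```

For example, the result of `fetchPage($url, $userAgent)` could be written out with `file_put_contents('page.html', $result['body'])` and then checked with `containsMarker($result['body'], 'data-galabel="Specials"')`. If the marker is found in the saved file, the elements were returned by cURL and the difference lies in how the echoed output is being inspected.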