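XPath query won't work
I have been playing around with cURL and XPath to do some web scraping. I finally got my code running, but it stopped working when I tried it on another site. The only things I changed were the XPath and the URL. I'm brand new to this and have been working on it for a week, so please bear with me if this is an obvious mistake.
My code is: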
<?php
/*----Connection to Database----*/
include('wp-config.php');
mysql_connect(DB_HOST, DB_USER, DB_PASSWORD);
mysql_select_db("db");
/*----US Dollar Index----*/
$url = "http://www.wsj.com/mdc/public/page/2_3023-fut_index-futures.html";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
// Make the cURL request
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
if (!$html) {
    echo "<br />cURL error number:" .curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}
// Parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);
// Grab all the MONTH on the page
$xpath = new DOMXPath($dom);
$data = $xpath->query("/html/body/div[6]/div[3]/div/table[9]/tbody/tr[position() >= 3 and position() <=6]");
//[position() >= 1 and position() <=13]
// Searching for data
$values = array();
foreach($data as $row) {
    $values[] = $row->nodeValue;
}
print_r($values);
?>
</body>
</html>
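When you say it stopped, does that mean the script timed out, returned nothing, threw an error, etc.? – Rasclatt
Sorry for not providing that information. The script did not time out or return an error. The only thing displayed is "Array()" –
What do you mean by "changed the path and the URL"? Why did you change them? The XPath you have is only valid for the URL in your code... – drkthng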
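An empty "Array()" with no cURL error means the page was fetched but the XPath matched no nodes. One common cause, offered here as a guess rather than a diagnosis of this particular page, is an absolute path copied from a browser inspector: browsers insert `<tbody>` into tables when building the DOM, while PHP's `DOMDocument` parses the raw source and does not. The sketch below uses a small made-up HTML string (not the real WSJ markup) to show the difference, and checks `->length` on each result so it is obvious which query fails.

```php
<?php
// Hypothetical stand-in for a fetched page. The markup has no <tbody>,
// which is an assumption about what the raw source might look like.
$html = <<<HTML
<html><body>
<div><table>
<tr><th>Month</th><th>Last</th></tr>
<tr><td>Mar</td><td>97.10</td></tr>
<tr><td>Jun</td><td>97.35</td></tr>
</table></div>
</body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of hiding them with @
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Browser-style path with a tbody step: nothing matches, because the
// parsed source contains no <tbody> element.
$withTbody = $xpath->query("//table/tbody/tr[position() >= 2]");

// Same rows without the tbody step; "//tr" matches whether or not a
// <tbody> is present in the source.
$withoutTbody = $xpath->query("//table//tr[position() >= 2]");

echo "with tbody: ", $withTbody->length, "\n";       // 0
echo "without tbody: ", $withoutTbody->length, "\n"; // 2

$values = array();
foreach ($withoutTbody as $row) {
    $values[] = trim($row->nodeValue);
}
print_r($values);
```

If dropping `tbody` does not help, shortening the path step by step (for example querying `//table` first and printing `->length`) usually shows which step of a long absolute path stops matching on the new page.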