2014-07-22

cURL error when encoding characters

I'm trying to extract some data from a web page. The problem is that instead of pulling:

64 × 191 × 75 cm 

it displays this when I echo it:

64 × 191 × 75 cm 

My code:

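<?php

$url = "http://www.google.co.uk";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Googlebot/2.1;  +http://www.google.com/bot.html)");
curl_setopt($ch, CURLOPT_ENCODING, ""); // accept and decode any compression cURL supports

$html = curl_exec($ch);
if ($html === false) {
    exit('cURL error: ' . curl_error($ch)); // curl_error() must run before curl_close()
}
curl_close($ch);

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ silences warnings about malformed HTML
$xpath = new DOMXPath($dom);
$q_Dimensions = "//tr/td[@class='FieldTitle'][contains(.,'Dimensions of packed product (W×H×D):')]/following-sibling::td/text()";
$dimentionsQ = $xpath->query($q_Dimensions);
if ($dimentionsQ->length === 0) {
    exit('Dimensions row not found');
}
$dimentions = $dimentionsQ->item(0)->nodeValue;
echo $dimentions;
exit();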

I believe this may be some kind of character-encoding problem, but I can't get any further. Any help is much appreciated.
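The symptom can be reproduced without cURL. This is a minimal sketch, assuming the page bytes are UTF-8 and the browser decodes PHP's output as ISO-8859-1 because no charset was declared, so every UTF-8 byte is read as a separate Latin-1 character. The helper `latin1_bytes_as_utf8()` is hypothetical, written only for this illustration.

```php
<?php
// Hypothetical helper: re-encode each byte as if it were an ISO-8859-1
// character, which is what a browser effectively does when it reads
// UTF-8 output while assuming Latin-1.
function latin1_bytes_as_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b); // ASCII bytes are identical in both encodings
        } else {
            // Latin-1 byte 0x80-0xFF is code point U+0080-U+00FF,
            // which UTF-8 stores as two bytes.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$dimensions = "64 × 191 × 75 cm"; // this source file is saved as UTF-8
echo bin2hex("×"), "\n";           // the multiplication sign is two bytes in UTF-8
echo latin1_bytes_as_utf8($dimensions), "\n";
```

The two bytes `C3 97` of `×` turn into `Ã` plus a second character. Browsers apply Windows-1252 rather than strict ISO-8859-1 in this situation, so that second byte usually shows up as a dash.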

Answers


Alternatively, setting the charset to UTF-8 with header() should also work:

<?php
// Add this at the top of your PHP script, before any output is sent
header('Content-Type: text/html; charset=utf-8');

$url = "google.co.uk"; 
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Googlebot/2.1;  +http://www.google.com/bot.html)"); 
curl_setopt($ch, CURLOPT_ENCODING ,""); 

$html = curl_exec($ch); 
$dom = new DOMDocument(); 
@$dom->loadHTML($html); 
$xpath = new DOMXPath($dom); 
$q_Dimensions = "//tr/td[@class='FieldTitle'][contains(.,'Dimensions of packed product (W×H×D):')]/following-sibling::td/text()"; 
$dimentionsQ = $xpath->query($q_Dimensions); 
$dimentions = $dimentionsQ->item(0)->nodeValue; 
echo $dimentions; // 64 × 191 × 75 cm 
exit(); 

Works flawlessly... Thanks for your help and effort, @Ghost, much appreciated. Saved a lot of time. –


@MaharshiRaval Sure, man, no problem. – Ghost
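The header() fix can be checked offline with a minimal sketch, assuming the real page declares UTF-8 in a meta tag. The helper `extract_dimensions()` and the sample HTML are hypothetical; the XPath query is the one from the question. Note that `DOMDocument::loadHTML()` treats input without a charset declaration as ISO-8859-1, so the page's own declaration matters as much as the header() call.

```php
<?php
// Hypothetical helper: run the question's XPath query on an HTML string.
function extract_dimensions(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $query = "//tr/td[@class='FieldTitle'][contains(.,'Dimensions of packed product (W×H×D):')]"
           . "/following-sibling::td/text()";
    $nodes = $xpath->query($query);

    // Return null instead of dereferencing a missing node.
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

// Sample markup (hypothetical). The meta tag tells libxml the bytes are UTF-8.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
      . '</head><body><table><tr>'
      . '<td class="FieldTitle">Dimensions of packed product (W×H×D):</td>'
      . '<td>64 × 191 × 75 cm</td>'
      . '</tr></table></body></html>';

// Tell the browser the output is UTF-8 (headers are not printed on the CLI).
header('Content-Type: text/html; charset=utf-8');
echo extract_dimensions($html), "\n";
```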


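Set another cURL option, CURLOPT_ENCODING, to "". An empty string makes cURL advertise every compression format it supports and decode the response automatically, so compressed bytes don't come back as garbage: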

curl_setopt($ch, CURLOPT_ENCODING ,""); 

Hi @Anri, thanks for your reply, but as you can see in the code above, I already set that option on line 8 and it still gives the same problem. –
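The two suggestions in this thread target different failures, which would explain why CURLOPT_ENCODING alone did not help. A minimal diagnostic sketch, assuming you inspect the raw body returned by curl_exec(): gzip data starts with the magic bytes `1F 8B` (the case CURLOPT_ENCODING fixes), while a body that is valid UTF-8 but still displays garbled points to a missing charset declaration (the case header() fixes). `describe_body()` is a hypothetical helper, and it only recognises gzip, not raw deflate.

```php
<?php
// Hypothetical helper: classify a raw HTTP body.
function describe_body(string $body): string
{
    // gzip streams always begin with 0x1F 0x8B.
    if (strncmp($body, "\x1F\x8B", 2) === 0) {
        return 'gzip-compressed';
    }
    // With the /u modifier, preg_match() returns false on invalid UTF-8.
    if (preg_match('//u', $body) === 1) {
        return 'valid UTF-8';
    }
    return 'not UTF-8';
}

echo describe_body("64 × 191 × 75 cm"), "\n";
echo describe_body("\x1F\x8B\x08\x00"), "\n";
echo describe_body("64 \xD7 191"), "\n"; // × as a single ISO-8859-1 byte
```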