file_get_contents() converts UTF-8 to ISO-8859-1

I am trying to fetch search results from yahoo.com, but file_get_contents() converts the UTF-8 content (UTF-8 is the charset Yahoo uses) to ISO-8859-1.

I tried:

$filename = "http://search.yahoo.com/search;_ylt=A0oG7lpgGp9NTSYAiQBXNyoA?p=naj%C5%A1%C5%A5astnej%C5%A1%C3%AD&fr2=sb-top&fr=yfp-t-701&type_param=&rd=pref"; 

echo file_get_contents($filename);

Adding any of the following to the script:

header('Content-Type: text/html; charset=UTF-8');

or

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

or

$er = mb_convert_encoding($filename , 'UTF-8');

or

$s2 = iconv("ISO-8859-1","UTF-8",$filename);

or

echo utf8_encode(file_get_contents($filename));

does not help: after fetching the page content, special characters such as š, ť, ž are replaced with question marks (???).

I would appreciate any kind of help.

2011-04-08 vladinko0

**file_get_contents() does not convert anything** – 2011-04-09 11:35:32

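This looks like a content negotiation problem: file_get_contents probably sends a request that the server treats as accepting only ISO 8859-1 as the character encoding. Sending an explicit Accept-Charset header through a custom stream context fixes it: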

$opts = array('http' => array('header' => 'Accept-Charset: UTF-8, *;q=0')); 
$context = stream_context_create($opts); 

$filename = "http://search.yahoo.com/search;_ylt=A0oG7lpgGp9NTSYAiQBXNyoA?p=naj%C5%A1%C5%A5astnej%C5%A1%C3%AD&fr2=sb-top&fr=yfp-t-701&type_param=&rd=pref"; 
echo file_get_contents($filename, false, $context);

2011-04-09 11:37:38 Gumbo

Yes, this works! Thank you very much!!! :) – vladinko0 2011-04-09 11:41:56

Funny thing, I tried 'Accept-Charset=utf-8;q=0.7,*;q=0.7' and it did not work :) – 2011-04-09 11:57:28

@webarto: The value 'utf-8;q=0.7,*;q=0.7' is equivalent to 'utf-8,*': it accepts every character encoding with the same preference. – Gumbo 2011-04-09 12:09:02
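To make the difference between the two header values concrete, here is a minimal sketch that splits an Accept-Charset value into its quality weights. `parse_accept_charset` is a hypothetical helper written for this illustration, not a PHP built-in, and it only handles the simple `name;q=value` form used in this thread.

```php
<?php
// Hypothetical helper (not part of PHP): turn an Accept-Charset value
// into a map of lowercase charset name => quality value.
function parse_accept_charset(string $value): array
{
    $weights = [];
    foreach (explode(',', $value) as $item) {
        $parts = array_map('trim', explode(';', $item));
        $name = strtolower(array_shift($parts));
        if ($name === '') {
            continue; // skip empty list items
        }
        $q = 1.0; // a missing q parameter means full preference
        foreach ($parts as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $weights[$name] = $q;
    }
    return $weights;
}

// Same weight for UTF-8 and for everything else: no real preference.
print_r(parse_accept_charset('utf-8;q=0.7,*;q=0.7'));

// UTF-8 at full weight, every other charset explicitly refused (q=0).
print_r(parse_accept_charset('UTF-8, *;q=0'));
```

With the first value the server is free to pick any encoding, while the second one leaves UTF-8 as the only acceptable choice.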

$s2 = iconv("ISO-8859-1","UTF-8//TRANSLIT//IGNORE",$filename);

A better solution...

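function curl($url){ 
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_URL, $url); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_ENCODING, '');         // empty string: accept every compression cURL supports
    $result = curl_exec($ch); 
    curl_close($ch);                                // must run before the return, or it is never reached
    return $result; 
} 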

echo curl($filename);

2011-04-08 20:21:44

The result is: The document has moved here. – vladinko0 2011-04-09 10:48:41

@vladinko0, I think you need to set 'CURLOPT_FOLLOWLOCATION'. I have updated my answer, try again. – 2011-04-09 11:17:22

Now it loads the page, but with the same result as file_get_contents(), that is, with question marks. The charset is also converted to ISO-8859-1. – vladinko0 2011-04-09 11:32:07

file_get_contents should not change the charset. The data is fetched as a binary string.

When I checked the URL you provided, this is the header it sent:

Content-Type: text/html; charset=ISO-8859-1

And in the body:

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

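Also, you cannot convert UTF-8 to ISO-8859-1 and back to UTF-8 and expect to get the same characters back. UTF-8/Unicode supports many more characters, so characters are already lost in the first step.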
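The loss described above can be reproduced locally without any network access. This is a minimal sketch, assuming the mbstring extension is available; the sample word is the query term from the question's URL, and the substitute character is set explicitly to "?" so the result does not depend on local configuration.

```php
<?php
// Requires the mbstring extension.
// Replace unconvertible characters with "?" (0x3F), set explicitly for this demo.
mb_substitute_character(0x3F);

$original = "najšťastnejší"; // UTF-8 source text; š and ť do not exist in ISO-8859-1

// Step 1: UTF-8 -> ISO-8859-1. Characters outside Latin-1 become "?".
$latin1 = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');

// Step 2: ISO-8859-1 -> UTF-8. The "?" placeholders cannot be turned back.
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo $roundTrip, PHP_EOL;
var_dump($roundTrip === $original); // false: the round trip is lossy
```

The í survives because it exists in ISO-8859-1, while š and ť come back as question marks, which matches the symptom in the question. It also shows why utf8_encode() or iconv() applied afterwards cannot repair the text.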

In a browser this does not happen, so perhaps you just need to send a correct Accept-Charset header to tell Yahoo's system that you can accept UTF-8.
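One way to see which charset a server declared is to look at the raw response header lines, which PHP exposes in the `$http_response_header` array after a file_get_contents() call on an http:// URL. The sketch below is only an illustration: `charset_from_headers` is a hypothetical helper, and the header lines are hard-coded sample data (modelled on the header quoted above) so that it runs offline.

```php
<?php
// Hypothetical helper: find the charset declared in a list of raw header
// lines, such as the $http_response_header array PHP fills in after
// file_get_contents() on an http:// URL.
function charset_from_headers(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:.*?charset\s*=\s*"?([^";\s]+)/i', $line, $m)) {
            $charset = $m[1]; // keep the last match, i.e. the final response after redirects
        }
    }
    return $charset;
}

// Sample header lines, hard-coded so the sketch runs without network access.
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=ISO-8859-1',
];

echo charset_from_headers($sample), PHP_EOL;
```

In a real script you would pass `$http_response_header` instead of `$sample`, once with and once without the Accept-Charset stream context, and compare the two results.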

2011-04-08 20:46:40 Evert

How did you find 'Content-Type: text/html; charset=ISO-8859-1' and ''? When I view the page source I see <!doctype html> – vladinko0 2011-04-09 10:59:14

It depends on your location; you could try fetching the page through a Russian proxy server. – 2011-04-09 11:56:23

For anyone investigating this:

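The time I have spent on encoding problems has taught me two things. You can use stream_context_create to build a custom stream context for file_get_contents that explicitly states that you accept UTF-8. And very few PHP functions "magically" change the encoding of a string. One of the rare examples is: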

exec($command, $output, $returnVal)

Also note that the header setting that worked for me is:

header('Content-Type: text/html; charset=utf-8');

and not:

header('Content-Type: text/html; charset=UTF-8');

I had a problem similar to the one you describe, and setting the header correctly was enough to fix it.

Hope this helps!

2015-06-18 12:21:28 Stavros
