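Get the HTTP response code of a URL quickly
I know that getting an HTTP response code is easy, for example with the get_headers() function or with cURL, but I have 3 million URLs.
So please tell me how to get the status of each URL quickly.
At the moment each URL takes about 1 second, so 3 million URLs would take roughly 35 days to finish.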
error_reporting(E_ALL & ~E_NOTICE & ~E_WARNING);

if (($handle = fopen("2.6MMURL-10-14.csv", "r")) !== FALSE) {
    $i = 1;
    // Read the CSV one row at a time; the URL is in the first column.
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $result = getStatus($data[0]);
        echo $result . "<br />";
        if ($i == 16) {
            echo $i;
            //appendToCsv($result);
            //exit;
        }
        $i++;
    }
    fclose($handle);
}
exit;
function getStatus($fileSource) {
    $time = date("Y-m-d H:i:s");
    $ch = curl_init($fileSource);
    // Ask for the headers only (HEAD request) and return them as a string.
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    $response = curl_exec($ch);
    $retcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // curl_exec() returns false on failure, and not every response has a Location header.
    $redirect = 'No redirect found';
    if ($response !== false && preg_match('/^Location:(.*)$/mi', $response, $matches)) {
        $redirect = trim($matches[1]);
    }
    // Record the start and end time of the request.
    $time .= " - " . date("Y-m-d H:i:s");
    return "$fileSource, $retcode, $redirect, $time";
}
function appendToCsv($data) {
    // Send the rows to the browser as a CSV download.
    header('Content-Type: application/excel');
    header('Content-Disposition: attachment; filename="sample.csv"');
    $fp = fopen('php://output', 'w');
    foreach ($data as $line) {
        $val = explode(",", $line);
        fputcsv($fp, $val);
    }
    fclose($fp);
}
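What code have you tried so far? – 2014-10-28 21:51:21
Start running the requests in parallel. Unless you are doing this over a 300-baud dial-up line, there is almost nothing you can do to speed up the network component. So either run more requests in parallel (easy) or find some way to reduce the time per request (hard or impossible). – 2014-10-28 21:52:57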
Use wget, like this - http://unix.stackexchange.com/questions/61132/how-do-i-use-wget-with-a-list-of-urls-and-their-corresponding-output-files – Cheery 2014-10-28 21:55:17
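
To illustrate the suggestion above about running requests in parallel, here is a minimal sketch that uses PHP's curl_multi_* functions to keep a fixed number of HEAD requests in flight at once. It assumes the curl extension is loaded. The names makeHandle() and fetchStatuses(), the default concurrency of 50 and the 10-second timeout are hypothetical choices for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: build one HEAD-request handle for a URL.
function makeHandle($url, $timeout)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_NOBODY         => true,   // headers only
        CURLOPT_RETURNTRANSFER => true,   // do not echo the response
        CURLOPT_FOLLOWLOCATION => false,  // report redirects, do not follow them
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_PRIVATE        => $url,   // remember which URL this handle belongs to
    ));
    return $ch;
}

// Hypothetical helper: check a batch of URLs, at most $concurrency at a time.
function fetchStatuses(array $urls, $concurrency = 50, $timeout = 10)
{
    $urls     = array_values($urls);
    $total    = count($urls);
    $next     = 0;   // index of the next URL to start
    $inFlight = 0;   // handles currently attached to the multi handle
    $results  = array();
    $mh       = curl_multi_init();

    do {
        // Top up the window so that $concurrency requests stay in flight.
        while ($inFlight < $concurrency && $next < $total) {
            curl_multi_add_handle($mh, makeHandle($urls[$next++], $timeout));
            $inFlight++;
        }

        curl_multi_exec($mh, $running);

        // Collect every transfer that has finished since the last pass.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $results[] = array(
                'url'        => curl_getinfo($ch, CURLINFO_PRIVATE),
                'code'       => curl_getinfo($ch, CURLINFO_HTTP_CODE),    // 0 if no response
                'redirect'   => curl_getinfo($ch, CURLINFO_REDIRECT_URL), // empty/false if none
                'curl_errno' => $info['result'],                          // 0 means CURLE_OK
            );
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $inFlight--;
        }

        // Wait for socket activity instead of spinning.
        if ($inFlight > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);
        }
    } while ($inFlight > 0 || $next < $total);

    curl_multi_close($mh);
    return $results;
}
```

One way to use it would be to read a few thousand URLs at a time from the fgetcsv() loop, pass each batch to fetchStatuses(), and write the rows out before reading the next batch, so the full list never has to sit in memory. If each request really takes about 1 second and the network keeps up, 50 requests in flight would bring 3 million URLs down to roughly 17 hours, though the achievable concurrency depends on bandwidth, DNS and the remote servers.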