2010-08-12

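fff.html is an HTML page with some email addresses. Some of them are in href mailto links and some are not. I want to scrape all of them and output them in the following format: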

[email protected],[email protected],[email protected] 

I have a simple scraper to get the ones in href links, but something is weird:

<?php 
    $url = "fff.html"; 
    $raw = file_get_contents($url); 

    $newlines = array("\t","\n","\r","\x20\x20","\0","\x0B"); 
    $content = str_replace($newlines, "", html_entity_decode($raw)); 

    $start = strpos($content,'<a href="mailto:'); 
    $end = strpos($content,'"',$start) + 8; 
    $mail = substr($content,$start,$end-$start); 

    print "$mail<br />"; 
    ?> 

Do I get bonus points for originally using Lorem Ipsum?
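As for the "something is weird" part: `strpos($content, '"', $start)` finds the quote that opens the `href` value, 8 characters after `$start`, so `$end` lands 16 characters in and `substr` returns only the literal `<a href="mailto:`. Nothing useful is printed, because the browser reads that as the start of an `<a>` tag. Below is a minimal sketch of a fix that keeps the same single-match approach: skip past the `mailto:` prefix, then stop at the next quote. It assumes links are written exactly as `<a href="mailto:...">`. The function name `first_mailto()` and the sample HTML are illustrative only. Like the original, this still finds only the first link.

```php
<?php
// Hypothetical helper: return the address in the first <a href="mailto:..."> link, or null.
function first_mailto(string $content): ?string
{
    $needle = '<a href="mailto:';
    $start = strpos($content, $needle);
    if ($start === false) {
        return null; // no mailto link at all
    }
    $start += strlen($needle);            // skip past '<a href="mailto:'
    $end = strpos($content, '"', $start); // closing quote of the href value
    if ($end === false) {
        return null; // unterminated attribute
    }
    return substr($content, $start, $end - $start);
}

// Illustrative input: two mailto links
$html = '<p><a href="mailto:alice@example.com">Alice</a> '
      . '<a href="mailto:bob@example.com">Bob</a></p>';
echo first_mailto($html); // prints alice@example.com
```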

Answer


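The problem is when you have more than one email address in the HTML page: `strpos` only finds the first occurrence, so `substr` only ever returns the first instance. Here is a script that parses all the email addresses in `mailto:` links. You may need to tweak it for your use. It outputs the results in the comma-separated format you asked for.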

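<?php
$url = "fff.html";
$raw = file_get_contents($url);

// Remove tabs, newlines, double spaces etc. and decode HTML entities
$newlines = array("\t", "\n", "\r", "\x20\x20", "\0", "\x0B");
$content = str_replace($newlines, "", html_entity_decode($raw));

// Limit the search to the page body; fall back to the whole page if
// '<body>' or '</body>' is not found (e.g. <body class="...">)
$start = strpos($content, '<body>');
$end = strpos($content, '</body>');
$data = ($start !== false && $end !== false)
    ? substr($content, $start, $end - $start)
    : $content;

// Capture the address in every href="mailto:..." attribute
$pattern = '#a[^>]+href="mailto:([^"]+)"[^>]*?>#is';
preg_match_all($pattern, $data, $matches);

// $matches[1] holds the captured addresses (an empty array if none matched)
$emails = $matches[1];
echo implode(',', $emails);
?>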
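The script above only captures addresses inside `mailto:` links, while the question also asks for addresses that appear as plain text. A rough sketch that covers both: take the `mailto:` targets from the raw HTML, then replace tags with spaces and scan the remaining text with a deliberately simple address pattern. The pattern is an assumption and is not a full RFC 5322 matcher. The results are merged with duplicates dropped. `collect_emails()` and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper: addresses from mailto links plus addresses written as plain text.
function collect_emails(string $html): array
{
    // 1) Address part of every href="mailto:..." (stops at a quote or a ?subject=... query)
    preg_match_all('#href="mailto:([^"?]+)#i', $html, $links);

    // 2) Replace tags with spaces (so adjacent elements don't run together),
    //    then decode entities and look for address-shaped text.
    //    Simplified pattern (assumption): not a full RFC 5322 matcher.
    $text = html_entity_decode(preg_replace('/<[^>]+>/', ' ', $html));
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $plain);

    // 3) Merge, keep the first occurrence of each address, and reindex
    return array_values(array_unique(array_merge($links[1], $plain[0])));
}

// Illustrative input: a mailto link, a plain-text address, and a link whose text is its address
$html = '<p>Write to <a href="mailto:alice@example.com">Alice</a> or carol@example.org.</p> '
      . '<p><a href="mailto:bob@example.com">bob@example.com</a></p>';
echo implode(',', collect_emails($html));
// prints alice@example.com,bob@example.com,carol@example.org
```

`array_unique()` compares strings exactly, so `Bob@Example.com` and `bob@example.com` would both be kept. Lower-casing the addresses before merging is an option if that matters.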