2012-05-25 32 views
2

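I copied the source of a web page into a text file, but I can't extract two data points from the file: latitude and longitude. How can I get these two values out of the text file?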

The PHP file I wrote to fetch and scan the document looks like this:

<?php 

$ch = curl_init("http://www.marinetraffic.com/ais/shipdetails.aspx?MMSI=258245000"); 
$fp = fopen("example_homepage.txt", "w"); 

curl_setopt($ch, CURLOPT_FILE, $fp); 
curl_setopt($ch, CURLOPT_HEADER, 0); 

curl_exec($ch); 
curl_close($ch); 
fclose($fp); 

header('Content-Type: text/plain'); 

$myFile = "example_homepage.txt"; 
$fh = fopen($myFile, 'r'); 
$theData = fread($fh, 9251); 
fclose($fh); 
echo $theData; 

?> 

The GPS coordinates are buried in the text, which looks like this (from the file example_homepage.txt):

<img style="border: 1px solid #aaa" src="flags/NO.gif" /> 
<br/> 
<b>Call Sign:</b>LAJW 
<br/> 
<b>IMO:</b>9386380, 
<b>MMSI:</b>258245000 
<br/> 
<hr/> 
<h2>Last Position Received</h2> 
<b>Area:</b>North Sea 
<br/> 
<b>Latitude/Longitude:</b> 
<a href='default.aspx?mmsi=258245000&centerx=5.311533&centery=60.39997&zoom=10&type_color=9'>60.39997˚/5.311533˚ (Map)</a> 
<br/> 
<b>Currently in Port:</b> 
<a href='default.aspx?centerx=5.32245&centery=60.39085&zoom=14'>BERGEN</a> 
<br/> 
<b>Last Known Port:</b> 
</b> 
<a href='default.aspx?centerx=5.32245&centery=60.39085&zoom=14'>BERGEN</a> 
<br/> 
<b>Info Received:</b>0d 0h 20min ago 
<br/> 
<table> 
    <tr> 
     <td>&nbsp; 
      <img src="shipicons/magenta0.png" /> 
     </td> 
     <td> 
      <a href='default.aspx?mmsi=258245000&centerx=5.311533&centery=60.39997&zoom=10&type_color=9'><b>Current Vessel's Track</b></a> 
     </td> 
    </tr> 
    <tr> 
     <td> 
      <img src="windicons/w05_330.png" /> 
     </td> 
     <td> 
      <b>Wind:</b>5 knots, 327&deg;, 13&deg;C</td> 
    </tr> 
</table> 
<a href='datasheet.aspx?datasource=ITINERARIES&MMSI=258245000'><b>Itineraries History</b></a> 
<br/> 
<hr/> 
<h2>Voyage Related Info (Last Received)</h2> 
<b>Draught:</b>6.8 m 
<br/> 
<b>Destination:</b>BERGEN HAVN 
<br/> 
<b>ETA:</b>2012-05-22 18:00 
<br/> 
<b>Info Received:</b>2012-05-23 18:43 (

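The two numbers I want are:

Latitude: 60.39085 Longitude: 5.32245

I'm not very experienced with this sort of thing. Maybe there is a better way; please let me know.

Edit: FYI, with the last three lines of code I am able to get the first 9251 characters of the text file.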

+0

Possible duplicate of [How to parse and process HTML with PHP?](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php) – derekerdmann

Answers

0

Here is what I did to get the result I wanted (it prints *-70.19347 42.02112*):

<?php 
//goes through and copies the web page to a text file 
$ch = curl_init("http://photos.marinetraffic.com/ais/lightdetails.aspx?light_id=1000019773"); 
$fp = fopen("example_homepage.txt", "w"); 
curl_setopt($ch, CURLOPT_FILE, $fp); 
curl_setopt($ch, CURLOPT_HEADER, 0); 
curl_exec($ch); 
curl_close($ch); 
fclose($fp); 

//tells the browser to show the output as plain text instead of rendering it as html 
header('Content-Type: text/plain'); 

//opens text file and reads contents to a string 
$myFile = "example_homepage.txt"; 
$fh = fopen($myFile, 'r'); 
$theData = fread($fh,12000); 
fclose($fh); 

//finds the position of the last "&centerx=" in the page, where the GPS data begins 
$pos = strrpos($theData, "&centerx="); 
if ($pos === false) { 
    // note: three equal signs, so that a match at position 0 is not treated as "not found" 
    echo "not found"; 
    exit; 
} 

//cuts out the 36 characters that hold both values 
$rest = substr($theData, $pos, 36); 
//centerx is the longitude; floatval() ignores anything after the number 
$lon = floatval(substr($rest, 9)); 
//centery is the latitude 
$latpos = strrpos($rest, "&centery=")+9; 
$lat = floatval(substr($rest, $latpos)); 

//echo $rest; 
//prints the longitude first, then the latitude 
echo $lon; 
echo " "; 
echo $lat; 

?> 

Hope this helps someone.
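
The fixed offsets above only work while both numbers keep the same length. As an alternative, here is a minimal sketch that pulls the values out with a regular expression. The helper name `extract_positions` is hypothetical, and the mapping of centerx to longitude and centery to latitude is an assumption based on the "60.39997˚/5.311533˚" link text in the question.

```php
<?php
// Hypothetical helper: returns every centerx/centery pair found in $html
// as a list of ['lat' => float, 'lon' => float] arrays.
// Assumption: centerx is the longitude and centery is the latitude.
function extract_positions(string $html): array
{
    // Matches "?centerx=<number>&centery=<number>" or "&centerx=...",
    // and also tolerates "&amp;" between the two parameters.
    $pattern = '/[?&]centerx=(-?\d+(?:\.\d+)?)&(?:amp;)?centery=(-?\d+(?:\.\d+)?)/';
    $positions = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $positions[] = ['lat' => (float) $m[2], 'lon' => (float) $m[1]];
        }
    }
    return $positions;
}

// Two links copied from the HTML in the question.
$html = "<a href='default.aspx?mmsi=258245000&centerx=5.311533&centery=60.39997&zoom=10&type_color=9'>(Map)</a>"
      . "<a href='default.aspx?centerx=5.32245&centery=60.39085&zoom=14'>BERGEN</a>";

// Print one "latitude longitude" line per link that carries coordinates.
foreach (extract_positions($html) as $p) {
    echo $p['lat'], ' ', $p['lon'], "\n";
}
```

On the question's sample, the first pair is the vessel position and the second is the port, so the caller still has to decide which link it wants.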

0

This may be overkill, but you could try PHP DOM + parse_url + parse_str:

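$text = file_get_contents('http://example.com/path/to/file.html'); 
$doc = new DOMDocument('1.0'); 
// the @ suppresses warnings about malformed HTML 
@$doc->loadHTML($text); 
$desired_values = array(); 
// the coordinates live in the href of <a> elements 
foreach($doc->getElementsByTagName('a') AS $a) { 
    if(!$a->hasAttribute('href')) { 
     continue; 
    } 
    // PHP_URL_QUERY makes parse_url() return only the part after the "?" 
    $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY); 
    if(!$query) { 
     continue; 
    } 
    // parse_str() fills $query_values; it does not return anything 
    parse_str($query, $query_values); 
    if(isset($query_values['centerx'], $query_values['centery'])) { 
     $desired_values = array(
      $query_values['centerx'], 
      $query_values['centery'] 
     ); 
     break; 
    } 
} 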
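
To see what parse_url and parse_str do on their own, here is a minimal sketch that runs them on a single href copied from the question's HTML. Treating centerx as the longitude and centery as the latitude is an assumption, not something the page documents.

```php
<?php
// One href copied from the HTML in the question.
$href = 'default.aspx?centerx=5.32245&centery=60.39085&zoom=14';

// PHP_URL_QUERY makes parse_url() return only the query string.
$query = parse_url($href, PHP_URL_QUERY);

// parse_str() writes the decoded key/value pairs into its second argument.
parse_str($query, $params);

$lon = (float) $params['centerx']; // assumption: centerx is the longitude
$lat = (float) $params['centery']; // assumption: centery is the latitude

echo $lat, ' ', $lon, "\n";
```

Because parse_str decodes the whole query string, the order of the parameters and the length of the numbers no longer matter.
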
+0

Hmm, I'm having a hard time getting this to work. I have it hosted here: http://thoughtfi.com/search_textdoc.php. Maybe I'm not running it correctly? – Stagleton

+0

Is the HTML you are scraping well-formed? The DOM parser can have trouble with malformed markup.
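
One way to check this, sketched under the assumption that the DOM extension is installed: let libxml collect its parse warnings instead of printing them, then see whether the links are still reachable. The fragment is copied from the question and contains a stray `</b>`.

```php
<?php
// Fragment copied from the question; note the stray </b> before the link.
$html = "<b>Last Known Port:</b> </b> "
      . "<a href='default.aspx?centerx=5.32245&centery=60.39085&zoom=14'>BERGEN</a>";

// Collect libxml's parse warnings instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);
$warnings = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Gather the href of every link the parser managed to recover.
$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

echo $loaded ? "parsed" : "failed", ", ", count($warnings), " warning(s)\n";
foreach ($hrefs as $href) {
    echo $href, "\n";
}
```

If the link shows up in `$hrefs` despite the warnings, malformed markup is probably not what keeps the script from working.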

+0

Or your file_get_contents may be restricted from fetching data over HTTP (I notice the page has been loading ever since I started writing this comment).
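
A minimal sketch of how that could be checked and worked around: file_get_contents can only open URLs when the allow_url_fopen setting is enabled, so the hypothetical helper `fetch_page` below tries it first and then falls back to cURL with a timeout. The 15-second timeout is an arbitrary choice for illustration.

```php
<?php
// Hypothetical helper: fetches a URL into a string, or returns null on failure.
function fetch_page(string $url): ?string
{
    // file_get_contents() can only open URLs when allow_url_fopen is enabled.
    if (ini_get('allow_url_fopen')) {
        $body = @file_get_contents($url);
        if ($body !== false) {
            return $body;
        }
    }
    // Fall back to cURL if the extension is available.
    if (!function_exists('curl_init')) {
        return null;
    }
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up after 15 seconds
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

Calling `fetch_page()` with the page URL and checking the result for `null` before handing it to `loadHTML` separates download problems from parsing problems.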