2013-02-10 42 views

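How to target a defined range of an array

The script below scrapes the full list of links from a given URL for my DOM scraper to use. Some lists can run into the thousands, though, so I would like to be able to set manually which links actually get scraped, for example by entering that it should start at link 50 and end at link 100 in the list. How would I do that?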

<form action="" method="POST"> 
    <label>Url to scrape: </label> 
    <input type="text" name="url_scrape" id="url-scrape" /> 
    <input type="submit" value=" Scrape now " /> 
    <br /> 

    <input type="hidden" name="scrape" value="yes" /> 
    <br /> 

</form> 
<br /> 
<br /> 
<?php 

if(!isset($_POST['scrape']) || $_POST['scrape'] != 'yes') // isset() avoids an undefined-index notice on the first page load
    return; 



include('simple_html_dom.php'); 

function strim($input){ 

    $st = explode('$', $input); 

    return (float)str_replace(array(' ',','),array('',''), $st[1]); 

} 

$url_scrape = $_POST['url_scrape']; 

if($url_scrape == '') 
    return; 



$BrowsebyLetter = file_get_html($url_scrape); 

$links = $BrowsebyLetter->find('.Results a'); 


?> 
<h1 id="patient">Please be patient while scraping data</h1> 

<div id="scrape-progress"> 
    <div id="scrape-progress-ctx">0%</div> 
</div> 
<br /> 
<br /> 
<br /> 
<div id="progress-txt"></div> 
<br /> 
<br /> 
<button id="retry" onclick="iframe.src = currentLink">Retry if it does not continue</button> 
<iframe src="" id="cacheLoad"></iframe> 
<script type="text/javascript"> 
    var total = <?php echo count($links); ?>; 
    var ctx = document.getElementById('scrape-progress-ctx'); 
    var iframe = document.getElementById('cacheLoad'); 
    var prx = document.getElementById('progress-txt'); 
    var pt = document.getElementById('patient'); 
    var retry = document.getElementById('retry'); 
    var currentLink = ''; 
    var links = [ 
<?php 
    foreach($links as $link){ 

     echo "'".$link->href."',"; 

    } 
?>'Complete scrape <?php echo count($links); ?> links' ]; 
    function progress(cur){ 
     ctx.style.width = Math.ceil((cur/total)*100)+'%'; 
     ctx.innerHTML = Math.ceil((cur/total)*100)+'%'; 
    }; 
    function exe(i){ 
     progress(i); 
     if(links[i] != 'Complete scrape <?php echo count($links); ?> links') // must match the sentinel string appended to the links array
     { 
      currentLink = window.location+'&target='+links[i].split('Job=')[1]+'&cou='+(i+1); 
      iframe.src = currentLink; 
     }; 
     if(i==total){ 
      pt.innerHTML = 'Successful'; 
      pt.style.color = 'green'; 
      retry.style.display = 'none'; 
      alert('Scrape process is complete'); 
     }; 
     prx.innerHTML = '<strong>Status: </strong>'+ links[i]; 
    }; 
    exe(0); 
</script> 

Answer


Try adding this wherever your loop is:

var links = [ 
<?php 
    $a = 0; 
    foreach($links as $link){ 
     $a++; 
     if(($a >= 50) && ($a <= 100)){ // inclusive, so links 50 through 100 are both printed
      echo "'".$link->href."',"; 
     } 
    } 
?> 

It checks whether the link falls between 50 and 100 and, if it does, prints it. Hope that helps :)
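
As an alternative to filtering inside the PHP loop, the range could also be picked on the JavaScript side once the array exists. The following is only a minimal sketch: `selectRange` is a hypothetical helper that is not part of the original script, the hrefs are made-up placeholder data, and it assumes positions are counted from 1 with both ends included.

```javascript
// Hypothetical helper (not part of the original script): returns the links
// from position `first` to position `last`, both 1-based and inclusive.
function selectRange(allLinks, first, last) {
    // Clamp the bounds so out-of-range input cannot produce a bad index.
    var start = Math.max(1, first);
    var end = Math.min(allLinks.length, last);
    if (start > end) {
        return [];
    }
    // slice() takes a 0-based start index and an exclusive end index.
    return allLinks.slice(start - 1, end);
}

// Placeholder data for illustration only: 120 made-up hrefs.
var allLinks = [];
for (var n = 1; n <= 120; n++) {
    allLinks.push('page.php?Job=' + n);
}

var selected = selectRange(allLinks, 50, 100);
var total = selected.length; // 51 links: positions 50 through 100
```

Whichever side does the filtering, the `total` used by the progress bar probably has to be derived from the filtered list rather than from `count($links)`. Otherwise the percentage and the completion check would still be based on the full list.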