2016-02-25 78 views

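How can I get only some of the href attributes?

I have this PHP code that I want to use to extract some information, but I'm stuck at the href step: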

$site = "http://www.sports-reference.com/olympics/countries"; 
$site_html = file_get_html($site); 

$country_dirty = $site_html->getElementById('div_countries'); 

foreach ($country_dirty->find('img') as $link) {

    $country = $link->alt;
    $link_country = "$site/$country";
    $link_country_html = file_get_html($link_country);

    $link_season = $link_country_html->getElementById('div_medals');

    foreach ($link_season->find('a') as $season) {

        echo $link_year_season = $season->href . "\n";

        //echo $link_season = strstr ($link_year_season,'summer') . "\n";

    }
}

The variable $link_year_season gives me the following output:

/olympics/countries/AFG/summer/2012/ 
/olympics/athletes/ba/nesar-ahmad-bahawi-1.html 
/olympics/athletes/ni/rohullah-nikpai-1.html 
/olympics/countries/AFG/summer/2008/ 
/olympics/athletes/ba/nesar-ahmad-bahawi-1.html 
/olympics/athletes/ni/rohullah-nikpai-1.html 
/olympics/countries/AFG/summer/2004/ 
/olympics/countries/AFG/summer/1996/ 
/olympics/countries/AFG/summer/1988/ 
/olympics/countries/AFG/summer/1980/ 
/olympics/countries/AFG/summer/1972/ 
..... 

I would like to know whether it is possible to get only this output:

/olympics/countries/AFG/summer/2012/ 
/olympics/countries/AFG/summer/2008/ 
/olympics/countries/AFG/summer/2004/ 
/olympics/countries/AFG/summer/1996/ 
/olympics/countries/AFG/summer/1988/ 
/olympics/countries/AFG/summer/1980/ 
/olympics/countries/AFG/summer/1972/ 

A quick way to do this is to apply `preg_match`, `strpos`, or something similar to the output you already have. – Maximus2012
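As a rough illustration of the `strpos` route mentioned in this comment, the sketch below keeps only hrefs that begin with the summer prefix seen in the question's output. The function name `isSummerLink` is hypothetical, and the prefix is hard-coded for AFG as an assumption.

```php
<?php
// Hypothetical helper: true when $href starts with the AFG summer prefix.
function isSummerLink(string $href): bool
{
    // strpos() returns 0 when the prefix sits at the very start of the string
    // and false when it is absent, so the comparison must be strict (===).
    return strpos($href, '/olympics/countries/AFG/summer/') === 0;
}

// Sample hrefs copied from the question's output.
$hrefs = [
    '/olympics/countries/AFG/summer/2012/',
    '/olympics/athletes/ba/nesar-ahmad-bahawi-1.html',
    '/olympics/countries/AFG/summer/2008/',
];

foreach ($hrefs as $href) {
    if (isSummerLink($href)) {
        echo $href . "\n";
    }
}
```

Unlike a regex, this prefix check does not verify that a year follows the prefix.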


Did the answer below solve your problem? http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work – chris85

Answer


You should be able to use this regex to check that the link starts with /olympics/countries/AFG/summer/, followed by digits and a /:

foreach ($link_season->find('a') as $season) {
    if (preg_match('~^/olympics/countries/AFG/summer/\d+/~', $season->href)) {
        echo $link_year_season = $season->href . "\n";
        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
    }
}

Demo: https://regex101.com/r/bZ1vP3/1

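You can also pull out the year by capturing the digits after summer (assuming it is a year; the first regex only checks for a run of digits, while this one is stricter and requires exactly four):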

foreach ($link_season->find('a') as $season) {
    if (preg_match('~^/olympics/countries/AFG/summer/(\d{4})/~', $season->href, $year)) {
        echo $link_year_season = $season->href . "\n";
        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
        echo 'The year is ' . $year[1] . "\n";
    }
}

If the season can also vary, you can use (?:summer|winter), which allows either summer or winter as the fourth directory.
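A minimal sketch of that alternation, run against plain strings rather than the parsed page so it can be tried on its own. The function name `filterSeasonLinks` and the winter href are hypothetical and used only for illustration; the other hrefs come from the question's output.

```php
<?php
// Hypothetical helper: returns matching hrefs mapped to their four-digit year.
function filterSeasonLinks(array $hrefs): array
{
    // (?:summer|winter) is a non-capturing group, so the year stays in $m[1].
    $pattern = '~^/olympics/countries/AFG/(?:summer|winter)/(\d{4})/~';
    $result = [];
    foreach ($hrefs as $href) {
        if (preg_match($pattern, $href, $m) === 1) {
            $result[$href] = $m[1];
        }
    }
    return $result;
}

$hrefs = [
    '/olympics/countries/AFG/summer/2012/',
    '/olympics/athletes/ni/rohullah-nikpai-1.html',
    '/olympics/countries/AFG/winter/2010/', // hypothetical link, not from the question
];

foreach (filterSeasonLinks($hrefs) as $href => $year) {
    echo $href . ' (year ' . $year . ")\n";
}
```

Because the result is keyed by href, a link that appears more than once is kept only once.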


If you want to allow any country and any season, you can use `^\/olympics\/countries\/[A-Z]+\/(?:summer|winter)\/\d{4}\/`, assuming summer and winter are the only seasons in which the Olympics take place ;) – shamsup
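Since the question's outer loop visits every country, a pattern hard-coded to AFG would match nothing on the other pages. The sketch below wraps the pattern from this comment in a hypothetical helper, `isSeasonLink`; it uses `~` as the delimiter so the slashes need no escaping. The USA href is an assumed example, not taken from the question.

```php
<?php
// Hypothetical helper: true for /olympics/countries/<CODE>/<season>/<year>/ links.
function isSeasonLink(string $href): bool
{
    // [A-Z]+ accepts any upper-case country code; the season is summer or winter.
    return preg_match('~^/olympics/countries/[A-Z]+/(?:summer|winter)/\d{4}/~', $href) === 1;
}

$hrefs = [
    '/olympics/countries/AFG/summer/2012/',
    '/olympics/countries/USA/winter/2010/', // assumed example for another country
    '/olympics/athletes/ba/nesar-ahmad-bahawi-1.html',
];

foreach ($hrefs as $href) {
    if (isSeasonLink($href)) {
        echo $href . "\n";
    }
}
```

In the question's code, the same check could replace the condition inside the inner foreach, applied to $season->href.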