2016-12-02 114 views

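I need some values from a web page, so I built a scraper with Html Agility Pack. How can I scrape values from a web page using Html Agility Pack?

I'll show you the HTML of the page and my C# code.

HTML of the page: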

<div class="box-overflow"> 
    <div class="box-overflow__in"> 
     <table class="table-main js-tablebanner-t js-tablebanner-ntb"> 
     <tr> 
      <th class="h-text-left" colspan="2">17. Round</th> 

      <th class="h-text-center">1</th> 

      <th class="h-text-center">X</th> 

      <th class="h-text-center">2</th> 

      <th>&nbsp;</th> 
     </tr> 

     <tr> 
      <td class="h-text-left"><a href= 
      "/soccer/poland/ekstraklasa/lechia-gdansk-leczna/Kjnscb6D/" class= 
      "in-match"><span>Lechia Gdansk</span> - <span>Leczna</span></a></td> 

      <td class="h-text-center"><a href= 
      "/soccer/poland/ekstraklasa/lechia-gdansk-leczna/Kjnscb6D/">3:0</a></td> 

      <td class="table-matches__odds colored"></td> 

      <td class="table-matches__odds" data-odd="4.04"></td> 

      <td class="table-matches__odds" data-odd="6.29"></td> 

      <td class="h-text-right h-text-no-wrap">28.11.2016</td> 
     </tr> 

     <tr> 
      <td class="h-text-left"><a href= 
      "/soccer/poland/ekstraklasa/plock-piast-gliwice/KrhILsqE/" class= 
      "in-match"><span>Plock</span> - <span>Piast Gliwice</span></a></td> 

      <td class="h-text-center"><a href= 
      "/soccer/poland/ekstraklasa/plock-piast-gliwice/KrhILsqE/">0:0</a></td> 

      <td class="table-matches__odds" data-odd="2.05"></td> 

      <td class="table-matches__odds colored"></td> 

      <td class="table-matches__odds" data-odd="3.50"></td> 

      <td class="h-text-right h-text-no-wrap">27.11.2016</td> 
     </tr> 

     <tr> 
      <td class="h-text-left"><a href= 
      "/soccer/poland/ekstraklasa/slask-wroclaw-legia/bZjMK1bK/" class= 
      "in-match"><span>Slask Wroclaw</span> - <span>Legia</span></a></td> 

      <td class="h-text-center"><a href= 
      "/soccer/poland/ekstraklasa/slask-wroclaw-legia/bZjMK1bK/">0:4</a></td> 

      <td class="table-matches__odds" data-odd="4.53"></td> 

      <td class="table-matches__odds" data-odd="3.64"></td> 

      <td class="table-matches__odds colored"></td> 

      <td class="h-text-right h-text-no-wrap">27.11.2016</td> 
     </tr> 
     </table> 
    </div> 
    </div> 

My C#:

var url = "http://www.betexplorer.com/soccer/poland/ekstraklasa/results/"; 

     var web = new HtmlWeb(); 
     var doc = web.Load(url); 

     Bets = new List<Bet>(); 



     // Read the rows 
     var Rows = doc.DocumentNode.SelectNodes("//table"); 

     foreach (var row in Rows) 
     { 
      if (!row.GetAttributeValue("class", "").Contains("table-main js-tablebanner-t js-tablebanner-ntb")) 
      { 
       if (string.IsNullOrEmpty(row.InnerText)) 
        continue; 

       var rowBet = new Bet(); 
       foreach (var node in row.ChildNodes) 
       { 
        var data_odd = node.GetAttributeValue("data-odd", ""); 

        if (string.IsNullOrEmpty(data_odd)) 
        { 
         if (node.GetAttributeValue("class", "").Contains("in-match")) 
         { 
          rowBet.Match = node.InnerText.Trim(); 
          var matchTeam = rowBet.Match.Split(new[] { " - " }, StringSplitOptions.RemoveEmptyEntries); 
          rowBet.Home = matchTeam[0]; 
          rowBet.Host = matchTeam[1]; 
         } 


         if (node.GetAttributeValue("class", "").Contains("h-text-center")) 
         { 
          rowBet.Result = node.InnerText.Trim(); 
          var matchPoints = rowBet.Result.Split(new[] { ':' }, StringSplitOptions.RemoveEmptyEntries); 
          int help; 
          if (int.TryParse(matchPoints[0], out help)) 
          { 
           rowBet.HomePoints = help; 
          } 
          if (matchPoints.Length == 2 && int.TryParse(matchPoints[1], out help)) 
          { 
           rowBet.HostPoints = help; 
          } 

         } 


         if (node.GetAttributeValue("class", "").Contains("h-text-right h-text-no-wrap")) 
          rowBet.Date = node.InnerText.Trim(); 

        } 
        else 
        { 
         rowBet.Odds.Add(data_odd); 
        } 
       } 

       if (!string.IsNullOrEmpty(rowBet.Match)) 
        Bets.Add(rowBet); 
      } 
     } 

Here is some more information:

I need to extract the team names (e.g. Lechia Gdansk - Leczna),
the result (e.g. 3:0),
the data-odd values (e.g. 1.49, 4.04, 6.29),
and the match date (e.g. 28.11.2016).

If anyone needs more information, ask me what you want to know. Thanks.


`if (!row.GetAttributeValue("class", "").Contains("table-main js-tablebanner-t js-tablebanner-ntb"))` - these classes are declared on the table itself, not on the rows. – stuartd

Answers
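
The point in this comment can be sketched roughly as follows: filter on the table's class in the XPath expression, then descend to its `tr` elements, instead of testing the class on each node returned by `//table`. The class and method names `RowSelection` and `GetResultRows` are hypothetical and only serve the illustration; the sketch assumes the table keeps the `table-main` class shown in the question's HTML.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class RowSelection
{
    // Hypothetical helper: returns the <tr> elements of the results table.
    public static List<HtmlNode> GetResultRows(HtmlDocument doc)
    {
        // The class lives on the <table>, so match it there and then
        // descend to the rows.
        var rows = doc.DocumentNode.SelectNodes(
            "//table[contains(@class, 'table-main')]//tr");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        return rows == null ? new List<HtmlNode>() : rows.ToList();
    }
}
```

Each returned node is then a real row, so iterating its `td` children is where attributes such as `data-odd` can be read.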


I would do it like this:

var list = doc.DocumentNode.SelectSingleNode("//table[@class='table-main js-tablebanner-t js-tablebanner-ntb']") 
       .Descendants("tr") 
       .Select(x => new 
       { 
        Val1 = x.SelectSingleNode("td[@class='h-text-left']")?.InnerText, 
        Val2 = x.SelectSingleNode("td[@class='h-text-center']")?.InnerText 
       }) 
       .Where(x => x.Val1!=null) 
       .ToList();
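
The query above returns only the team and result cells. A minimal sketch of how the same approach might be extended to the `data-odd` values and the match date the question asks for is shown here. `MatchRow` and `ResultsParser` are hypothetical names (the question's `Bet` class is not shown), and the selectors assume the markup in the question's HTML sample.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical result type; the question's Bet class is not shown.
public class MatchRow
{
    public string Teams;
    public string Result;
    public List<string> Odds = new List<string>();
    public string Date;
}

public static class ResultsParser
{
    public static List<MatchRow> Parse(HtmlDocument doc)
    {
        var table = doc.DocumentNode.SelectSingleNode(
            "//table[contains(@class, 'table-main')]");
        if (table == null) return new List<MatchRow>();

        return table.Descendants("tr")
            // Header rows contain only <th> cells, so they have no team cell.
            .Where(tr => tr.SelectSingleNode("td[@class='h-text-left']") != null)
            .Select(tr => new MatchRow
            {
                Teams = HtmlEntity.DeEntitize(
                    tr.SelectSingleNode("td[@class='h-text-left']").InnerText).Trim(),
                Result = tr.SelectSingleNode("td[@class='h-text-center']")?.InnerText.Trim(),
                // Cells without a data-odd attribute (the "colored" cell in
                // the sample markup) yield "" and are skipped.
                Odds = tr.Elements("td")
                    .Select(td => td.GetAttributeValue("data-odd", ""))
                    .Where(v => v != "")
                    .ToList(),
                Date = tr.SelectSingleNode("td[contains(@class, 'h-text-right')]")
                    ?.InnerText.Trim()
            })
            .ToList();
    }
}
```

With the question's setup this could be called as `ResultsParser.Parse(new HtmlWeb().Load(url))`. The odds are kept as strings here; converting them to numbers would need a culture-invariant parse, since the values use a dot as the decimal separator.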