2016-11-13

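Parsing PHP data with jsoup

I'm not entirely sure how to phrase this question or how to make the title self-explanatory. I'm using jsoup to parse a web page (http://champion.gg/statistics/), and I'm trying to grab the statistics from the table using this code: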

public void connect(String url) {
    try {
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
        System.out.println(doc.toString());
        Element table = doc.select("table[class=table table-striped]").first();
        Element tbody = table.select("tbody").first();
        Iterator<Element> rows = tbody.select("tr").iterator();
        rows.forEachRemaining(row -> {
            System.out.println(row.toString());
        });
    } catch(IOException exception) {
        if(Settings.DEBUG) {
            Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception);
        }
        Program.alert("Error loading webpage!");
    }
}

and it produces this result:

<tr ng-repeat="champion in filteredChampions = (championData | startsWith:search.title | filter:roleSort | orderBy:[order+sortExpression.sortBy,order+sortExpression.lastSortBy])"> 
<td class="rank">{{indexNumber($index, filteredChampions.length)}}</td> 
<td ng-class="{'selected-column':determineSelected('title')}"> <a href="/champion/{{champion.key}}/{{champion.role}}"> 
    <div class="tsm-tooltip tsm-angular-champion-tt" data-type="champions" data-name="{{champion.key}}" data-id="{{matchupData}}"> 
    <div class="matchup-champion {{champion.key}}"></div> 
    <span class="stat-champ-title">{{champion.title}}</span> 
    </div> </a> </td> 
<td class="stats-role-title" ng-class="{'selected-column':determineSelected('role')}">{{champion.role}}</td> 
<td ng-class="{'selected-column':determineSelected('winPercent')}"> <span ng-class="{'top-half': (champion.general.winPercent >= 50), 'bottom-half': (champion.general.winPercent < 50)}">{{champion.general.winPercent}}%</span> </td> 
<td ng-class="{'selected-column':determineSelected('playPercent')}">{{champion.general.playPercent}}%</td> 
<td ng-class="{'selected-column':determineSelected('banRate')}">{{champion.general.banRate}}%</td> 
<td ng-class="{'selected-column':determineSelected('experience')}">{{champion.general.experience}}</td> 
<td ng-class="{'selected-column':determineSelected('kills')}">{{champion.general.kills}}</td> 
<td ng-class="{'selected-column':determineSelected('deaths')}">{{champion.general.deaths}}</td> 
<td ng-class="{'selected-column':determineSelected('assists')}">{{champion.general.assists}}</td> 
<td ng-class="{'selected-column':determineSelected('largestKillingSpree')}">{{champion.general.largestKillingSpree}}</td> 
<td ng-class="{'selected-column':determineSelected('totalDamageDealtToChampions')}">{{champion.general.totalDamageDealtToChampions}}</td> 
<td ng-class="{'selected-column':determineSelected('totalDamageTaken')}">{{champion.general.totalDamageTaken}}</td> 
<td ng-class="{'selected-column':determineSelected('totalHeal')}">{{champion.general.totalHeal}}</td> 
<td ng-class="{'selected-column':determineSelected('minionsKilled')}">{{champion.general.minionsKilled}}</td> 
<td ng-class="{'selected-column':determineSelected('neutralMinionsKilledEnemyJungle')}">{{champion.general.neutralMinionsKilledEnemyJungle}}</td> 
<td ng-class="{'selected-column':determineSelected('neutralMinionsKilledTeamJungle')}">{{champion.general.neutralMinionsKilledTeamJungle}}</td> 
<td ng-class="{'selected-column':determineSelected('goldEarned')}">{{champion.general.goldEarned}}</td> 
<td ng-class="{'selected-column':determineSelected('overallPosition')}">{{champion.general.overallPosition}}</td> 
<td ng-class="{'selected-column':determineSelected('overallPositionChange')}"><span class="glyphicon" ng-class="{'glyphicon-arrow-up': (champion.general.overallPositionChange > 0), 'glyphicon-arrow-down': (champion.general.overallPositionChange < 0), 'same-position': (champion.general.overallPositionChange === 0)}">{{Math.abs(champion.general.overallPositionChange)}}</span></td> 
</tr> 

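Instead of the average number of kills for a particular champion, the result I get is the literal placeholder text champion.general.kills. How can I parse the page so that, instead of champion.general.kills, it gives me an actual value such as 8?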


It looks like the site uses Angular to inject the statistics into the view. Maybe [this answer](http://stackoverflow.com/questions/14904776/parse-javascript-with-jsoup) can help you.

Answers

0

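When it comes to extracting data from a web page, you have to go to where the data is. In this case the data is still in the page itself, which is good. You need to get the script tag that contains the data and parse its contents. Note that this sample code assumes the data is in the script tag at index 11.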

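import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String[] args)
{
    try
    {
        Document doc = Jsoup
                .connect("http://champion.gg/statistics/")
                .userAgent(
                        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
                .get();
        System.out.println(doc.toString());
        // The statistics are embedded as JSON in one of the page's <script> tags.
        Elements scripts = doc.select("script");
        // Assumes the data is in the twelfth script tag (index 11).
        Element script = scripts.get(11);
        parseText(script);
    }
    catch (IOException exception)
    {
        // Report the failure instead of silently swallowing it.
        exception.printStackTrace();
    }
}

public static void parseText(Element script)
{
    // The script body is the first child node of the <script> element.
    String text = ((DataNode) script.childNode(0)).toString().trim();
    int index = text.indexOf("_id");
    while (index > 0)
    {
        index += 6; // Skip past _id":" to the beginning of the value
        int endQuote = text.indexOf("\"", index);
        String id = text.substring(index, endQuote);
        // The substring starts at the label, so key holds "key":"<name>
        // rather than just the name.
        index = text.indexOf("\"key\":\"", endQuote);
        endQuote = text.indexOf("\"", index + 8);
        String key = text.substring(index, endQuote);
        // Likewise, kills holds "kills":<number>, up to the next comma.
        index = text.indexOf("\"kills\":", endQuote);
        endQuote = text.indexOf(",", index);
        String kills = text.substring(index, endQuote);
        // Drop the consumed part of the text. Note that index still refers
        // to a position in the untrimmed text, so the next search can skip
        // entries or stop early.
        text = text.substring(endQuote);
        index = text.indexOf("_id", index);
        System.out.println(id + key + kills);
    }
}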

Output:

5812965753fa9743395ee93a"key":"Urgot"kills":6.47
5812965753fa9743395ee93b"key":"Aatrox"kills":5.8
5812965753fa9743395ee93d"key":"Galio"kills":4.58
5812965753fa9743395ee940"key":"Kled"kills":7.3 ...
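
The output above still carries the "key": and "kills": labels, and the loop can skip entries because it reuses an index from the original string after shortening the text. A minimal sketch of the same indexOf-based scan that avoids both issues might look like the following. It works on a plain String, so the script body would have to be passed in separately. The class name ChampionScan, the method extract, and the sample data (shortened ids, with values borrowed from the output above) are illustrative assumptions, and the sketch assumes each object lists _id, then key, then kills.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jsoup or of the original answer.
public class ChampionScan {
    private static final String ID = "\"_id\":\"";
    private static final String KEY = "\"key\":\"";
    private static final String KILLS = "\"kills\":";

    // Collects "id key kills" rows, assuming the fields appear in that order.
    static List<String> extract(String text) {
        List<String> rows = new ArrayList<>();
        int pos = text.indexOf(ID);
        while (pos >= 0) {
            int idStart = pos + ID.length();           // first char of the id value
            int idEnd = text.indexOf('"', idStart);    // closing quote of the id
            if (idEnd < 0) break;

            int keyAt = text.indexOf(KEY, idEnd);
            if (keyAt < 0) break;
            int keyStart = keyAt + KEY.length();       // skip the label itself
            int keyEnd = text.indexOf('"', keyStart);
            if (keyEnd < 0) break;

            int killsAt = text.indexOf(KILLS, keyEnd);
            if (killsAt < 0) break;
            int killsStart = killsAt + KILLS.length();
            int killsEnd = killsStart;
            // A number ends at the first character that cannot belong to it.
            while (killsEnd < text.length() && isNumberChar(text.charAt(killsEnd))) {
                killsEnd++;
            }

            rows.add(text.substring(idStart, idEnd) + " "
                    + text.substring(keyStart, keyEnd) + " "
                    + text.substring(killsStart, killsEnd));

            // Keep searching in the same string so positions stay valid.
            pos = text.indexOf(ID, killsEnd);
        }
        return rows;
    }

    private static boolean isNumberChar(char c) {
        return Character.isDigit(c) || c == '.' || c == '-';
    }

    public static void main(String[] args) {
        // Made-up sample in the shape suggested by the page's template fields.
        String sample = "[{\"_id\":\"a1\",\"key\":\"Urgot\",\"general\":{\"kills\":6.47,\"deaths\":5.1}},"
                + "{\"_id\":\"a2\",\"key\":\"Aatrox\",\"general\":{\"deaths\":6.0,\"kills\":5.8}}]";
        extract(sample).forEach(System.out::println);
    }
}
```

Searching forward in one unmodified string, rather than truncating it on every pass, is what keeps each index meaningful from one iteration to the next.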


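While this works for 20 champions, I honestly don't fully understand your code. I can understand selecting the script tags, but why do you have to use *.get(11);* and what does it do? I'll try to research it myself in the meantime. I also don't understand what you are using the substrings for; shouldn't there be a simpler way to read the data in the script? It looks like JSON, and I was hoping I could read the data more easily, since it looks like an object inside the script. Thanks a lot for your help! – Metorrite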


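.get(11) gets the twelfth script tag on the page; there are 11 other script tags before it. There may be a simpler way, but I don't know much about JSON, so I took the low-level approach. – ProgrammersBlock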
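
A fixed index such as 11 stops working as soon as the page adds or removes a script tag. One possible alternative, sketched here under assumptions, is to pick the script by its content instead. The sketch takes the script bodies as plain strings; with jsoup these could be collected by calling data() on each element returned by doc.select("script"). The class name ScriptPicker, the marker "\"_id\"", and the sample bodies (including the variable name matchupData.stats) are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of jsoup.
public class ScriptPicker {

    // Returns the first script body that contains the marker text, if any.
    static Optional<String> findDataScript(List<String> scriptBodies, String marker) {
        return scriptBodies.stream()
                .filter(body -> body.contains(marker))
                .findFirst();
    }

    public static void main(String[] args) {
        // Made-up script bodies standing in for the page's <script> tags.
        List<String> bodies = List.of(
                "var a = 1;",
                "matchupData.stats = [{\"_id\":\"a1\",\"key\":\"Urgot\"}];",
                "console.log('x');");

        Optional<String> dataScript = findDataScript(bodies, "\"_id\"");
        System.out.println(dataScript.orElse("no script contained the marker"));
    }
}
```

Returning an Optional makes the "no matching script" case explicit, where get(11) would either throw or silently return the wrong tag.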

0

I found the answer with ProgrammersBlock's help. After looking over the script data, I converted it from JSON into full Java objects!

package com.databot.web.parser; 

import java.io.IOException; 
import java.io.StringReader; 
import java.util.ArrayList; 
import java.util.List; 
import java.util.logging.Level; 

import org.jsoup.Jsoup; 
import org.jsoup.nodes.DataNode; 
import org.jsoup.nodes.Document; 
import org.jsoup.nodes.Element; 
import org.jsoup.select.Elements; 

import com.databot.Program; 
import com.databot.Settings; 
import com.databot.champions.ChampionStats; 
import com.databot.champions.Champion; 
import com.google.gson.stream.JsonReader; 

public class WebParser { 

public void connect(String url) { 
    try { 
     Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get(); 
     Elements table = doc.select("script"); 
     Element script = table.get(11); 
     parseText(script); 
    } catch(IOException exception) { 
     if(Settings.DEBUG) { 
      Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception); 
     } 
     Program.alert("Error loading webpage!"); 
    } 
} 

public void parseText(Element script) 
{ 
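    // Skip the first 22 characters of the script body, which are assumed to
    // precede the JSON array; this offset needs adjusting if the page changes.
    String text = ((DataNode) script.childNode(0)).toString().substring(22).trim(); 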
    System.out.println(text); 
    List<Champion> champions = new ArrayList<>(); 
    try { 
     JsonReader reader = new JsonReader(new StringReader(text)); 
     reader.setLenient(true); 
     reader.beginArray(); 
     while(reader.hasNext()) { 
      reader.beginObject(); 
       String id = "", key = "", role = "", title = ""; 
       ChampionStats stats = new ChampionStats(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0); 
      while(reader.hasNext()) { 
       String name = reader.nextName(); 
       if(name.equalsIgnoreCase("_id")) { 
        id = reader.nextString(); 
       } else if(name.equalsIgnoreCase("key")) { 
        key = reader.nextString(); 
       } else if(name.equalsIgnoreCase("role")) { 
        role = reader.nextString(); 
       } else if(name.equalsIgnoreCase("title")) { 
        title = reader.nextString(); 
       } else if(name.equalsIgnoreCase("general")) { 
        double winPercent = 0, playPercent = 0, banRate = 0, experience = 0, kills = 0, deaths = 0, assists = 0, totalDamageDealtToChampions = 0, totalDamageTaken = 0, totalHeal = 0, largestKillingSpree = 0, minionsKilled = 0, neutralMinionsKilledTeamJungle = 0, neutralMinionsKilledEnemyJungle = 0, goldEarned = 0; 
        int overallPosition = 0, overallPositionChange = 0; 
         reader.beginObject(); 
         while(reader.hasNext()) { 
          String gName = reader.nextName(); 
          if(gName.equalsIgnoreCase("winPercent")) { 
           winPercent = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("playPercent")) { 
           playPercent = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("banRate")) { 
           banRate = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("experience")) { 
           experience = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("kills")) { 
           kills = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("deaths")) { 
           deaths = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("assists")) { 
           assists = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("totalDamageDealtToChampions")) { 
           totalDamageDealtToChampions = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("totalDamageTaken")) { 
           totalDamageTaken = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("totalHeal")) { 
           totalHeal = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("largestKillingSpree")) { 
           largestKillingSpree = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("minionsKilled")) { 
           minionsKilled = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("neutralMinionsKilledTeamJungle")) { 
           neutralMinionsKilledTeamJungle = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("neutralMinionsKilledEnemyJungle")) { 
           neutralMinionsKilledEnemyJungle = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("goldEarned")) { 
           goldEarned = reader.nextDouble(); 
          } else if(gName.equalsIgnoreCase("overallPosition")) { 
           overallPosition = reader.nextInt(); 
          } else if(gName.equalsIgnoreCase("overallPositionChange")) { 
           overallPositionChange = reader.nextInt(); 
          } else { 
           reader.skipValue(); 
          } 
         } 
         reader.endObject(); 
         stats = new ChampionStats(winPercent, playPercent, banRate, experience, kills, deaths, assists, totalDamageDealtToChampions, totalDamageTaken, totalHeal, largestKillingSpree, minionsKilled, neutralMinionsKilledTeamJungle, neutralMinionsKilledEnemyJungle, goldEarned, overallPosition, overallPositionChange); 
       } else { 
        reader.skipValue(); 
       } 
      } 
      reader.endObject(); 
      champions.add(new Champion(id, key, role, title, stats)); 
     } 
     reader.endArray(); 
     reader.close(); 
    } catch (Exception e) { 
     Program.alert("Error reading JSON data!"); 
     e.printStackTrace(); 
    } 
    champions.forEach(champion -> { 
     System.out.println(champion.toString()); 
    }); 
} 
} 

This is my full WebParser class, in case anyone is interested. I'm sure there is a better or more efficient way to write this, but it works for me as of now!
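
The hard-coded substring(22) in parseText depends on the exact length of whatever precedes the JSON in the script. A rough sketch of a less brittle alternative is to cut out the first bracketed array by matching brackets, ignoring any brackets that appear inside string literals. The class name JsonSlice and the sample script text are hypothetical, and the sketch assumes the first '[' in the script opens the data array, which would not hold if an indexed expression such as a[0] came first.

```java
import java.util.Optional;

// Hypothetical helper, not part of jsoup or Gson.
public class JsonSlice {

    // Returns the text from the first '[' to its matching ']', if both exist.
    static Optional<String> firstJsonArray(String script) {
        int start = script.indexOf('[');
        if (start < 0) {
            return Optional.empty();
        }
        int depth = 0;
        boolean inString = false;
        for (int i = start; i < script.length(); i++) {
            char c = script.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                  // skip the escaped character
                } else if (c == '"') {
                    inString = false;     // end of the string literal
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '[') {
                depth++;
            } else if (c == ']') {
                depth--;
                if (depth == 0) {
                    return Optional.of(script.substring(start, i + 1));
                }
            }
        }
        return Optional.empty();          // unbalanced brackets
    }

    public static void main(String[] args) {
        // Made-up script text; the variable name is an assumption.
        String script = "  matchupData.stats = [{\"key\":\"a]b\",\"n\":[1,2]}];\n other();";
        System.out.println(firstJsonArray(script).orElse("no array found"));
    }
}
```

The returned string could then be handed to new JsonReader(new StringReader(...)) in place of the substring(22) result.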