2016-12-07

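Python BeautifulSoup not scraping this URL

I'm trying to scrape the rows (tr) of player data from a URL, but when I run my code nothing happens. I'm positive my code is fine, because it works on other stats sites that contain tables. Can anyone tell me why nothing happens? Thanks in advance.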

import urllib.request
from bs4 import BeautifulSoup

def make_soup(url):
    # Download the page's raw HTML (urlopen does not run any JavaScript)
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("https://www.whoscored.com/Regions/252/Tournaments/7/Seasons/6365/Stages/13832/PlayerStatistics/England-Championship-2016-2017")
# Print the text of every table row present in that raw HTML
for record in soup.find_all('tr'):
    print(record.text)

Answers


This page uses JavaScript to fetch the data. You can find the raw data at this link:

https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics?category=summary&subcategory=all&statsAccumulationType=0&isCurrent=true&playerId=&teamIds=&matchId=&stageId=13832&tournamentOptions=7&sortBy=Rating&sortAscending=&age=&ageComparisonType=&appearances=&appearancesComparisonType=&field=Overall&nationality=&positionOptions=&timeOfTheGameEnd=&timeOfTheGameStart=&isMinApp=true&page=&includeZeroValues=&numberOfPlayersToPick=10 

Each field in the URL can be changed to get the data you need.
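A minimal sketch of changing those fields with the standard library's urllib.parse, rather than editing the long string by hand. It assumes the feed accepts the parameters in any order and that fields such as page and numberOfPlayersToPick behave as their names suggest; with_params is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# The feed URL from the answer above, split over several lines
FEED_URL = (
    "https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics"
    "?category=summary&subcategory=all&statsAccumulationType=0&isCurrent=true"
    "&playerId=&teamIds=&matchId=&stageId=13832&tournamentOptions=7"
    "&sortBy=Rating&sortAscending=&age=&ageComparisonType=&appearances="
    "&appearancesComparisonType=&field=Overall&nationality=&positionOptions="
    "&timeOfTheGameEnd=&timeOfTheGameStart=&isMinApp=true&page="
    "&includeZeroValues=&numberOfPlayersToPick=10"
)

def with_params(url, **overrides):
    """Return url with the given query fields replaced (or added)."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps empty fields such as "page=" in the output
    query = parse_qs(parts.query, keep_blank_values=True)
    flat = {key: values[0] for key, values in query.items()}
    flat.update({key: str(value) for key, value in overrides.items()})
    return urlunsplit(parts._replace(query=urlencode(flat)))

# Example: ask for the second page with 20 players per page
print(with_params(FEED_URL, page=2, numberOfPlayersToPick=20))
```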


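This is because the website doesn't want you to scrape it.

I sent the request with selenium to simulate a browser and took a screenshot; this is what it produced:

[Screenshot: Incapsula Protection]

The site uses Incapsula, a security service (they even have some information about scraping on their site; check it out, it's interesting).

  • This may help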
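To check from Python whether you received the protection page instead of the stats table, a rough heuristic is to look for Incapsula's name in the returned HTML. This is an assumption about what such pages contain (they commonly reference an `_Incapsula_Resource` script), not a documented API; fetch_html and looks_like_incapsula_block are hypothetical helpers.

```python
import urllib.error
import urllib.request

def fetch_html(url):
    """Return (status, body), even when the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Blocked requests may come back as 4xx with an HTML body worth inspecting
        return err.code, err.read().decode("utf-8", errors="replace")

def looks_like_incapsula_block(html):
    """Heuristic (assumption): challenge pages mention "Incapsula" somewhere."""
    return "incapsula" in html.lower()

# Usage (network access required):
# status, html = fetch_html("https://www.whoscored.com/Regions/252/...")
# if looks_like_incapsula_block(html):
#     print("Got the Incapsula page, not the stats table")
```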

Short answer: the player data you are looking for is not in that URL.

Then you might ask: if I can already see them on that page, how can they not be there?

So I'll try to explain what happens when you browse that URL with a modern browser such as Chrome.

You: type in the URL and press Enter.

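Chrome: Gotcha. I'll get that page for you as soon as possible, just a second. (fetches the content from that URL) Now I have it! But wait, let me read/parse it before I show it to you. (reads what's inside the content) Oh crap, this JavaScript tells me to get additional information from another URL. OK, I'll do that. Oh wait, another one here tells me to load an ad in the header; I don't like it, but I'll just do what I'm told. Just a second, this CSS tells me to display the players' names in bold, fine. Oh, here's another image from URL xxx that I need to load, no problem... Oh man, how much stuff do I have to handle? I'm not happy with this site... (working on a bunch of other stuff...) Finally, everything is ready! Check it out!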

You: Player xxx is actually pretty good, I'll check him out. (clicks player xxx)

Chrome: ......

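As you can see, every time you browse a web page, the browser does a lot of "behind the scenes" work to display it for you. So basically: URL entered >> content fetched from the URL >> content parsed >> additional content fetched >> everything rendered >> page displayed (one or more of these steps may happen at the same time).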

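Your code, however, only performs the "content fetched from the URL" step, and the data you want happens to be "additional content" that has to be loaded from somewhere else. That's why you got nothing.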
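A small offline sketch of that point: a row counter that does roughly what `soup.find_all('tr')` does, run over a hypothetical page skeleton whose table is only filled in later by JavaScript. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages; STATIC_HTML is made up for illustration and is not the real whoscored markup.

```python
from html.parser import HTMLParser

class RowCounter(HTMLParser):
    """Counts <tr> start tags, roughly like len(soup.find_all('tr'))."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1

def count_rows(html):
    parser = RowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows

# Hypothetical skeleton: the table exists, but it stays empty until a script runs
STATIC_HTML = """
<table id="stats"><tbody></tbody></table>
<script>
  // in a browser, a script like this would fetch the feed and add the table rows
  fetch("/StatisticsFeed/1/GetPlayerStatistics?...");
</script>
"""

print(count_rows(STATIC_HTML))  # 0: urlopen never runs the script
```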

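So how do I get these stats? Once you know the URLs responsible for loading them, just go after those URLs directly. How do I find these URLs? Well, you can always read the JavaScript... if you're patient enough...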

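The easiest way to get what you want is to analyze the traffic while the page is loading and find all the behind-the-scenes requests. I'd recommend Fiddler, but you can use any tool you see fit.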
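Browser developer tools (Chrome's Network tab, "Save all as HAR") and, I believe, Fiddler can export the captured traffic as a HAR file, which is plain JSON with each request's URL under log.entries[].request.url. Assuming such an export, a few lines of Python can shortlist candidate feed URLs. The file name and the "StatisticsFeed" marker are assumptions for illustration.

```python
import json

def find_feed_urls(har_text, marker="StatisticsFeed"):
    """Return the request URLs in a HAR export that contain `marker`."""
    har = json.loads(har_text)
    return [entry["request"]["url"]
            for entry in har["log"]["entries"]
            if marker in entry["request"]["url"]]

def find_feed_urls_in_file(path, marker="StatisticsFeed"):
    # "utf-8-sig" also copes with a byte-order mark at the start of the file
    with open(path, encoding="utf-8-sig") as fh:
        return find_feed_urls(fh.read(), marker)

# Usage, assuming the captured traffic was saved as "whoscored.har":
# for url in find_feed_urls_in_file("whoscored.har"):
#     print(url)
```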

Now, let's see what happens when you load the page:

[Screenshot: traffic analytics]

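There are actually hundreds of requests made to fully render the page you visit, and all you need to do is find out which one feeds the "actual" or "real" stats. This URL even contains "StatisticsFeed"; could it be the one? Let's take a look: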

https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics?category=summary&subcategory=all&statsAccumulationType=0&isCurrent=true&playerId=&teamIds=&matchId=&stageId=13832&tournamentOptions=7&sortBy=Rating&sortAscending=&age=&ageComparisonType=&appearances=&appearancesComparisonType=&field=Overall&nationality=&positionOptions=&timeOfTheGameEnd=&timeOfTheGameStart=&isMinApp=true&page=&includeZeroValues=&numberOfPlayersToPick=10

{ 
    "playerTableStats": [{ 
     "name": "Conor Hourihane", 
     "firstName": "Conor", 
     "lastName": "Hourihane", 
     "playerId": 134172, 
     "height": 181, 
     "weight": 62, 
     "age": 25, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-MC-", 
     "positionText": "Midfielder", 
     "playedPositionsShort": "M(C)", 
     "teamId": 142, 
     "teamName": "Barnsley", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "ie", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.8705882352941181, 
     "ranking": 1, 
     "apps": 17, 
     "subOn": 0, 
     "minsPlayed": 1530, 
     "manOfTheMatch": 4, 
     "yellowCard": 5.0, 
     "redCard": 0.0, 
     "goal": 3, 
     "assistTotal": 8, 
     "shotsPerGame": 2.2352941176470589, 
     "aerialWonPerGame": 0.6470588235294118, 
     "passSuccess": 81.370449678800867 
    }, 
    { 
     "name": "Anthony Knockaert", 
     "firstName": "Anthony", 
     "lastName": "Knockaert", 
     "playerId": 86794, 
     "height": 172, 
     "weight": 69, 
     "age": 25, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-AML-AMR-", 
     "positionText": "Midfielder", 
     "playedPositionsShort": "AM(LR)", 
     "teamId": 211, 
     "teamName": "Brighton", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "fr", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.6722222222222216, 
     "ranking": 2, 
     "apps": 18, 
     "subOn": 1, 
     "minsPlayed": 1471, 
     "manOfTheMatch": 5, 
     "yellowCard": 4.0, 
     "redCard": 0.0, 
     "goal": 6, 
     "assistTotal": 0, 
     "shotsPerGame": 2.3888888888888888, 
     "aerialWonPerGame": 0.22222222222222221, 
     "passSuccess": 83.420593368237348 
    }, 
    { 
     "name": "Lewis Dunk", 
     "firstName": "Lewis", 
     "lastName": "Dunk", 
     "playerId": 86441, 
     "height": 192, 
     "weight": 88, 
     "age": 25, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 211, 
     "teamName": "Brighton", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.660000000000001, 
     "ranking": 3, 
     "apps": 18, 
     "subOn": 0, 
     "minsPlayed": 1620, 
     "manOfTheMatch": 3, 
     "yellowCard": 8.0, 
     "redCard": 0.0, 
     "goal": 1, 
     "assistTotal": 1, 
     "shotsPerGame": 0.61111111111111116, 
     "aerialWonPerGame": 3.5, 
     "passSuccess": 79.72251867662753 
    }, 
    { 
     "name": "Tom Clarke", 
     "firstName": "Tom", 
     "lastName": "Clarke", 
     "playerId": 133974, 
     "height": 180, 
     "weight": 77, 
     "age": 28, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 181, 
     "teamName": "Preston", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.6126315789473677, 
     "ranking": 4, 
     "apps": 19, 
     "subOn": 0, 
     "minsPlayed": 1692, 
     "manOfTheMatch": 4, 
     "yellowCard": 0.0, 
     "redCard": 0.0, 
     "goal": 2, 
     "assistTotal": 0, 
     "shotsPerGame": 0.89473684210526316, 
     "aerialWonPerGame": 5.4736842105263159, 
     "passSuccess": 66.666666666666657 
    }, 
    { 
     "name": "Pontus Jansson", 
     "firstName": "Pontus", 
     "lastName": "Jansson", 
     "playerId": 121123, 
     "height": 194, 
     "weight": 89, 
     "age": 25, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 19, 
     "teamName": "Leeds", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "se", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.5976923076923066, 
     "ranking": 5, 
     "apps": 13, 
     "subOn": 0, 
     "minsPlayed": 1126, 
     "manOfTheMatch": 1, 
     "yellowCard": 6.0, 
     "redCard": 0.0, 
     "goal": 1, 
     "assistTotal": 0, 
     "shotsPerGame": 0.53846153846153844, 
     "aerialWonPerGame": 3.5384615384615383, 
     "passSuccess": 86.336633663366342 
    }, 
    { 
     "name": "Angus MacDonald", 
     "firstName": "Angus", 
     "lastName": "MacDonald", 
     "playerId": 110825, 
     "height": 184, 
     "weight": 70, 
     "age": 24, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 142, 
     "teamName": "Barnsley", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.5066666666666677, 
     "ranking": 6, 
     "apps": 12, 
     "subOn": 0, 
     "minsPlayed": 1080, 
     "manOfTheMatch": 0, 
     "yellowCard": 3.0, 
     "redCard": 0.0, 
     "goal": 0, 
     "assistTotal": 0, 
     "shotsPerGame": 0.33333333333333331, 
     "aerialWonPerGame": 4.833333333333333, 
     "passSuccess": 72.147651006711413 
    }, 
    { 
     "name": "Marc Roberts", 
     "firstName": "Marc", 
     "lastName": "Roberts", 
     "playerId": 138949, 
     "height": 183, 
     "weight": 81, 
     "age": 26, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 142, 
     "teamName": "Barnsley", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.503125, 
     "ranking": 7, 
     "apps": 16, 
     "subOn": 0, 
     "minsPlayed": 1440, 
     "manOfTheMatch": 1, 
     "yellowCard": 3.0, 
     "redCard": 0.0, 
     "goal": 2, 
     "assistTotal": 2, 
     "shotsPerGame": 0.625, 
     "aerialWonPerGame": 7.0625, 
     "passSuccess": 61.595547309833023 
    }, 
    { 
     "name": "Bradley Johnson", 
     "firstName": "Bradley", 
     "lastName": "Johnson", 
     "playerId": 12490, 
     "height": 178, 
     "weight": 68, 
     "age": 29, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-MC-ML-", 
     "positionText": "Midfielder", 
     "playedPositionsShort": "M(CL)", 
     "teamId": 20, 
     "teamName": "Derby", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.4954545454545443, 
     "ranking": 8, 
     "apps": 11, 
     "subOn": 0, 
     "minsPlayed": 952, 
     "manOfTheMatch": 1, 
     "yellowCard": 4.0, 
     "redCard": 0.0, 
     "goal": 2, 
     "assistTotal": 1, 
     "shotsPerGame": 1.3636363636363635, 
     "aerialWonPerGame": 4.0909090909090908, 
     "passSuccess": 71.908127208480565 
    }, 
    { 
     "name": "Christophe Berra", 
     "firstName": "Christophe", 
     "lastName": "Berra", 
     "playerId": 8287, 
     "height": 186, 
     "weight": 81, 
     "age": 31, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 165, 
     "teamName": "Ipswich", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-sct", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.4789473684210526, 
     "ranking": 9, 
     "apps": 19, 
     "subOn": 0, 
     "minsPlayed": 1710, 
     "manOfTheMatch": 3, 
     "yellowCard": 4.0, 
     "redCard": 0.0, 
     "goal": 0, 
     "assistTotal": 1, 
     "shotsPerGame": 0.94736842105263153, 
     "aerialWonPerGame": 6.2105263157894735, 
     "passSuccess": 58.636363636363633 
    }, 
    { 
     "name": "Adam Webster", 
     "firstName": "Adam", 
     "lastName": "Webster", 
     "playerId": 109922, 
     "height": 191, 
     "weight": 0, 
     "age": 21, 
     "isManOfTheMatch": false, 
     "isActive": true, 
     "isOpta": true, 
     "playedPositions": "-DC-", 
     "positionText": "Defender", 
     "playedPositionsShort": "D(C)", 
     "teamId": 165, 
     "teamName": "Ipswich", 
     "seasonId": 6365, 
     "seasonName": "2016/2017", 
     "tournamentId": 7, 
     "tournamentRegionId": 252, 
     "tournamentRegionCode": "gb-eng", 
     "regionCode": "gb-eng", 
     "tournamentName": "Championship", 
     "tournamentShortName": "EC", 
     "rating": 7.4780000000000006, 
     "ranking": 10, 
     "apps": 15, 
     "subOn": 1, 
     "minsPlayed": 1227, 
     "manOfTheMatch": 2, 
     "yellowCard": 1.0, 
     "redCard": 0.0, 
     "goal": 0, 
     "assistTotal": 0, 
     "shotsPerGame": 0.2, 
     "aerialWonPerGame": 5.0666666666666664, 
     "passSuccess": 58.256029684601117 
    }], 
    "paging": { 
     "currentPage": 1, 
     "totalPages": 34, 
     "resultsPerPage": 10, 
     "totalResults": 338, 
     "firstRecordIndex": 1, 
     "lastRecordIndex": 10 
    }, 
    "statColumns": ["apps", 
    "subOn", 
    "minsPlayed", 
    "goal", 
    "assistTotal", 
    "yellowCard", 
    "redCard", 
    "shotsPerGame", 
    "passSuccess", 
    "aerialWonPerGame", 
    "manOfTheMatch"] 
} 

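That's it! So what now? Simulate this request and parse the content. Since it is already formatted as JSON, the built-in json module can easily do the job; you don't even need BeautifulSoup.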
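A minimal sketch of the json step, run here on a trimmed copy of the response shown above (two players, a few fields each). parse_feed is a hypothetical helper; the field names playerTableStats, name, teamName, rating and paging.totalPages come from that response.

```python
import json

# Trimmed copy of the feed response above (only a few fields kept)
SAMPLE = """
{"playerTableStats": [
   {"name": "Conor Hourihane", "teamName": "Barnsley", "rating": 7.8705882352941181, "goal": 3},
   {"name": "Anthony Knockaert", "teamName": "Brighton", "rating": 7.6722222222222216, "goal": 6}
 ],
 "paging": {"currentPage": 1, "totalPages": 34, "resultsPerPage": 10}}
"""

def parse_feed(text):
    """Turn the feed JSON into (name, team, rating) rows plus the page count."""
    data = json.loads(text)
    rows = [(p["name"], p["teamName"], round(p["rating"], 2))
            for p in data["playerTableStats"]]
    return rows, data["paging"]["totalPages"]

rows, total_pages = parse_feed(SAMPLE)
for name, team, rating in rows:
    print(f"{name:<20} {team:<10} {rating}")
print("pages:", total_pages)
```

The paging block tells you how many pages exist, so the same parser can be reused for every page of the feed.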

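You may ask why you get nothing when you browse this link directly. That's because they have set up restrictions on the server so that only requests with valid headers get the feed. So how do you get around this? Simulate the request convincingly, with the right parameters (mainly headers), so that they believe you.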
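A sketch of sending the feed request with browser-like headers using urllib.request. The header set below is an assumption (a typical User-Agent, the stats page as Referer, and the X-Requested-With header that AJAX calls commonly send); it is untested against the live server, and the safest values are the ones copied from the request you captured in Fiddler.

```python
import json
import urllib.request

STATS_PAGE = ("https://www.whoscored.com/Regions/252/Tournaments/7/Seasons/6365/"
              "Stages/13832/PlayerStatistics/England-Championship-2016-2017")

# Assumed headers; replace them with the ones from your captured request
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Referer": STATS_PAGE,
    "X-Requested-With": "XMLHttpRequest",
}

def build_request(feed_url):
    """Wrap the feed URL in a Request that carries the browser-like headers."""
    return urllib.request.Request(feed_url, headers=HEADERS)

def fetch_feed(feed_url):
    """Download and decode one page of the JSON feed (network access required)."""
    with urllib.request.urlopen(build_request(feed_url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If a request succeeds, the paging.totalPages value in the response suggests looping over the page field of the feed URL to collect every player.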