2016-11-10 100 views

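Calling a function with PhantomJS gives a different result than calling it from the console

I am trying to get NBA player statistics from this page. There is a UI button that converts a data table to CSV, and I am trying to automate that process. Under the hood, the button calls the function get_csv_output().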

In the browser's inspector console, get_csv_output("per_game") and get_csv_output("advanced") output the #per_game and #advanced tables in CSV format, respectively.

However, when I call get_csv_output() from PhantomJS, it returns CSV data only for the "per_game" table, not for the "advanced" table.

var page = require('webpage').create();
page.open('http://www.basketball-reference.com/players/a/abdulka01.html', function() {
    var result = page.evaluate(function() {
        return get_csv_output("per_game");
    });
    console.log(result);
    phantom.exit();
});

This outputs the per_game table in CSV format, as expected. However, when I change the call to get_csv_output("advanced"), the output is:

Converting from PRE-Formatted to CSV does not work, please <span class=tooltip onClick="window.location.reload()">Reload</span> and then click CSV

I tried passing the IDs of some other tables as input, and per_game seems to be the only one that works.
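
When automating this, it can help to tell a real CSV payload apart from the failure message quoted above. The following is only a rough sketch: `isCsvError` and `CSV_ERROR_PREFIX` are hypothetical names, and it assumes the site's failure text always contains the phrase shown in the output above.

```javascript
// Hypothetical helper, not part of the site or of PhantomJS.
// Assumption: a failed conversion always contains this phrase.
var CSV_ERROR_PREFIX = "Converting from PRE-Formatted to CSV does not work";

function isCsvError(result) {
    // Treat anything that is not a string (e.g. null) as a failure.
    if (typeof result !== "string") {
        return true;
    }
    return result.indexOf(CSV_ERROR_PREFIX) !== -1;
}

console.log(isCsvError("Converting from PRE-Formatted to CSV does not work, please reload")); // true
console.log(isCsvError("col1,col2\n1,2")); // false
```

The value returned by page.evaluate() could be passed to such a check before the script writes it to a file.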

Answer

The problem is solved; the following now works:

// Make PhantomJS look like a regular desktop browser before any page script runs.
function on_init(page) {
    page.viewportSize = {width: 1600, height: 900};
    page.evaluate(function() {
        window.screen = {width: 1600, height: 900, availWidth: 1600, availHeight: 900};
        window.innerWidth = 1600;
        window.innerHeight = 900;
        window.outerWidth = 1600;
        window.outerHeight = 900;
        // Replace navigator with a fake object copied from a desktop Chrome.
        window.navigator = {
            plugins: {length: 2, 'Shockwave Flash': {name: 'Shockwave Flash', filename: '/usr/lib/flashplugin-nonfree/libflashplayer.so', description: 'Shockwave Flash 11.2 r202', version: '11.2.202.440'}},
            mimeTypes: {length: 2, "application/x-shockwave-flash": {description: "Shockwave Flash", suffixes: "swf", type: "application/x-shockwave-flash", enabledPlugin: {name: 'Shockwave Flash', filename: '/usr/lib/flashplugin-nonfree/libflashplayer.so', description: 'Shockwave Flash 11.2 r202', version: '11.2.202.440'}}},
            appCodeName: "Mozilla",
            appName: "Netscape",
            appVersion: "5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36",
            cookieEnabled: 1,
            languages: "en-US,en",
            language: "en",
            onLine: 1,
            doNotTrack: null,
            platform: "Linux x86_64",
            product: "Gecko",
            vendor: "Google Inc.",
            vendorSub: "",
            productSub: 20030107,
            userAgent: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36",
            geolocation: {
                getCurrentPosition: function getCurrentPosition() {},
                watchPosition: function watchPosition() {},
                clearWatch: function clearWatch() {}
            },
            javaEnabled: function javaEnabled() { return 0; }
        };
    });
}

var page = require('webpage').create();
// onInitialized fires after the page object is created but before the URL is loaded.
page.onInitialized = function() { on_init(page); };
page.open('http://www.basketball-reference.com/players/a/abdulka01.html', function() {
    var result = page.evaluate(function() {
        return get_csv_output("advanced");
    });
    console.log(result);
    phantom.exit();
});

./phantomjs test.js >>/dev/stdout

Could you please explain how you knew to make these changes, and why they are necessary? – Mahir

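Yes, we need to change at least the 'UserAgent' to make this script work. With the changes I made, you will see a fake navigator object that looks like a normal browser's. – 2016-11-10 01:11:49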

An extended version of the navigator object from this example: http://pastebin.com/kSndS8jX – 2016-11-10 01:16:44
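
The comments above say that changing at least the user agent is what makes the script work. A narrower variant of the answer could therefore look like the sketch below. It assumes that the site checks only the User-Agent string, which was not verified here; if it also inspects other navigator properties, the full override from the answer is still needed. `fetchCsv` and `DESKTOP_UA` are illustrative names, while `page.settings.userAgent` is the PhantomJS setting for the user agent sent with requests.

```javascript
// Assumption: overriding the user agent alone is enough for this site.
// The string is the same one used in the answer's fake navigator object.
var DESKTOP_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 " +
    "(KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36";

// Illustrative helper: loads the player page and prints one table as CSV.
function fetchCsv(tableId) {
    var page = require('webpage').create();
    page.settings.userAgent = DESKTOP_UA; // set before page.open()
    page.open('http://www.basketball-reference.com/players/a/abdulka01.html', function(status) {
        if (status !== 'success') {
            console.log('Failed to load page');
            phantom.exit(1);
            return;
        }
        // Extra arguments to page.evaluate() are passed to the page function.
        var result = page.evaluate(function(id) {
            return get_csv_output(id);
        }, tableId);
        console.log(result);
        phantom.exit();
    });
}

// Run only under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    fetchCsv('advanced');
}
```

It would be run the same way as the answer's script, for example with ./phantomjs test.js.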
