2017-05-25

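R: Web scraping JSON, extracting information from nested entries

I would like to use tidyjson to extract information from JSON, but I am open to any R package that gets the job done. I went through the documentation and vignettes and found the complex example helpful. However, the information I want is nested inside objects that are not simple key-value pairs, and I am not sure how to access it. I am interested in getting `appid`, `name`, `developer`, etc., but that information sits inside `570`, `730`, and so on.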

{"570":{"appid":570,"name":"Dota 2","developer":"Valve","publisher":"Valve","score_rank":71,"owners":102151578,"owners_variance":259003,"players_forever":102151578,"players_forever_variance":259003,"players_2weeks":9436299,"players_2weeks_variance":89979,"average_forever":11727,"average_2weeks":1229,"median_forever":277,"median_2weeks":662,"ccu":811259,"price":"0","tags":{"Free to Play":22678,"MOBA":7808,"Strategy":7415,"Multiplayer":6757,"Team-Based":4848,"Action":4602,"e-sports":4089,"Online Co-Op":3669,"Competitive":3553,"PvP":2655,"RTS":2267,"Difficult":2129,"RPG":2114,"Fantasy":2044,"Tower Defense":2024,"Co-op":1898,"Character Customization":1514,"Replay Value":1487,"Action RPG":1397,"Simulation":1024}}, 

"730":{"appid":730,"name":"Counter-Strike: Global Offensive","developer":"Valve","publisher":"Valve","score_rank":78,"owners":29225079,"owners_variance":154335,"players_forever":28552354,"players_forever_variance":152685,"players_2weeks":9102348,"players_2weeks_variance":88410,"average_forever":17648,"average_2weeks":791,"median_forever":5030,"median_2weeks":358,"ccu":543626,"price":"1499","tags":{"FPS":17082,"Multiplayer":13744,"Shooter":12833,"Action":10881,"Team-Based":10369,"Competitive":9664,"Tactical":8529,"First-Person":7329,"e-sports":6716,"PvP":6383,"Online Co-Op":5714,"Military":4621,"Co-op":4435,"Strategy":4424,"War":4361,"Realistic":3196,"Trading":3191,"Difficult":3158,"Fast-Paced":3100,"Moddable":2496}} 

There are thousands of entries like these. Is there a way to skip the "top level" and look inside the nested objects?
The JSON comes from http://steamspy.com/api.php?request=top100in2weeks

You could try `listviewer::jsonedit` to inspect the data first. [jsonlite](https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html) may help you extract what you need. – RobertMc

Answer

This may be what you need:

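library(jsonlite)

# Parse the response into a named list keyed by app id ("570", "730", ...)
data <- fromJSON("http://steamspy.com/api.php?request=top100in2weeks")

# Pull a single field out of every game entry
appid <- lapply(data, function(x) x$appid)
name <- lapply(data, function(x) x$name)

df <- data.frame(appid = unlist(appid),
                 name = unlist(name),
                 stringsAsFactors = FALSE)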

Result:

> head(df)
        appid                             name
570       570                           Dota 2
730       730 Counter-Strike: Global Offensive
578080 578080    PLAYERUNKNOWN'S BATTLEGROUNDS
440       440                  Team Fortress 2
271590 271590               Grand Theft Auto V
433850 433850           H1Z1: King of the Kill

I will let you add the rest of the fields.
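As a rough sketch of how the remaining fields could be added, the same idea can be applied to several fields at once. The two-entry `json` string below is a hypothetical stand-in for the API response (values taken from the question), and the `fields` vector is an assumption about which scalar fields are wanted. It also assumes every entry contains all of the listed fields.

```r
library(jsonlite)

# Hypothetical two-entry sample with the same shape as the API response
json <- '{
  "570": {"appid": 570, "name": "Dota 2", "developer": "Valve",
          "tags": {"Free to Play": 22678, "MOBA": 7808}},
  "730": {"appid": 730, "name": "Counter-Strike: Global Offensive",
          "developer": "Valve",
          "tags": {"FPS": 17082, "Multiplayer": 13744}}
}'
data <- fromJSON(json)

# Scalar fields to pull from every entry (the nested `tags` object is left out)
fields <- c("appid", "name", "developer")

# Build one single-row data frame per game, then stack them;
# the top-level keys ("570", "730") become the row names
rows <- lapply(data, function(x) as.data.frame(x[fields], stringsAsFactors = FALSE))
df <- do.call(rbind, rows)
```

With the real response, `data` would come from the `fromJSON()` call on the URL instead of the inline string, and `fields` could be extended with `publisher`, `owners`, and so on.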

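Edit: adding arrays to the data frame

You can add each game's tag information to the data frame as well. For each game, you store the vector of tag names in one column and the tag counts in another.

Add the following lines after the definition of df: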

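for (k in seq_len(nrow(df))) {
  # Tag names for game k, stored as one list element in that row
  df$tags[k] <- list(names(data[[k]]$tags))
  # Tag counts for game k, kept as a vector named by tag
  df$tagsQ[k] <- list(unlist(data[[k]]$tags))
}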

This gives you:

> df["570", ]
    appid   name
570   570 Dota 2

tags
570 Free to Play, MOBA, Strategy, Multiplayer, Team-Based, Action, e-sports, Online Co-Op, Competitive, PvP, RTS, Difficult, RPG, Fantasy, Tower Defense, Co-op, Character Customization, Replay Value, Action RPG, Simulation

tagsQ
570 22686, 7810, 7420, 6759, 4850, 4603, 4092, 3672, 3555, 2657, 2267, 2130, 2116, 2045, 2024, 1898, 1514, 1487, 1397, 1023

In this case, the columns `tags` and `tagsQ` contain lists. To get the second tag and its count for appid 570, do:

> df["570", "tags"][[1]][2]
[1] "MOBA"

> df["570", "tagsQ"][[1]][2]
MOBA
7810
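Thanks. I am also struggling to convert the "tags" field into a data structure that can go into a data frame. I end up with a named list that I cannot insert into the data frame. Is there an easy way to turn the tags into dummy Boolean columns in the data frame, or to concatenate them into comma-separated values in a single data frame field? I am quite uncomfortable with list structures. – user2205916

I tried http://www.r-tutor.com/r-introduction/list/named-list-members and https://stackoverflow.com/questions/32059798/list-of-named-lists-to-data-frame, as well as Google. – user2205916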
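For the two options raised in the comment above, one possible approach in base R is sketched here. The `tags` list is a hypothetical, shortened stand-in for the list of tag names per game (for example `df$tags` from the answer), and the column name `tags_csv` is made up for illustration.

```r
# Hypothetical stand-in for the list column of tag names, one element per game
tags <- list(
  "570" = c("Free to Play", "MOBA", "Multiplayer"),
  "730" = c("FPS", "Multiplayer")
)
df <- data.frame(appid = c(570, 730), row.names = names(tags))

# Option 1: collapse each game's tags into one comma-separated string
df$tags_csv <- vapply(tags, paste, character(1), collapse = ", ")

# Option 2: one logical (dummy) column per distinct tag
all_tags <- unique(unlist(tags))
# vapply returns a tags-by-games matrix; transpose to get one row per game
dummies <- t(vapply(tags, function(x) all_tags %in% x, logical(length(all_tags))))
colnames(dummies) <- all_tags
df <- cbind(df, dummies)
```

After this, `df["730", "FPS"]` is `TRUE` and `df["730", "MOBA"]` is `FALSE`. The sketch assumes at least two distinct tags overall; with a single tag, `vapply` returns a plain vector and the transpose step would need adjusting.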