Here is a solution I came up with that gives you the unpacking and row-by-row handling you need.
Assuming df looks like this after being read with read.csv and stringsAsFactors = FALSE:
df
Married Transportation Color
1 YES {"Company":"GTS","Type":"Limo"} White
2 {"Driver":"John"} Green
3 NO {"Type":"Van","Driver":"John"}
You can do this:
library(jsonlite)
l <- lapply(df$Transportation, fromJSON)
n <- unique(unlist(sapply(l, names)))
df[, n] <- lapply(n, function(x) sapply(l, function(y) y[[x]]))
to get this:
df
Married Transportation Color Company Type Driver
1 YES {"Company":"GTS","Type":"Limo"} White GTS Limo NULL
2 {"Driver":"John"} Green NULL NULL John
3 NO {"Type":"Van","Driver":"John"} NULL Van John
I am not sure whether there is a more efficient way.
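The new columns above are list columns, with NULL wherever a key is missing. If plain character columns with NA are preferred, a minimal sketch along the same lines might look like the following. The helper get_or_na is a hypothetical name introduced here for illustration, and the sketch assumes every JSON value is a scalar string, as in the sample data.

```r
library(jsonlite)

# Sample data mirroring the df shown above
df <- data.frame(
  Married = c("YES", "", "NO"),
  Transportation = c('{"Company":"GTS","Type":"Limo"}',
                     '{"Driver":"John"}',
                     '{"Type":"Van","Driver":"John"}'),
  Color = c("White", "Green", ""),
  stringsAsFactors = FALSE
)

# Parse each row's JSON and collect every key that occurs anywhere
l <- lapply(df$Transportation, fromJSON)
n <- unique(unlist(lapply(l, names)))

# Hypothetical helper: return NA instead of NULL when a key is absent
get_or_na <- function(y, key) {
  v <- y[[key]]
  if (is.null(v)) NA_character_ else as.character(v)
}

# One atomic character column per key
for (key in n) {
  df[[key]] <- vapply(l, get_or_na, character(1), key = key)
}
```

Using vapply with character(1) also makes the loop fail loudly if some value turns out not to be a single string, which may be preferable to silently building a list column.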
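EDIT, based on the added information about malformed JSON in the actual data:
If the Transportation column contains malformed JSON, here is one way to handle it.
The original data frame is as follows: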
df <- read.table(text = 'Married,Transportation,Color
YES,"{""Company"":""GTS"",""Type"":""Limo""}",White
,"{""Driver"":""John""}",Green
NO,"{""Type"":""Van"",""Driver"":""John""}",',
header = TRUE, sep = ',', stringsAsFactors = FALSE)
Bind an extra row whose JSON is malformed because of an extra " character:
df <- rbind(df, data.frame(Married = 'NO',
Transportation = '{"Company": ""GTLS"}',
Color = 'Red'))
The new df looks like this (note the malformed JSON in row 4):
Married Transportation Color
1 YES {"Company":"GTS","Type":"Limo"} White
2 {"Driver":"John"} Green
3 NO {"Type":"Van","Driver":"John"}
4 NO {"Company": ""GTLS"} Red
Now use this to extract all of the nested JSON into separate columns:
l <- lapply(df$Transportation, function(x) tryCatch({fromJSON(x)}, error = function(e) NA))
n <- unique(unlist(sapply(l, names)))
df[, n] <- lapply(n, function(x)
sapply(l, function(y)
if (!is.null(names(y))) y[[x]]))
The output is as follows:
Married Transportation Color Company Type Driver
1 YES {"Company":"GTS","Type":"Limo"} White GTS Limo NULL
2 {"Driver":"John"} Green NULL NULL John
3 NO {"Type":"Van","Driver":"John"} NULL Van John
4 NO {"Company": ""GTLS"} Red NULL NULL NULL
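In the output above, row 4 is all NULL, which is indistinguishable from a row whose JSON is valid but lacks those keys. If it helps to know which rows failed to parse, one possible sketch is to record the failure explicitly. The names parse_or_null, json_ok and bad_rows are hypothetical and introduced here only for illustration.

```r
library(jsonlite)

# Two-row sample: one valid JSON string and the malformed one from row 4
df <- data.frame(
  Married = c("YES", "NO"),
  Transportation = c('{"Company":"GTS","Type":"Limo"}',
                     '{"Company": ""GTLS"}'),
  Color = c("White", "Red"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return NULL instead of raising an error on bad JSON
parse_or_null <- function(x) tryCatch(fromJSON(x), error = function(e) NULL)

l <- lapply(df$Transportation, parse_or_null)

# TRUE where parsing succeeded, FALSE where it failed
df$json_ok <- !vapply(l, is.null, logical(1))

# Row indices that need manual inspection or cleaning
bad_rows <- which(!df$json_ok)
```

The flagged rows can then be inspected or repaired separately before the columns are expanded.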
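Why are you so opposed to parsing? – hrbrmstr
@hrbrmstr I just don't think parsing is an efficient approach. I have roughly 30 different JSON objects, and their keys/values come in different orders, etc. – user8010356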