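I'm using an API (the Rblpapi bdh() function) that returns a list of data frames. I'd like to use the list's names() as a new column when combining them into a single data frame, so that the data ends up in tidy format. I have a solution, but it is error-prone and, I suspect, slower than it needs to be. Is there a cleaner way to tidy a list of data frames?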
#create example data set
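library(dplyr)   # tibble(), mutate(), select(), group_by(), arrange(), %>%
library(stringr) # str_extract()
obsA <- tibble(
  date = as.Date('2009-01-01') + 0:2,
  X = rnorm(3, 0, 1),
  Y = rnorm(3, 0, 2),
  Z = rnorm(3, 0, 4)
)
obsB <- tibble(
  date = as.Date('2009-01-01') + 0:2,
  X = rnorm(3, 10, 1),
  Y = rnorm(3, 10, 2),
  Z = rnorm(3, 10, 4)
)
# named list of data frames, mimicking what the API returns
obs <- list(obsA = obsA, obsB = obsB)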
I can easily create a single data frame, but doing so folds the individual list names into unique row names.
#create single data frame
obs_long<-do.call("rbind",obs)
#don't like this
rownames(obs_long)
#[1] "obsA.1" "obsA.2" "obsA.3" "obsB.1" "obsB.2" "obsB.3"
names(obs_long)
#[1] "date" "X" "Y" "Z"
I can pull out the row names, use a regular expression to strip the appended row identifier, and mutate() a new column.
#Full solution but ungainly.
# Extra step to convert row names to a column. Risk of parsing error if
# a period is in item name.
tidy_obs <- do.call("rbind", obs) %>%
  mutate(item = str_extract(rownames(.), "[A-Za-z0-9 ]+")) %>%
  select(date, item, everything()) %>%
  group_by(item) %>%
  arrange(date)
# > tidy_obs
# # A tibble: 6 x 5
# # Groups: item [2]
# date item X Y Z
# <date> <chr> <dbl> <dbl> <dbl>
# 1 2009-01-01 obsA -0.1030362 2.274885 -4.134265
# 2 2009-01-01 obsB 8.4210832 7.604203 13.449731
# 3 2009-01-02 obsA -0.2279141 -2.748717 4.372599
# 4 2009-01-02 obsB 12.8940563 10.594164 8.108275
# 5 2009-01-03 obsA 0.5749725 -4.041280 -0.524420
# 6 2009-01-03 obsB 10.1158769 12.684331 8.248651
This works, but I'm wondering whether there is a more direct way that avoids the extra mutate() step and/or the risk of a parsing error from str_extract(). Thanks!
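One possible sketch of a more direct route, assuming dplyr is available: bind_rows() accepts a named list of data frames, and its .id argument writes each list name into a new column, so no row names need to be parsed. The small obs list below is a made-up stand-in for the example data, not real API output.

```r
library(dplyr)

# Stand-in for the `obs` list in the question (values are made up).
obs <- list(
  obsA = tibble(date = as.Date("2009-01-01") + 0:2, X = c(1, 2, 3)),
  obsB = tibble(date = as.Date("2009-01-01") + 0:2, X = c(11, 12, 13))
)

# .id = "item" stores each list element's name in a new column called `item`.
tidy_obs <- bind_rows(obs, .id = "item") %>%
  select(date, item, everything()) %>%
  arrange(date, item)

tidy_obs
```

Because the names are copied rather than recovered with a regular expression, an item name containing a period should come through unchanged.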
How about changing your list slightly, using cbind to add the item column to each data frame: list(obsA = cbind(obsA, item = "obsA"), obsB = cbind(obsB, item = "obsB"))? W –
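The suggestion in this comment can be generalized so the names do not have to be typed by hand. The following is a minimal base R sketch, assuming every element of the list is named; the obs list is a made-up stand-in and the tagged variable is introduced purely for illustration.

```r
# Stand-in for the `obs` list in the question (values are made up).
obs <- list(
  obsA = data.frame(date = as.Date("2009-01-01") + 0:2, X = c(1, 2, 3)),
  obsB = data.frame(date = as.Date("2009-01-01") + 0:2, X = c(11, 12, 13))
)

# Add an `item` column to each data frame, taking its value from the list name.
tagged <- Map(function(df, nm) cbind(df, item = nm), obs, names(obs))

# Stack the pieces, then drop the "obsA.1"-style row names that rbind creates.
obs_long <- do.call(rbind, tagged)
rownames(obs_long) <- NULL

obs_long
```

This keeps the do.call("rbind", ...) step from the question but moves the name into a column before stacking, so nothing has to be extracted from the row names afterwards.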