2017-02-15 81 views
-1

R flatten list column

I have a column in a data frame that I split into four separate columns using colsplit.

df <- transform(df, concatenation = colsplit(concatenation, pattern="->-", 
names = c('att1', 'att2','att3', 'att4'))) 

OR

df$concatenation <- colsplit(df$concatenation, pattern = "->-",
                             names = c('att1', 'att2', 'att3', 'att4'))

concatenation 
a->-a->-b->-c 
b->-a->-b->-d 
3->-a->-x->-c 
2->-a->-y->-8 

Now I have the following columns: concatenation.att1, concatenation.att2, and so on.

concatenation.att1 concatenation.att2 concatenation.att3 concatenation.att4 
a     a     b     c 
b     a     b     d 
3     a     x     c 
2     a     y     8 

When I try to export this data frame to CSV, I get the following error:

Error in ncol(xj) : object 'xj' not found 

OR

Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) : 
    missing value where TRUE/FALSE needed 

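From my research I gather that this is caused by the nested columns, but I can't find a simple way to flatten the data frame (as shown below) so that it can be exported to CSV.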

att1 att2 att3 att4 
a a b c 
b a b d 
3 a x c 
2 a y 8 
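One way to confirm that a nested column is involved is to check whether any column of the data frame is itself a data frame. The following is a minimal base R sketch on made-up two-row data (the values and the shortened two-attribute layout are assumptions for illustration, not the real data set):

```r
# Illustrative data only: a data frame whose 'concatenation' column
# is itself a data frame, similar to what the colsplit assignment produces.
df <- data.frame(Value = c("20,249", "8,097"), stringsAsFactors = FALSE)
df$concatenation <- data.frame(att1 = c("AFG", "ALB"),
                               att2 = c("Afghanistan", "Albania"),
                               stringsAsFactors = FALSE)

# TRUE for every column that is a nested data frame
nested <- vapply(df, is.data.frame, logical(1))

# The outer data frame still counts the nested block as a single column
n_outer <- ncol(df)
```

Here `nested` flags `concatenation`, and `n_outer` is 2 even though four values are stored per row, which is why the printed names appear as concatenation.att1 and so on.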

At the moment I reassign the data to the top level and delete the nested column, but I'm sure there is a better way to do this.

df$att1 <- df$concatenation$att1 
df$att2 <- df$concatenation$att2 
df$att3 <- df$concatenation$att3 
df$att4 <- df$concatenation$att4 

df$concatenation <- NULL 
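The column-by-column reassignment above can probably be collapsed into a single `cbind()` call that binds the nested data frame next to the remaining top-level columns. This is a sketch in base R on made-up data (the two-row frame and its values are assumptions for illustration):

```r
# Illustrative data only: 'concatenation' is a nested data-frame column.
df <- data.frame(Value = c("20,249", "8,097"), stringsAsFactors = FALSE)
df$concatenation <- data.frame(att1 = c("AFG", "ALB"),
                               att2 = c("Afghanistan", "Albania"),
                               stringsAsFactors = FALSE)

# Bind the nested block to all other columns, dropping the nested column itself
other_cols <- setdiff(names(df), "concatenation")
flat <- cbind(df$concatenation, df[other_cols])

# No column of 'flat' is a data frame any more
still_nested <- any(vapply(flat, is.data.frame, logical(1)))
```

The result has plain att1, att2 and Value columns, so the number of split columns does not need to be spelled out by hand.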

Here is a reproducible example:

library(reshape2)  # provides colsplit()

# Read in the table
df <- read.table(textConnection(
"concatenation Value
AFG->-Afghanistan->-1950->-True 20,249
AFG->-Afghanistan->-1951->-True 21,352
AFG->-Afghanistan->-1952->-True 22,532
AFG->-Afghanistan->-1953->-True 23,557
AFG->-Afghanistan->-1954->-True 24,555
ALB->-Albania->-1950->-True 8,097
ALB->-Albania->-1951->-True 8,986"), header = TRUE)

# Split the concatenation column (this leaves a nested data frame in df$concatenation)
df <- transform(df, concatenation = colsplit(concatenation, pattern = "->-",
                names = c('att1', 'att2', 'att3', 'att4')))

# Write to CSV (this is the call that raises the error)
write.csv(df, "myfile.csv")
+1

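*I have a column in a data frame that I split into four separate columns using colsplit* ... it would be nice to see the column values. *I can't find a simple way to flatten the data frame* ... it would be nice to see the desired output. – Parfait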

+0

I've added the expected output as a table. Hopefully that makes it clearer. – sdhaus

Answers

1

It looks like tidyr::separate will do this.

nm <- c('att1', 'att2','att3', 'att4') 
df2 <- tidyr::separate(df, concatenation, nm, sep = "->-") 

sapply(df2, typeof) 
#  att1  att2  att3  att4  Value 
# "character" "character" "character" "character" "integer" 
write.csv(df2) 
# "","att1","att2","att3","att4","Value" 
# "1","AFG","Afghanistan","1950","True","20,249" 
# "2","AFG","Afghanistan","1951","True","21,352" 
# "3","AFG","Afghanistan","1952","True","22,532" 
# "4","AFG","Afghanistan","1953","True","23,557" 
# "5","AFG","Afghanistan","1954","True","24,555" 
# "6","ALB","Albania","1950","True","8,097" 
# "7","ALB","Albania","1951","True","8,986" 

And in base R, strsplit() will work.

df3 <- do.call(rbind.data.frame, strsplit(as.character(df$concatenation), "->-")) 
cbind(setNames(df3, nm), df["Value"]) 
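As a self-contained variant of the base R approach, the sketch below swaps in `do.call(rbind, ...)` to build a character matrix first, then writes the flat result to a temporary file and reads it back to check that the export goes through. The two sample rows and the temporary file are assumptions for illustration:

```r
# Illustrative data only: two rows in the same format as the question.
df <- data.frame(
  concatenation = c("AFG->-Afghanistan->-1950->-True",
                    "ALB->-Albania->-1950->-True"),
  Value = c("20,249", "8,097"),
  stringsAsFactors = FALSE
)
nm <- c("att1", "att2", "att3", "att4")

# Split each string on the literal separator and stack the pieces into a matrix
parts <- do.call(rbind, strsplit(df$concatenation, "->-", fixed = TRUE))

# Name the pieces and bind them to Value as ordinary top-level columns
flat <- cbind(setNames(as.data.frame(parts, stringsAsFactors = FALSE), nm),
              df["Value"])

# Round trip through a temporary CSV file, reading every column as text
out <- tempfile(fileext = ".csv")
write.csv(flat, out, row.names = FALSE)
back <- read.csv(out, colClasses = "character")
```

Because every column of `flat` is a plain vector, `write.csv()` has no nested data frame to convert.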
+0

Thanks, this is what I was looking for. – sdhaus

1

Why do you need transform here? Try this:

df$concatenation <- colsplit(df$concatenation, "->-", 
        names = c("att1", "att2","att3", "att4")) 
+0

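True, I just ran it without transform and it does produce the same result. However, it still causes an error when writing to CSV: Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) : missing value where TRUE/FALSE needed – sdhaus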