2012-02-11 60 views
5

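R: Find missing columns and add them to the data frame if missing

I want to write some code that takes a given data frame, checks whether any columns are missing, and, if so, adds the missing columns filled with 0 or NA. Here is what I have so far: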

> df
  x1 x2 x4
1  0  1  3
2  3  1  3
3  1  2  1

> nameslist <- c("x1","x2","x3","x4") 
> miss.names <- !nameslist %in% colnames(df) 
> holder <- rbind(nameslist,miss.names) 
> miss.cols <- subset(holder[1,], holder[2,] == "TRUE") 

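Beyond this point, I can't figure out how to add the missing column ("x3") without hard-coding it. Ideally, I'd like the new, complete data frame to have its columns in the same order as nameslist.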

Any ideas? Feel free to disregard my current code.

Answers

14

Here's a straightforward approach:

df <- data.frame(a=1:4, e=4:1) 
nms <- c("a", "b", "d", "e") # Vector of columns you want in this data.frame 

Missing <- setdiff(nms, names(df)) # Find names of missing columns 
df[Missing] <- 0     # Add them, filled with '0's 
df <- df[nms]      # Put columns in desired order 
#   a b d e
# 1 1 0 0 4
# 2 2 0 0 3
# 3 3 0 0 2
# 4 4 0 0 1
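
Since the question asks for either 0 or NA as the fill value, the same steps can be wrapped in a small function. This is only a sketch: the name `add_missing_cols` and its `fill` argument are made up for illustration and are not part of the answer above. It is applied here to the data frame from the question.

```r
# Hypothetical helper: the function name and `fill` argument are illustrative
add_missing_cols <- function(df, nms, fill = 0) {
  absent <- setdiff(nms, names(df))          # names in nms that df lacks
  if (length(absent) > 0) df[absent] <- fill # add them only when something is missing
  df[nms]                                    # keep the columns of nms, in that order
}

# The data frame and name list from the question
df <- data.frame(x1 = c(0, 3, 1), x2 = c(1, 1, 2), x4 = c(3, 3, 1))
nameslist <- c("x1", "x2", "x3", "x4")

out_zero <- add_missing_cols(df, nameslist)            # x3 filled with 0
out_na   <- add_missing_cols(df, nameslist, fill = NA) # x3 filled with NA
```

Note that `df[nms]` also drops any column of `df` that is not listed in `nms`, which may or may not be what you want.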
+1

You could also use `Missing <- setdiff(nms, names(df))`, which is slightly more transparent. – 2012-02-11 07:12:43

+1

@HongOoi - Good suggestion. That is better, and I've edited the answer to include it. Thanks! – 2012-02-11 07:25:30

1
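library(stringr)
df <- data.frame(X1=1:4, X2=1:4, X5=1:4)
> df
  X1 X2 X5
1  1  1  1
2  2  2  2
3  3  3  3
4  4  4  4
# Numeric suffix of each column name ("[0-9]+" also matches multi-digit suffixes)
current <- as.numeric(str_extract(names(df), "[0-9]+"))
missing <- seq(min(current), max(current)) # every index between the smallest and largest

# Add the absent columns, filled with 0
df[paste("X", missing[!missing %in% current], sep="")] <- 0

# Alphabetical order; note that "X10" would sort before "X2"
> df[, order(colnames(df))]
  X1 X2 X3 X4 X5
1  1  1  0  0  1
2  2  2  0  0  2
3  3  3  0  0  3
4  4  4  0  0  4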
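
The same idea can be sketched in base R with the columns ordered by their numeric suffix rather than alphabetically. It assumes every column name has the form "X" followed by a number. The example data, with a column `X12`, is made up to show the multi-digit case.

```r
# Illustrative data; assumes every column is named "X<number>"
df <- data.frame(X1 = 1:4, X2 = 1:4, X12 = 1:4)

# Numeric suffix of each column name, e.g. "X12" -> 12
current <- as.numeric(sub("^X", "", names(df)))

# Every column name that should exist between the smallest and largest suffix
wanted <- paste0("X", seq(min(current), max(current)))

absent <- setdiff(wanted, names(df))
if (length(absent) > 0) df[absent] <- 0  # add the absent columns, filled with 0

df <- df[wanted]  # numeric order: X1, X2, ..., X12
```

Indexing by `wanted` avoids the alphabetical sort, which would otherwise place `X10`, `X11` and `X12` before `X2`.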
0

Thank you all. With your help I managed to do this with a list of data frames (Files) and another list holding the column names (ncolunas).

for (i in serieI) {
    if (!identical(colnames(Files[[i]]), ncolunas)) {

        nms = ncolunas
        df = Files[[i]]
        aux = colnames(df)
        aux1 = row.names(df)

        Missing = setdiff(nms, colnames(df))

        # add one column of zeros per missing name;
        # seq_along() makes the loop run zero times when nothing is missing
        for (j in seq_along(Missing)) {
            df = cbind(df, c(0))
        }
        colnames(df) = c(aux, Missing) # update the column names

        # put the columns in alphabetical order; drop = FALSE keeps a
        # one-row object two-dimensional
        df = df[, order(colnames(df)), drop = FALSE]
        df = as.matrix(df)   # convert to a matrix
        row.names(df) = aux1 # restore the row names
        Files[[i]] = df      # update the list element
    }

}
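
For a list of data frames, the loop above can probably be shortened by combining `lapply` with the `setdiff` approach from the first answer. The following is a minimal sketch with made-up example data. `Files` and `ncolunas` reuse the names from this answer, and the result stays a list of data frames rather than being converted to matrices.

```r
# Illustrative inputs; `Files` and `ncolunas` mirror the names used above
Files <- list(
  data.frame(a = 1:2, c = 3:4),
  data.frame(b = 5:6)
)
ncolunas <- c("a", "b", "c")

Files <- lapply(Files, function(df) {
  absent <- setdiff(ncolunas, colnames(df)) # columns this data frame lacks
  if (length(absent) > 0) df[absent] <- 0   # add them, filled with 0
  df[ncolunas]                              # same column order as ncolunas
})
```

Here the columns follow the order of `ncolunas` instead of being sorted alphabetically.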