2016-03-25 25 views
2

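Create a new column based on NA values in other columns

I want to create another column based on the NAs in the other columns. Here is an example: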

df <- replicate(5,rnorm(4))  
df[1,3:4] <- NA  
df[2:3,1:2] <- NA  
colnames(df)[1:5] <- c("One","Two","Three","Four","Five") 
df
       One   Two Three  Four  Five
[1,]  0.12 -0.38    NA    NA  0.10
[2,]    NA    NA -0.19 -0.14 -1.57
[3,]    NA    NA  1.01  0.22  0.27
[4,]  0.53  0.71 -0.86 -0.33 -1.01

Each column has a fixed, assigned weight:

weightc1 <- 0.1  
weightc2 <- 0.3  
weightc3 <- 0.2  
weightc4 <- 0.35  
weightc5 <- 0.05

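I want each NA in a column to count as that column's weight. For example, an NA in column 1 counts as 0.1.

Then I want to create another column (call it Six) that equals the sum of the weights of the NAs in each row. For example, the first row of column Six should be 0.55 (0.2 + 0.35). The last row, which has no NAs, should equal 0. The column should look like this: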

df2 <- cbind(df, Six = c("0.55","0.4","0.4","0")) 
df2
     One                 Two                  Three                Four                 Five                Six
[1,] "0.123127305724018" "-0.378163368890999" NA                   NA                   "0.100592613978267" "0.55"
[2,] NA                  NA                   "-0.190601356688205" "-0.136015883223294" "-1.56573577576604" "0.4"
[3,] NA                  NA                   "1.01441506421936"   "0.220154629517149"  "0.273740027540685" "0.4"
[4,] "0.529632731861426" "0.709285638700681"  "-0.864741163519668" "-0.327865814162575" "-1.01298096772074" "0"

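I tried ifelse, Six <- ifelse(df$One == NA, "weightc1", ""), which replaces all the numbers in the first column with NAs. I know I need to fix this before applying a sum function (or is there a workaround?). Please advise. Thanks!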
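The attempt returns only NAs because any comparison with NA using == is itself NA; is.na() is the test that returns TRUE or FALSE. A minimal sketch on a made-up vector (the names x and w1 are illustrative and not taken from the question):

```r
# Illustrative vector standing in for one column; the values are made up.
x <- c(0.12, NA, NA, 0.53)
w1 <- 0.1  # weight assumed for this column

cmp <- x == NA       # comparing with NA never gives TRUE or FALSE
cmp                  # NA NA NA NA

mask <- is.na(x)     # the proper test for missing values
mask                 # FALSE TRUE TRUE FALSE

weighted <- ifelse(mask, w1, 0)  # weight where the value is NA, otherwise 0
weighted             # 0.0 0.1 0.1 0.0
```

With a 0/1 style mask like this per column, the row totals are just a sum of the masked weights.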

+0

If you create an example with a random process (i.e. 'rnorm'...), please use 'set.seed' for reproducibility. –

Answers

1

The result can also be obtained with a matrix-vector product:

weights <- c(0.1,0.3,0.2,0.35,0.05) 
df2 <- cbind(df, Six=c(is.na(df) %*% weights)) 
#             One        Two      Three        Four       Five  Six
#[1,]  1.0103788 0.07835063         NA          NA -1.9312272 0.55
#[2,]         NA         NA  1.4426233 -0.55698776  1.0897613 0.40
#[3,]         NA         NA -0.3756296 -1.18399257  0.6567973 0.40
#[4,] -0.1799107 0.46225181  1.3530630  0.09264794 -0.3004309 0.00
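Why the product works: is.na(df) is a logical matrix, and %*% treats TRUE as 1 and FALSE as 0, so each row of the product adds up the weights of exactly those columns that are NA. A reproducible sketch follows; the seed value 1 and the names na_mask and six are arbitrary choices for illustration, not part of the original post.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- replicate(5, rnorm(4))
df[1, 3:4] <- NA
df[2:3, 1:2] <- NA
colnames(df) <- c("One", "Two", "Three", "Four", "Five")

weights <- c(0.1, 0.3, 0.2, 0.35, 0.05)

na_mask <- is.na(df)              # logical 4 x 5 matrix, TRUE where df is NA
six <- drop(na_mask %*% weights)  # 4 x 1 product, flattened to a plain vector
six
```

The random values differ from the output shown above, but the NA positions are the same, so the row totals should again come out as 0.55, 0.40, 0.40 and 0.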
+1

Exactly what I needed. Thank you very much! –

+0

You're welcome. I'm glad I could help. – RHertel

+0

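I have a follow-up question. I want to create another column that equals the sum of column * weight. Here is what I mean: 'df2["Seven"] <- NA' 'df2$Seven <- sum(df2$One * weightc1, df2$Two * weightc2, df2$Three * weightc3, df2$Four * weightc4, df2$Five * weightc5, is.na = T)' It returns all NAs for column Seven. How do I get this right? –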
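The thread does not answer this follow-up, so the following is only one possible approach, sketched under assumptions. Two things stand out in the attempt: sum() collapses all of its inputs into a single number rather than one value per row, and its argument for skipping missing values is na.rm, not is.na. A row-wise weighted sum in which NAs contribute nothing could look like this (the matrix m, the weights w and the name seven are made up for illustration):

```r
# Made-up 2 x 3 matrix and weights, purely for illustration.
m <- matrix(c(1, NA, 2,
              NA, 4, 6), nrow = 2, byrow = TRUE)
w <- c(0.5, 0.25, 0.25)

# sweep() multiplies column j by w[j]; NA cells stay NA.
weighted <- sweep(m, 2, w, `*`)

# rowSums(na.rm = TRUE) adds each row and skips the NA cells.
seven <- rowSums(weighted, na.rm = TRUE)
seven
```

Whether NAs should be skipped or should make the whole row NA depends on what the column is meant to express; dropping na.rm = TRUE gives the second behaviour.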

1

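We get the values of all the 'weightc' objects in a list (using mget), convert 'df' to a data.frame, then multiply each element of the 'weightc' list by the corresponding column of 'df' (after converting that column to a logical vector with is.na), and use Reduce to get the sum.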

Reduce(`+`,Map(function(x,y) y*is.na(x), 
    as.data.frame(df), mget(ls(pattern='weightc\\d+')))) 
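To make the intermediate steps visible, here is a small sketch with made-up data. The 2 x 2 matrix m and the objects wa1 and wa2 are hypothetical stand-ins for 'df' and the 'weightc' objects.

```r
# Hypothetical stand-ins for the question's objects.
m <- matrix(c(1, NA, NA, NA), nrow = 2,
            dimnames = list(NULL, c("One", "Two")))
wa1 <- 0.1
wa2 <- 0.3

# mget() looks up the objects named by ls() and returns them as a named list.
w_list <- mget(ls(pattern = "^wa\\d+$"))

# One vector per column: the weight where the entry is NA, otherwise 0.
per_col <- Map(function(x, y) y * is.na(x), as.data.frame(m), w_list)

# Add the per-column vectors element-wise to get one total per row.
six <- Reduce(`+`, per_col)
six
```

One caveat with this lookup: ls() returns names in sorted order, so with ten or more weights a name like weightc10 would come before weightc2 and the weights would no longer line up with the columns.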

Or we can multiply the logical matrix (is.na(df)) by the replicated 'weightc' list after unlisting it, and then take the rowSums:

rowSums(unlist(mget(ls(pattern="weightc\\d+"))[col(df)])*is.na(df)) 
#[1] 0.55 0.40 0.40 0.00
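The compact part of that one-liner is the indexing: col(df) is a matrix of the same shape as 'df' holding each cell's column number, so indexing the weight vector with it yields one weight per cell in column-major order, which lines up with is.na(df). A sketch with a made-up 2 x 3 matrix (m, w and the other names are illustrative):

```r
m <- matrix(c(NA, 2, NA, NA, 5, 6), nrow = 2)  # made-up data
w <- c(0.1, 0.3, 0.2)                          # one weight per column

idx <- col(m)        # same shape as m; every cell holds its column number
per_cell <- w[idx]   # plain vector of length 6: 0.1 0.1 0.3 0.3 0.2 0.2

# A vector times a matrix of matching length keeps the matrix shape,
# so rowSums() can add up the weights of the NA cells in each row.
na_weight_sums <- rowSums(per_cell * is.na(m))
na_weight_sums
```

Here row 1 has NAs in columns 1 and 2 and row 2 has an NA in column 2 only, so the totals are the sums of those columns' weights.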