2015-02-05

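R - Setting the reference level of a factor to NA

I have a data.table with some factor columns that contain NA values. I deliberately include NA as a level of the factor (i.e. x <- factor(x, exclude=NULL) rather than the default behavior x <- factor(x, exclude=NA)), because these NAs are meaningful for my model. For these factor columns I would like to relevel() so that NA is the reference level, but I am struggling with the syntax.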

# silly reproducible example 
library(data.table) 
a <- data.table(animal      = c("turkey", "platypus", "dolphin"),
                mass_kg     = c(8, 2, 200),
                egg_size    = c("large", "small", NA),
                intelligent = c(0, 0, 1))
lr <- glm(intelligent ~ mass_kg + egg_size, data=a, family = binomial) 
summary(lr) 

# By default, egg_size is converted to a factor with no level for NA 
# However, in this case NA is meaningful (since most mammals don't lay eggs) 

a[,egg_size:=factor(egg_size, exclude=NULL) ] # exclude=NULL allows an NA level 

lr <- glm(intelligent ~ mass_kg + egg_size, data=a, family = binomial) 
summary(lr) # Now NA is included in the model, but not as the reference level 

a[,levels(egg_size)] # Returns: [1] "large" "small" NA  

a[,egg_size:=relevel(egg_size,ref=NA)] 
# Returns: 
# Error in relevel.factor(egg_size, ref = NA) : 
# 'ref' must be an existing level 

What is the correct syntax for relevel(), or do I need to use something else? Many thanks.

Answer


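You have to specify the correct type of NA, namely NA_character_, but relevel() then throws out the NA level, which is probably a bug. One workaround is to specify the levels yourself directly: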

# throw out NA's to begin with 
egg_size = factor(c("large","small",NA), exclude = NA) 

# but then add them back at the beginning 
factor(egg_size, c(NA, levels(egg_size)), exclude = NULL) 
#[1] large small <NA> 
#Levels: <NA> large small 
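
The same idea can be wrapped in a small helper and applied to a factor that already carries an NA level. This is only a sketch under stated assumptions: `na_first()` is a hypothetical name (not a function from base R or data.table), and it assumes the input is an unordered factor created with `exclude = NULL`.

```r
# Hypothetical helper: move the NA level (if present) to the front so that
# it is used as the reference level in model formulas.
na_first <- function(f) {
  stopifnot(is.factor(f))
  lv <- levels(f)
  if (!anyNA(lv)) return(f)                # no NA level, nothing to move
  factor(as.character(f),
         levels  = c(NA, lv[!is.na(lv)]),  # c() coerces NA to NA_character_
         exclude = NULL)                   # keep NA as a real level
}

egg_size <- factor(c("large", "small", NA), exclude = NULL)
levels(egg_size)                # NA is the last level here
egg_size <- na_first(egg_size)
levels(egg_size)                # NA is now the first level
```

With the data.table from the question, this could be called as `a[, egg_size := na_first(egg_size)]`.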

In case you're wondering, c() converts the NA from logical to the correct type (character).
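
A quick way to check this coercion in an R session (the variable name `lv` is only illustrative):

```r
# A bare NA is a logical constant
typeof(NA)

# Combining it with character strings coerces the whole vector to character,
# so the first element becomes NA_character_
lv <- c(NA, "large", "small")
typeof(lv)
identical(lv[1], NA_character_)
```

The first call reports "logical", the second "character", and the final comparison is TRUE, which is why the levels vector built with c() can be passed to factor() without spelling out NA_character_.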


Nice workaround, thanks. I had tried 'NA_character_' and wondered why it dropped the level. – C8H10N4O2 2015-02-05 18:59:24