2017-09-13

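I have a data frame containing a mix of numeric and factor variables. I want to replace all outliers in every column of the data frame with NA.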

I tried to replace them with NA across all columns, but I get the following error:

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric 
The outliers in question are values more than 3 standard deviations from the column mean (3 × SD).
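The 3 × SD rule is the z-score rule that `scale()` makes convenient. Each value $x_i$ is standardized by the column mean $\bar{x}$ and the sample standard deviation $s$ (the same quantities `scale()` uses), and flagged when the absolute z-score exceeds 3. A consequence of Samuelson's inequality bounds how large that z-score can be with only $n$ non-missing values, so the rule can flag something only when $n \ge 11$:

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad x_i \text{ is flagged when } |z_i| > 3, \qquad |z_i| \le \frac{n-1}{\sqrt{n}} .
$$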

The code I used is:

name = factor(c("A","B","NA","D","E","NA","G","H","H")) 
height = c(120,NA,150,170,NA,146,132,210,NA) 
age = c(10,20,0,30,40,50,60,NA,130) 
mark = c(100,0.5,100,50,90,100,NA,50,210) 
data = data.frame(name=name,mark=mark,age=age,height=height) 
data 
data[is.na(data)] <- 77777 
data.scale <- scale(data) 
data.scale[ abs(data.scale) > 3 ] <- NA 
data <- data.scale 

Any suggestions on how to make this work?
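The error is raised inside `scale()` itself. It first converts its input with `as.matrix()`, and because `name` is a factor the whole result becomes a character matrix, which `colMeans()` then rejects. A minimal sketch reproducing this on the question's data:

```r
# The question's data: one factor column and three numeric columns
data <- data.frame(
  name   = factor(c("A","B","NA","D","E","NA","G","H","H")),
  mark   = c(100,0.5,100,50,90,100,NA,50,210),
  age    = c(10,20,0,30,40,50,60,NA,130),
  height = c(120,NA,150,170,NA,146,132,210,NA)
)

# as.matrix() on a data frame with a factor column yields a character matrix
m <- as.matrix(data)
typeof(m)  # "character"

# scale() calls colMeans() on that matrix; capture the error message
err <- tryCatch(scale(data), error = function(e) conditionMessage(e))
err        # "'x' must be numeric"
```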


Including a [reproducible example](http://stackoverflow.com/questions/5963269) will make it easier for others to help you. – Jaap


If you are talking about outliers, your variable should not be a factor. –


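You are doing arithmetic on a data frame, which only works if it contains nothing but numeric values. Use 'data = data.frame(mark = mark,age = age,height = height)', without the 'name' column, run the rest of your code, and add the line 'data <-cbind(name,data)' at the end. – Smich7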
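A base-R sketch of the comment's idea, assuming the scaled values are what you want to keep. Two small changes: the final step uses `data.frame()` instead of `cbind()`, because `cbind()` of a factor and a numeric matrix returns a numeric matrix in which `name` becomes integer codes; and the `77777` placeholder is dropped, since `scale()` already ignores NA when computing each column's mean and SD (a large placeholder would also inflate both).

```r
# The question's vectors; the "NA" entries in name are literal strings, not missing values
name   <- factor(c("A","B","NA","D","E","NA","G","H","H"))
height <- c(120,NA,150,170,NA,146,132,210,NA)
age    <- c(10,20,0,30,40,50,60,NA,130)
mark   <- c(100,0.5,100,50,90,100,NA,50,210)

# Numeric columns only, as the comment suggests
num <- data.frame(mark = mark, age = age, height = height)

# Standardize each column; NAs are skipped when computing mean and SD
z <- scale(num)
z[abs(z) > 3] <- NA   # values more than 3 SD from the column mean become NA

# data.frame() keeps name as a factor (cbind() would turn it into integer codes)
data <- data.frame(name = name, z)
str(data)
```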

Answer


Here is one approach:

library(dplyr) 

# take note of order for column names 
data.names <- colnames(data) 

# scale all numeric columns 
data.numeric <- select_if(data, is.numeric) %>%        # subset of numeric columns 
    mutate_all(function(x) as.vector(scale(x)))        # scale each column separately; as.vector() drops the one-column matrix scale() returns 
data.numeric[abs(data.numeric) > 3] <- NA              # set values more than 3 SD from the mean (either side) to NA (none in this example) 

# combine results with subset data frame of non-numeric columns 
data <- data.frame(select_if(data, function(x) !is.numeric(x)), 
        data.numeric) 

# restore columns to original order 
data <- data[, data.names] 

> data
  name        mark         age     height
1    A  0.20461856 -0.80009469 -1.0844636
2    B -1.43232992 -0.55391171         NA
3   NA  0.20461856 -1.04627767 -0.1459855
4    D -0.61796862 -0.30772873  0.4796666
5    E  0.04010112 -0.06154575         NA
6   NA  0.20461856  0.18463724 -0.2711159
7    G          NA  0.43082022 -0.7090723
8    H -0.61796862          NA  1.7309707
9    H  2.01431035  2.15410109         NA

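Note: with this approach, non-numeric variables (character, factor, etc.) end up before the numeric variables, so the last step restores the original column order (if applicable).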
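With dplyr 1.0.0 or later, `select_if()` and `mutate_all()` are superseded by `across()` combined with `where()`. Because `across()` rewrites the selected columns in place, the split-and-reorder steps are not needed. The sketch below also assumes you want to keep the remaining values in their original units rather than as z-scores; `na_outliers()` is a hypothetical helper, not part of dplyr.

```r
library(dplyr)  # across() and where() require dplyr >= 1.0.0

data <- data.frame(
  name   = factor(c("A","B","NA","D","E","NA","G","H","H")),
  mark   = c(100,0.5,100,50,90,100,NA,50,210),
  age    = c(10,20,0,30,40,50,60,NA,130),
  height = c(120,NA,150,170,NA,146,132,210,NA)
)

# Hypothetical helper: set values more than k SDs from the column mean to NA,
# leaving all other values in their original units.
# which() skips the NA comparisons produced by missing values.
na_outliers <- function(x, k = 3) {
  z <- as.vector(scale(x))
  replace(x, which(abs(z) > k), NA)
}

# Only numeric columns are touched; name and the column order are unchanged
cleaned <- data %>%
  mutate(across(where(is.numeric), na_outliers))
```

On the question's data nothing is flagged, matching the answer's "(none in this example)" comment. To keep z-scores as the answer does, apply `scale()` first and then replace the large values.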