2017-10-08

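How to change variable types in an R data set

I have a data set whose variables are all stored as character. Some of them really are text, but others contain only numbers, and I want to convert those to numeric. How can I do this with base functions?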

test_data <- as.data.frame(list(
    V1 = c("-0.2372", "0.5231", "0.039", "1.618", "-1.0774"), 
    V2 = c("0.59", "0.7619", "1.7421", "-0.8037", "0.7327"), 
    V3 = c("0.3196", "0.5639", "-0.289", "-0.0822", "0.176"), 
    V4 = c("-1.2442", "0.2814", "-0.924", "0.9123", "-0.4972"), 
    V5 = c("ST 123E", "LD 34", "ST 123E", "ST 123E", "ST 123E"))) 

str(test_data) 
test_data[] <- lapply(test_data, type.convert) – Roland
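
Roland's one-liner can be sketched as follows. Passing `as.is = TRUE` is an addition here, not part of the comment: it keeps non-numeric columns as character instead of turning them into factors, and R 4.0.0 and later warn when the argument is left out. The small `demo` data frame is a made-up stand-in for `test_data`, and the sketch assumes the columns are character vectors.

```r
# Hypothetical stand-in for test_data: one numeric-looking column, one text column
demo <- data.frame(
  V1 = c("-0.2372", "0.5231", "0.039"),
  V5 = c("ST 123E", "LD 34", "ST 123E"),
  stringsAsFactors = FALSE
)

# demo[] keeps the data.frame structure while every column is replaced
demo[] <- lapply(demo, type.convert, as.is = TRUE)

sapply(demo, class)
#          V1          V5
#   "numeric" "character"
```

Assigning into `demo[]` rather than `demo` is what preserves the data frame; a plain `demo <- lapply(...)` would return a list.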

Answers

# Flag columns whose values all contain letters, then convert the remaining columns
is_text <- sapply(test_data, function(x) all(grepl("[A-Za-z]", as.character(x))))
test_data[!is_text] <- lapply(test_data[!is_text], function(x) as.numeric(as.character(x)))

Thanks! It is easy with the "varhandle" package, but I am interested in how to do it in base syntax.
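
One caveat for base R conversions: when a column is a factor (the default for strings in data frames before R 4.0.0), `as.numeric()` returns the internal level codes rather than the printed values, so the factor has to go through `as.character()` first. A minimal illustration with a made-up factor `f`:

```r
# Hypothetical factor holding numeric-looking labels
f <- factor(c("0.59", "0.7619", "-0.8037"))

codes  <- as.numeric(f)                # integer level codes, not the values
values <- as.numeric(as.character(f))  # the actual numbers

values
# [1]  0.5900  0.7619 -0.8037
```

The exact level codes depend on how the labels sort in the current locale, which is another reason not to rely on them.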


How about this?

test_data <- cbind(
    sapply(test_data[1:4], function(x) as.numeric(as.character(x))), 
    test_data[5]) 

test_data 
#        V1      V2      V3      V4      V5
# 1 -0.2372  0.5900  0.3196 -1.2442 ST 123E
# 2  0.5231  0.7619  0.5639  0.2814   LD 34
# 3  0.0390  1.7421 -0.2890 -0.9240 ST 123E
# 4  1.6180 -0.8037 -0.0822  0.9123 ST 123E
# 5 -1.0774  0.7327  0.1760 -0.4972 ST 123E

str(test_data) 
# 'data.frame': 5 obs. of 5 variables: 
# $ V1: num -0.237 0.523 0.039 1.618 -1.077 
# $ V2: num 0.59 0.762 1.742 -0.804 0.733 
# $ V3: num 0.3196 0.5639 -0.289 -0.0822 0.176 
# $ V4: num -1.244 0.281 -0.924 0.912 -0.497 
# $ V5: Factor w/ 2 levels "LD 34","ST 123E": 2 1 2 2 2 

V5 has kept its character type.

I have corrected my code; I hope it works for you now. – jaySf

Thanks! It works.


This also works:

myfun <- function(I) {
  if (any(grepl("[a-zA-Z]", I))) {      # does the column contain letters?
    return(I)                           # if yes, return the column unchanged
  } else {
    # if no, convert to double; going through character keeps factor values intact
    return(as.double(as.character(I)))
  }
}

df <- Reduce("data.frame", lapply(test_data, myfun)) 
colnames(df) <- c(LETTERS[1:5]) 

#         A       B       C       D       E
# 1 -0.2372  0.5900  0.3196 -1.2442 ST 123E
# 2  0.5231  0.7619  0.5639  0.2814   LD 34
# 3  0.0390  1.7421 -0.2890 -0.9240 ST 123E
# 4  1.6180 -0.8037 -0.0822  0.9123 ST 123E
# 5 -1.0774  0.7327  0.1760 -0.4972 ST 123E

str(df) 
# 'data.frame': 5 obs. of 5 variables: 
# $ A: num -0.237 0.523 0.039 1.618 -1.077 
# $ B: num 0.59 0.762 1.742 -0.804 0.733 
# $ C: num 0.3196 0.5639 -0.289 -0.0822 0.176 
# $ D: num -1.244 0.281 -0.924 0.912 -0.497 
# $ E: Factor w/ 2 levels "LD 34","ST 123E": 2 1 2 2 2
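
A letter-matching test such as `grepl("[a-zA-Z]", x)` flags values like `1e-3` as text. As a hedged alternative sketch in base R, a column can instead be converted only when every non-missing value parses as a number. The helper name `to_numeric_if_possible` and the `demo` data are made up for illustration and are not taken from the answers.

```r
# Hypothetical helper: convert a column only if nothing is lost by parsing it
to_numeric_if_possible <- function(x) {
  chr <- as.character(x)                    # also handles factor columns
  num <- suppressWarnings(as.numeric(chr))  # unparseable strings become NA
  # convert only if parsing introduced no new missing values
  if (all(is.na(num) == is.na(chr))) num else x
}

# Made-up data: V1 holds numbers (including scientific notation and an NA), V5 holds text
demo <- data.frame(
  V1 = c("-0.2372", "1e-3", NA),
  V5 = c("ST 123E", "LD 34", "ST 123E"),
  stringsAsFactors = FALSE
)

demo[] <- lapply(demo, to_numeric_if_possible)
str(demo)
```

With this approach a column that mixes numbers and text is left untouched rather than partly turned into `NA`.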