2017-08-09

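Correlation with continuous and categorical variables

I want to check whether my variables are correlated with each other. This is the structure of the dataset: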

'data.frame': 189 obs. of 20 variables: 
$ age   : num 24 31 32 35 36 26 31 24 35 36 ... 
$ diplM2   : Factor w/ 3 levels "0","1","2": 3 2 1 3 2 2 3 2 2 1 ... 
$ TimeDelcat  : Factor w/ 4 levels "0","1","2","3": 1 1 3 3 3 4 2 1 4 4 ... 
$ SeasonDel  : Factor w/ 4 levels "1","2","3","4": 1 2 4 3 4 3 4 3 2 3 ... 
$ BMIM2   : num 23.4 25.7 17 26.6 24.6 21.6 21 22.3 20.8 20.7 ... 
$ WgtB2   : int 3740 3615 3705 3485 3420 2775 3365 3770 3075 3000 ... 
$ sex   : Factor w/ 2 levels "1","2": 2 2 1 2 2 2 1 1 1 1 ... 
$ smoke   : Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 1 1 3 ... 
$ nRBC   : num 0.1621 0.0604 0.1935 0.0527 0.1118 ... 
$ CD4T   : num 0.1427 0.2143 0.1432 0.0686 0.0979 ... 
$ CD8T   : num 0.1574 0.1549 0.1243 0.0804 0.0782 ... 
$ NK    : num 0.02817 0 0.04368 0.00641 0.02398 ... 
$ Bcell   : num 0.1033 0.1124 0.1468 0.0551 0.0696 ... 
$ Mono   : num 0.0633 0.0641 0.0773 0.0531 0.0656 ... 
$ Gran   : num 0.428 0.442 0.329 0.716 0.6 ... 
$ chip   : Factor w/ 92 levels "200251580021",..: 12 24 23 2 27 22 6 22 17 22 ... 
$ pos   : Factor w/ 12 levels "R01C01","R01C02",..: 11 12 1 6 9 2 12 1 7 11 ... 
$ trim1PM25ifdmv4: num 9.45 13.81 15.59 7.13 15.43 ... 
$ trim2PM25ifdmv4: num 13.27 15.53 10.69 13.56 9.27 ... 
$ trim3PM25ifdmv4: num 16.72 16.21 12.17 6.47 10.66 ... 

As you can see, there are both continuous and categorical variables. When I run chart.Correlation(variables, histogram=TRUE, method = c("pearson"))

I get this error:

Error in pairs.default(x, gap = 0, lower.panel = panel.smooth, upper.panel = panel.cor, : 
    non-numeric argument to 'pairs' 

How can I solve this? Thanks.

Correlations cannot be computed on data of type factor. – Lstat

If I use pairs.panels from library psych, it works with factor variables. @Lstat – Julie

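How would you compute a correlation coefficient r for non-numeric input such as "R01C01"? I don't know how the 'psych' library handles this. However, the function 'chart.Correlation' uses two functions to estimate r: 'cor' and 'cor.test' (you can see this with 'fix(chart.Correlation)'). Neither of them accepts factors as input. – Lstat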
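To illustrate the point made in the comments, here is a minimal sketch in base R. The data frame `df` and its values are made up for illustration and are not the asker's data. It shows that `cor` stops with an error when a factor column is present, and that it runs once only the numeric columns are kept.

```r
# Hypothetical toy data: two numeric columns and one factor column
df <- data.frame(
  age = c(24, 31, 32, 35, 36),
  bmi = c(23.4, 25.7, 17.0, 26.6, 24.6),
  sex = factor(c("2", "2", "1", "2", "2"))
)

# cor() on the full data frame fails; capture the error message instead of stopping
res <- tryCatch(cor(df), error = function(e) conditionMessage(e))
print(res)

# Restricting to numeric columns gives a 2 x 2 correlation matrix
num_cor <- cor(df[sapply(df, is.numeric)])
print(num_cor)
```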

Answer

I believe you only want correlations among the numeric variables. The code below does that, and it outputs each pairwise correlation only once.

library(reshape2)
# example data: three numeric columns and two non-numeric columns
data <- data.frame(x1 = rnorm(10),
                   x2 = rnorm(10),
                   x3 = rnorm(10),
                   x4 = c("a","b","c","d","e","f","g","h","i","j"),
                   x5 = c("ab","sp","sp","dd","hg","hj","qw","dh","ko","jk"))

data 
           x1         x2         x3 x4 x5
1  -1.2169793  0.5397598  0.4981513  a ab
2  -0.7032631 -2.1262837 -1.0377371  b sp
3   0.8766831 -0.2326975 -0.1219613  c sp
4   0.3405332  2.4766225 -1.1960618  d dd
5   0.1889945  0.3444534  1.9659062  e hg
6   0.8086956  0.4654644 -1.2526696  f hj
7  -0.6850181 -1.7657241  0.5156620  g qw
8   0.8518034  0.9484547  1.4784063  h dh
9   0.5191793  1.2246566  1.3867829  i ko
10  0.4568953 -0.6881464  0.3548839  j jk

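# correlation matrix for all numeric columns
# (is.numeric also keeps integer columns, which class == "numeric" would miss)
corr <- cor(data[sapply(data, is.numeric)])
# convert the correlation matrix to long format (columns Var1, Var2, value)
res <- melt(corr)
# label each pair with its names in sorted order, so x1*x2 and x2*x1 get the same label
res$type <- apply(res, 1, function(x)
  paste(sort(c(as.character(x[1]), as.character(x[2]))), collapse = "*"))
# keep only one row per pair
res <- unique(res[, c("type", "value")])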

res 
type  value 
x1*x1 1.00000000 
x1*x2 0.44024939 
x1*x3 0.04936654 
x2*x2 1.00000000 
x2*x3 0.08859169 
x3*x3 1.00000000
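
The same column filter can be applied before calling chart.Correlation. The following is only a sketch: `variables` here is a small made-up stand-in that borrows a few column names from the question, its values are randomly generated, and the plotting call is guarded so that it only runs if the PerformanceAnalytics package is installed.

```r
# Hypothetical stand-in for the asker's data frame `variables`
set.seed(1)
variables <- data.frame(
  age   = rnorm(20, mean = 30, sd = 5),
  BMIM2 = rnorm(20, mean = 23, sd = 3),
  WgtB2 = as.integer(round(rnorm(20, mean = 3400, sd = 300))),
  sex   = factor(sample(c("1", "2"), 20, replace = TRUE)),
  smoke = factor(sample(c("0", "1", "2"), 20, replace = TRUE))
)

# Keep numeric and integer columns; factor columns are dropped
num_vars <- variables[sapply(variables, is.numeric)]
print(names(num_vars))

# Draw the correlation chart only if PerformanceAnalytics is available
if (requireNamespace("PerformanceAnalytics", quietly = TRUE)) {
  PerformanceAnalytics::chart.Correlation(num_vars, histogram = TRUE,
                                          method = "pearson")
}
```

Factor columns such as sex or smoke are left out of this chart entirely, so any association between them and the continuous variables would have to be examined separately.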