2016-09-20 47 views

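R: Compare values in a read table and update another matrix

I am currently having trouble using R to compare every column within a particular matrix. I am trying to compare entire columns at once, produce TRUE/FALSE output with the table command, convert the number of TRUEs found into a numeric value, and enter those values into their respective positions in the incidence matrix.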

For example, I have data in this type of format: 
//Example state matrix - I am attempting to compare c1 with c2, then c1 with c3, then c1 with c4 and so on and so forth 
   c1 c2 c3 c4
r1  2  6  3  2
r2  1  1  6  5
r3  3  1  3  6

And I am trying to instead put it into this format 
//Example incidence matrix - Which is how many times c1 equaled c2 in the above matrix 
   c1 c2 c3 c4
c1  3  1  1  1
c2  1  3  0  0
c3  1  0  3  0
c4  1  0  0  3

Below is the code I have come up with so far; however, I keep getting this particular error:

Warning message:
In IncidenceMat[rat][r] = IncidenceMat[rat][r] + as.numeric(instances) :
  number of items to replace is not a multiple of replacement length

rawData = read.table("5-14-2014streamW636PPstate.txt") 
colnames = names(rawData) #the column names in R 
df <- data.frame(rawData) 
rats = ncol(rawData) 
instances = nrow(rawData) 

IncidenceMat = matrix(rep(0, rats), nrow = rats, ncol = rats) 

for(rat in rats) 
    for(r in rats) 
     if(rat == r){rawData[instance][rat] == rawData[instance][r] something like this would work in C++ if I attempted, 
     IncidenceMat[rat][r] = IncidenceMat[rat][r] + as.numeric(instances) 
    } else{ 
    count = df[colnames[rat]] == df[colnames[r]] 
    c = table(count) 
    TotTrue = as.numeric(c[2][1]) 
    IncidenceMat[rat][r] = IncidenceMat[rat][r] + TotTrue #count would go here #this should work like a charm as well 
    } 

Any help would be greatly appreciated. I have also looked at some of these resources, but I am still stumped.

I tried this and this, as well as a few other resources that I have since closed.

Answers


How about this (note that the incidence matrix is symmetric)?

df 
   c1 c2 c3 c4
r1  2  6  3  2
r2  1  1  6  5
r3  3  1  3  6

incidence <- matrix(rep(0, ncol(df)*ncol(df)), nrow=ncol(df)) 
diag(incidence) <- nrow(df) 
for (i in 1:(ncol(df)-1)) {
    for (j in (i+1):ncol(df)) {
        incidence[i,j] = incidence[j,i] = sum(df[,i] == df[,j])
    }
}

incidence 
     [,1] [,2] [,3] [,4]
[1,]    3    1    1    1
[2,]    1    3    0    0
[3,]    1    0    3    0
[4,]    1    0    0    3

This worked great! @sandipan, would you mind explaining your thought process here? – zackymo21


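Sure, @zackymo21. First, note that the incidence matrix has dimension n×n, where n = # columns of the original matrix (since we need to compare every pair of columns). Also note that the incidence matrix is symmetric. All the diagonal elements equal the number of rows of the original matrix (since every column matches itself in every row), which is why the code sets diag(incidence) <- nrow(df). Finally, we need to compute the similarity between column vectors i and j, which can be done with sum(df[,i] == df[,j]). –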
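The same reasoning can also be written without explicit loops. The following is a minimal sketch, not the answerer's method: it assumes the data fits in a plain matrix and uses `sapply` with `colSums` to count matches for every pair of columns, then checks the two properties described above (symmetry, and a diagonal equal to the number of rows). The data frame is retyped from the example in the answer.

```r
# Example data retyped from the answer.
df <- data.frame(
  c1 = c(2, 1, 3), c2 = c(6, 1, 1), c3 = c(3, 6, 3), c4 = c(2, 5, 6),
  row.names = c("r1", "r2", "r3")
)
m <- as.matrix(df)

# For each column j, compare every column of m with m[, j] (the column is
# recycled down the rows) and count the matches per column.
incidence <- sapply(seq_len(ncol(m)), function(j) colSums(m == m[, j]))
dimnames(incidence) <- list(colnames(m), colnames(m))

# Properties from the explanation above.
is_symmetric <- all(incidence == t(incidence))
diag_is_nrow <- all(diag(incidence) == nrow(m))

print(incidence)
```

If the data can contain `NA`, the comparison yields `NA` and so do the counts; passing `na.rm = TRUE` to `colSums` is one way to ignore those rows.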


I see! Thank you very much for the clarification! I have one more question; would you mind taking a look at it? @sandipan – zackymo21