Similarity index:

I want to compute a similarity index that gives +1 when two rows are similar (same species) and -1 when they are not.

dataR<- read.table(text=' 
    echant espece 
    ech1 esp1 
    ech2 esp2 
    ech3 esp2 
    ech4 esp3 
    ech5 esp3 
    ech6 esp4 
    ech7 esp4', header=TRUE)

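I would like to obtain a sample-by-sample matrix with +1 where two samples share a species and -1 elsewhere (NA on the diagonal would also be fine; that isn't really a problem).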

Well, I tried the simil function from the proxy package:

library(proxy)
trst <- read.table("Rtest_simil.csv", header = TRUE, sep = ",", dec = ".")
is.numeric(trst[, 2])
trst[, 2] <- as.numeric(trst[, 2])  # make the "espece" column numeric
sim <- simil(trst, diag = TRUE)

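But the result is not what I want:
1) For example, the similarity between ech2 and ech3 is 0.5 and the diagonal is 0, yet when there is no similarity it is also 0.
2) The ech labels are lost.
3) Additionally, I can't save it in .csv format.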

Does anyone have a suggestion? Many thanks!


2016-08-21 catindri

There are undoubtedly more compact ways to do this:

same.mat <- outer(dataR$espece, dataR$espece, "==") * 2 - 1
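To make the one-liner above concrete: `outer()` applies `"=="` to every pair of species and returns a logical matrix; in arithmetic TRUE/FALSE become 1/0, so `* 2 - 1` maps them to +1/-1. A minimal sketch on a short hypothetical vector `sp` (not the question's data; variable names are illustrative):

```r
# Hypothetical species vector for three samples (illustration only)
sp <- c("esp1", "esp2", "esp2")

# Logical matrix: entry [i, j] is TRUE when samples i and j share a species
same.lgl <- outer(sp, sp, "==")

# TRUE/FALSE become 1/0 in arithmetic, so * 2 - 1 maps them to +1/-1
same.num <- same.lgl * 2 - 1
print(same.num)
#      [,1] [,2] [,3]
# [1,]    1   -1   -1
# [2,]   -1    1    1
# [3,]   -1    1    1
```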

To get named columns and rows:

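library(tidyr)
library(dplyr)  # provides mutate_at() and vars()
# 1 when the sample has that species (non-NA after spreading), -1 otherwise
same <- function(x) { ifelse(is.na(x), -1, 1) }
spread(dataR, espece, espece) %>%
    mutate_at(vars(-echant), same)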
## echant esp1 esp2 esp3 esp4 
## 1 ech1 1 -1 -1 -1 
## 2 ech2 -1 1 -1 -1 
## 3 ech3 -1 1 -1 -1 
## 4 ech4 -1 -1 1 -1 
## 5 ech5 -1 -1 1 -1 
## 6 ech6 -1 -1 -1 1 
## 7 ech7 -1 -1 -1 1
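The same sample-by-species table can also be sketched in base R without tidyr or dplyr; this is a swapped-in alternative, not part of the answer above. It assumes each sample appears only once in `dataR` (a sample listed twice with the same species would give a count of 2, which this mapping does not handle). `counts` and `sp.mat` are illustrative names.

```r
# Data from the question
dataR <- read.table(text = '
    echant espece
    ech1 esp1
    ech2 esp2
    ech3 esp2
    ech4 esp3
    ech5 esp3
    ech6 esp4
    ech7 esp4', header = TRUE)

# Cross-tabulate samples against species; each cell is 0 or 1 here
# (assumes every sample appears exactly once)
counts <- table(dataR$echant, dataR$espece)

# Drop the "table" class to get a plain matrix, then map 1 -> +1 and 0 -> -1
sp.mat <- unclass(counts) * 2 - 1
print(sp.mat)
```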


2016-08-21 13:14:17 hrbrmstr

Thanks a lot, I need to improve my programming in R... – catindri

The matrix described in the question can be obtained from same.mat; the sample labels can be assigned with rownames and colnames:

rownames(same.mat) <- colnames(same.mat) <- dataR$echant
same.mat
#      ech1 ech2 ech3 ech4 ech5 ech6 ech7
# ech1    1   -1   -1   -1   -1   -1   -1
# ech2   -1    1    1   -1   -1   -1   -1
# ech3   -1    1    1   -1   -1   -1   -1
# ech4   -1   -1   -1    1    1   -1   -1
# ech5   -1   -1   -1    1    1   -1   -1
# ech6   -1   -1   -1   -1   -1    1    1
# ech7   -1   -1   -1   -1   -1    1    1
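Regarding point 3 of the question (saving as .csv): once the labels are set, base R's `write.csv()` keeps them, because it writes the row names as the first column. A minimal sketch; the file path is hypothetical, so a temporary file is used, and `out.file` and `back` are illustrative names:

```r
# Data from the question
dataR <- read.table(text = '
    echant espece
    ech1 esp1
    ech2 esp2
    ech3 esp2
    ech4 esp3
    ech5 esp3
    ech6 esp4
    ech7 esp4', header = TRUE)

# Labelled +1/-1 matrix, as in the answer
same.mat <- outer(dataR$espece, dataR$espece, "==") * 2 - 1
rownames(same.mat) <- colnames(same.mat) <- dataR$echant

# Hypothetical output path; a temporary file keeps the sketch side-effect free
out.file <- tempfile(fileext = ".csv")

# write.csv() stores the row names as the first (unnamed) column
write.csv(same.mat, out.file)

# Read it back, using that first column as row names, to check the labels survived
back <- as.matrix(read.csv(out.file, row.names = 1))
print(back)
```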

Another approach could be:

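# factor() makes this work whether espece is read as character (R >= 4.0 default) or as a factor
same.mat <- (as.matrix(dist(as.numeric(factor(dataR$espece)))) == 0) * 2 - 1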
rownames(same.mat) <- colnames(same.mat) <- dataR$echant
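The `dist()` variant works because each species is coded as an integer, so the one-dimensional Euclidean distance between two samples is 0 exactly when they share a species. A minimal sketch checking that it agrees with the `outer()` version; `codes`, `via.dist` and `via.outer` are illustrative names:

```r
# Data from the question
dataR <- read.table(text = '
    echant espece
    ech1 esp1
    ech2 esp2
    ech3 esp2
    ech4 esp3
    ech5 esp3
    ech6 esp4
    ech7 esp4', header = TRUE)

# Integer code per species (factor() handles character or factor input)
codes <- as.numeric(factor(dataR$espece))

# Distance 0 <=> same species; map TRUE/FALSE to +1/-1
via.dist <- (as.matrix(dist(codes)) == 0) * 2 - 1

# Direct pairwise comparison, for reference
via.outer <- outer(dataR$espece, dataR$espece, "==") * 2 - 1

# as.matrix(dist()) labels rows/columns 1..n, so compare values only
print(all(via.dist == via.outer))
# [1] TRUE
```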


2016-08-21 15:27:07 RHertel

Thank you, that works too – catindri
