2014-09-30

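Lift value calculation

I have a (symmetric) adjacency matrix that was built from the co-occurrence of names (e.g. Greg, Mary, Sam, Tom) in newspaper articles (e.g. a, b, c, d). See below.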

How can I calculate the lift value (http://en.wikipedia.org/wiki/Lift_(data_mining)) for the non-zero matrix elements?

I am interested in an efficient implementation that also works for very large matrices (e.g. a million non-zero elements).

I would appreciate any help.

# Load package 
library(Matrix) 

# Data 
A <- new("dgTMatrix" 
    , i = c(2L, 2L, 2L, 0L, 3L, 3L, 3L, 1L, 1L) 
    , j = c(0L, 1L, 2L, 0L, 1L, 2L, 3L, 1L, 3L) 
    , Dim = c(4L, 4L) 
    , Dimnames = list(c("Greg", "Mary", "Sam", "Tom"), c("a", "b", "c", "d")) 
    , x = c(1, 1, 1, 1, 1, 1, 1, 1, 1) 
    , factors = list() 
) 

# > A 
# 4 x 4 sparse Matrix of class "dgTMatrix" 
#      a b c d
# Greg 1 . . .
# Mary . 1 . 1
# Sam  1 1 1 .
# Tom  . 1 1 1

# One mode projection of the data 
# (i.e. final adjacency matrix, which is the basis for the lift value calculation) 
A.final <- tcrossprod(A) 

# > A.final 
# 4 x 4 sparse Matrix of class "dsCMatrix" 
#      Greg Mary Sam Tom
# Greg    1    .   1   .
# Mary    .    2   1   2
# Sam     1    1   3   2
# Tom     .    2   2   3

Answer

Here is something that may help, although it is certainly not the most efficient implementation.

ComputeLift <- function(data, projection){
    # Initialize a matrix to store the results.
    lift <- matrix(NA_real_, nrow=nrow(projection), ncol=ncol(projection))
    # Loop over all pairs in the projection matrix.
    for(i in seq_len(nrow(projection))){
        for(j in seq_len(ncol(projection))){
            # The probability to observe both names in the same article is the
            # number of articles where the names appear together divided by the
            # total number of articles.
            pAB <- projection[i,j]/ncol(data)
            # The probability for a name to appear in an article is the number of
            # articles where the name appears divided by the total number of articles.
            # Use the `data` argument here, not the global `A`.
            pA <- sum(data[i,])/ncol(data)
            pB <- sum(data[j,])/ncol(data)
            # The lift is computed as the probability to observe both names in an
            # article divided by the product of the probabilities to observe each name.
            lift[i,j] <- pAB/(pA*pB)
        }
    }
    lift
}

ComputeLift(data=A, projection=A.final)
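
Since the question asks about very large matrices, the double loop above can probably be replaced by two diagonal scalings. With n articles and c_i the number of articles mentioning name i, the loop computes lift[i,j] = n * projection[i,j] / (c_i * c_j), so scaling the rows and columns of the sparse projection touches only its stored (non-zero) entries. The following is a minimal sketch, not a benchmarked solution: `ComputeLiftSparse` is a hypothetical name, and it assumes that `data` is a binary name-by-article matrix in which every name appears in at least one article.

```r
library(Matrix)

# Hypothetical helper, not part of the original answer.
# Assumes `data` is a binary name-by-article sparse matrix in which every
# name occurs in at least one article (otherwise 1 / counts is infinite).
ComputeLiftSparse <- function(data) {
  n_articles <- ncol(data)          # total number of articles
  counts     <- rowSums(data)       # number of articles each name appears in
  projection <- tcrossprod(data)    # co-occurrence counts between names
  # lift[i, j] = n_articles * projection[i, j] / (counts[i] * counts[j]);
  # scaling rows and columns with diagonal matrices keeps the result sparse.
  lift <- Diagonal(x = n_articles / counts) %*% projection %*%
    Diagonal(x = 1 / counts)
  dimnames(lift) <- dimnames(projection)
  lift
}

# Same example data as in the question, built with 1-based indices.
A <- sparseMatrix(
  i = c(3, 3, 3, 1, 4, 4, 4, 2, 2),
  j = c(1, 2, 3, 1, 2, 3, 4, 2, 4),
  x = 1,
  dims = c(4, 4),
  dimnames = list(c("Greg", "Mary", "Sam", "Tom"), c("a", "b", "c", "d"))
)

lift <- ComputeLiftSparse(A)
lift["Mary", "Tom"]  # (2/4) / ((2/4) * (3/4))
```

For Mary and Tom this should give 4/3, the same value as the loop version. Pairs that never co-occur stay structural zeros, so memory use scales with the number of non-zero elements and not with the square of the number of names.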