Compare two lists in R

I have two lists of IDs.

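I would like to compare the two lists. In particular, I am interested in the following figures:

How many IDs are in both list A and list B
How many IDs are in A but not in B
How many IDs are in B but not in A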

I would also love to draw a Venn diagram.

Source

2013-07-11 Aslan986

See '??intersect' and '??setdiff' ... – agstudy

See [Venn Diagrams with R?](http://stackoverflow.com/q/1428946/59470) – topchef

Isn't this an incorrect use of the term "list" in R? These are just two vectors. That is not the same thing at all. – emilBeBri

Here are some basics to try:

> A = c("Dog", "Cat", "Mouse") 
> B = c("Tiger","Lion","Cat") 
> A %in% B 
[1] FALSE TRUE FALSE 
> intersect(A,B) 
[1] "Cat" 
> setdiff(A,B) 
[1] "Dog" "Mouse" 
> setdiff(B,A) 
[1] "Tiger" "Lion"

Similarly, you can get the counts simply as:

> length(intersect(A,B)) 
[1] 1 
> length(setdiff(A,B)) 
[1] 2 
> length(setdiff(B,A)) 
[1] 2
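The three counts asked for in the question can also be collected in a single call. The following is only a minimal sketch: the helper name compare_ids is made up for illustration and is not part of base R. It relies on intersect and setdiff already returning unique values, so repeated IDs are counted once.

```r
# compare_ids is a hypothetical helper, not a base R function
compare_ids <- function(a, b) {
  c(both   = length(intersect(a, b)),  # IDs present in both vectors
    only_a = length(setdiff(a, b)),    # IDs in a but not in b
    only_b = length(setdiff(b, a)))    # IDs in b but not in a
}

A <- c("Dog", "Cat", "Mouse")
B <- c("Tiger", "Lion", "Cat")
counts <- compare_ids(A, B)
counts[["both"]]    # 1
counts[["only_a"]]  # 2
counts[["only_b"]]  # 2
```

Returning a named vector keeps the result easy to index by name or to pass on to other functions.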


2013-07-11 16:24:48 Mittenchops

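Yet another way is to use %in% and logical vectors of the common elements instead of intersect and setdiff. I take it you actually want to compare two vectors, not two lists: a list is an R class that can contain elements of any type, whereas a vector always contains elements of a single type, which makes it easier to compare what is truly equal. Here the elements are coerced to character strings, because that is the least flexible element type present.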

first <- c(1:3, letters[1:6], "foo", "bar")
second <- c(2:4, letters[5:8], "bar", "asd")

both <- first[first %in% second] # in both, same as call: intersect(first, second)
onlyfirst <- first[!first %in% second] # only in 'first', same as: setdiff(first, second)
onlysecond <- second[!second %in% first] # only in 'second', same as: setdiff(second, first)
length(both)
length(onlyfirst)
length(onlysecond)

#> both
#[1] "2" "3" "e" "f" "bar"
#> onlyfirst
#[1] "1" "a" "b" "c" "d" "foo"
#> onlysecond
#[1] "4" "g" "h" "asd"
#> length(both)
#[1] 5
#> length(onlyfirst)
#[1] 6
#> length(onlysecond)
#[1] 4

# If you don't have the 'gplots' package, type: install.packages("gplots")
require("gplots")
venn(list(first.vector = first, second.vector = second))
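One caveat about the "same as" remarks in the comments: indexing with %in% and the set functions agree only when the input has no repeated values. A small sketch with made-up vectors x and y (not from the answer) shows the difference:

```r
x <- c("a", "a", "b")   # "a" appears twice
y <- c("a", "c")

by_index <- x[x %in% y]      # keeps every matching element of x
by_set   <- intersect(x, y)  # returns each common value once

length(by_index)  # 2
length(by_set)    # 1

# De-duplicating first makes the two approaches agree
identical(unique(x)[unique(x) %in% y], intersect(x, y))  # TRUE
```

For ID vectors that may contain duplicates, calling unique() first (or using intersect and setdiff directly) is probably the safer way to count distinct IDs.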

As was mentioned, there are multiple options for drawing Venn diagrams in R. Here is the output using gplots.


2013-07-11 16:45:36

I usually deal with large-ish sets, so I use a table instead of a Venn diagram:

xtab_set <- function(A,B){ 
    both <- union(A,B) 
    inA  <- both %in% A 
    inB  <- both %in% B 
    return(table(inA,inB)) 
} 

set.seed(1) 
A <- sample(letters[1:20],10,replace=TRUE) 
B <- sample(letters[1:20],10,replace=TRUE) 
xtab_set(A,B) 

#        inB
# inA     FALSE TRUE
#   FALSE     0    5
#   TRUE      6    3
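The individual counts can be read back out of such a table by indexing with the "TRUE" and "FALSE" labels. The sketch below repeats the function with fixed, made-up inputs so that the result does not depend on the random number generator; the variable names n_both, n_only_A and n_only_B are illustrative only.

```r
xtab_set <- function(A, B) {
  both <- union(A, B)
  table(inA = both %in% A, inB = both %in% B)
}

# Fixed example inputs (hypothetical, chosen so every cell label exists)
A <- c("a", "b", "c", "d")
B <- c("c", "d", "e")
tab <- xtab_set(A, B)

n_both   <- tab["TRUE", "TRUE"]    # in A and in B
n_only_A <- tab["TRUE", "FALSE"]   # in A but not in B
n_only_B <- tab["FALSE", "TRUE"]   # in B but not in A
```

Note that this indexing assumes both labels occur in each dimension; if one set is entirely contained in the other, the table has no "FALSE" row or column for it and the lookup would fail.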


2013-07-11 16:53:02 Frank

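Ah, I didn't realize that Venn diagrams include counts... I thought they were supposed to show the items themselves. – Frank

With sqldf: slower, but well suited to data frames with mixed column types: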

library(sqldf) # load the package; assumes sqldf is installed
t1 <- as.data.frame(1:10) 
t2 <- as.data.frame(5:15) 
sqldf1 <- sqldf('SELECT * FROM t1 EXCEPT SELECT * FROM t2') # subset from t1 not in t2 
sqldf2 <- sqldf('SELECT * FROM t2 EXCEPT SELECT * FROM t1') # subset from t2 not in t1 
sqldf3 <- sqldf('SELECT * FROM t1 UNION SELECT * FROM t2') # UNION t1 and t2 

sqldf1 X1_10 
1 
2 
3 
4 
sqldf2 X5_15 
11 
12 
13 
14 
15 
sqldf3 X1_10 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13  
14 
15
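The queries above cover the two differences and the union, but not the IDs common to both tables. A possible sketch for that case follows, assuming the sqldf package is installed and its default SQLite backend is in use (SQLite supports INTERSECT). The column name id is chosen here for illustration and does not appear in the answer.

```r
library(sqldf)  # assumes sqldf is installed

# Illustrative data frames with an explicitly named column
t1 <- data.frame(id = 1:10)
t2 <- data.frame(id = 5:15)

# Rows present in both t1 and t2
in_both <- sqldf('SELECT id FROM t1 INTERSECT SELECT id FROM t2')
nrow(in_both)  # number of IDs common to both tables
```

Naming the column explicitly avoids depending on how automatically generated column names are handled.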


2014-06-22 23:32:23 rferrisx
