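Apply a function over a list of subsets of a matrix
I have a symmetric matrix. I need to take subsets of it by column, according to a list of groups, and apply a function to each subset. How can I speed this up or otherwise improve it?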
My current code looks something like this:
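# Mean of 'data' restricted to the IDs of group x (rows) and group y (columns);
# NA when either group shares no ID with the matrix.
funs <- function(x, y, data) {
  if (all(colnames(data) %in% x) && all(colnames(data) %in% y)) {
    # Every matrix ID is in both groups. data[x, y] indexes by name,
    # so it errors if x or y holds an ID that is not in data.
    mean(data[x, y])
  } else if (any(colnames(data) %in% x) && any(colnames(data) %in% y)) {
    # Partial overlap: keep only the matrix IDs shared with each group
    mean(data[colnames(data) %in% x, colnames(data) %in% y])
  } else {
    NA
  }
}
# Vectorize over x and y so that outer() can call funs on every pair of groups
vfuns <- Vectorize(funs, vectorize.args = c("x", "y"))
outer(l, l, vfuns, data = mat)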
           2  9 10 15 16 18
2  0.2277186 NA NA NA NA NA
9         NA NA NA NA NA NA
10        NA NA NA NA NA NA
15        NA NA NA NA NA NA
16        NA NA NA NA NA NA
18        NA NA NA NA NA NA
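In an earlier version I computed the matrix for every combination, but that way some comparisons were computed twice (or more), which was quite slow. With the current approach each comparison is still computed twice, since funs(l[["2"]], l[["9"]], data = mat) == funs(l[["9"]], l[["2"]], data = mat), but no more than that. Things I would like to try to improve performance: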
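- "Tell" outer that the result is symmetric: how?
- Convert the list to an environment for faster lookups (this fails with: Error: attempt to replicate an object of type 'environment')
- Parallelize outer?
- Anything else?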
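On the first two ideas: because the matrix is symmetric, funs(x, y) and funs(y, x) agree, so only the n(n + 1)/2 pairs with i <= j need computing, which is 21 instead of the 36 that outer() evaluates for the six groups here. outer() replicates its first two arguments with rep() before calling FUN, which is why passing an environment fails; resolving each group to integer positions once makes a faster lookup structure unnecessary. The sketch below assumes rownames(data) equal colnames(data), as in the question; sym_group_means is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: group means computed on the upper triangle only.
# Assumes rownames(data) == colnames(data) (symmetric matrix, same dimnames).
sym_group_means <- function(groups, data) {
  # Resolve each group to integer column positions once,
  # instead of re-running %in% for every pair of groups.
  pos <- lapply(groups, function(g) which(colnames(data) %in% g))
  n <- length(groups)
  res <- matrix(NA_real_, n, n, dimnames = list(names(groups), names(groups)))
  for (i in seq_len(n)) {
    for (j in i:n) {  # j starts at i: each unordered pair is visited once
      # Leave NA when either group shares no ID with the matrix
      if (length(pos[[i]]) > 0 && length(pos[[j]]) > 0) {
        v <- mean(data[pos[[i]], pos[[j]], drop = FALSE])
        res[i, j] <- v
        res[j, i] <- v  # mirror: valid because data is symmetric
      }
    }
  }
  res
}
```

With the question's data, sym_group_means(l, mat) should give the same matrix as the outer() call, while evaluating each unordered pair of groups only once.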
The list:
l <- structure(list(`2` = c("109582", "114608", "140837", "140877",
"1474228", "1474244", "162582", "194315", "194840", "76002",
"76005"), `9` = c("1430728", "156580", "156582", "211859"), `10` = c("1430728",
"156580", "156582", "211859"), `15` = c("1430728", "209776",
"209931", "71291"), `16` = c("379716", "379724", "74160"), `18` = c("112310",
"112315", "112316", "888590", "916853")), .Names = c("2", "9",
"10", "15", "16", "18"))
The matrix:
mat <- structure(c(1, 0.305084745762712, 0.0728051391862955, 0.151950718685832,
0.035778175313059, 0.128755364806867, 0.157080523601745, 0.127659574468085,
0.0452173913043478, 0.591549295774648, 0.32089552238806, 0.305084745762712,
1, 0.102040816326531, 0.186440677966102, 0.0421052631578947,
0.127272727272727, 0.0306691449814126, 0.0232558139534884, 0.00970873786407767,
0.6, 0.970059880239521, 0.0728051391862955, 0.102040816326531,
1, 0.62962962962963, 0.0317460317460317, 0.0225563909774436,
0.00383141762452107, 0.00546448087431694, 0.0140845070422535,
0.0970873786407767, 0.0970873786407767, 0.151950718685832, 0.186440677966102,
0.62962962962963, 1, 0.0273972602739726, 0.041958041958042, 0.00759013282732448,
0.00518134715025907, 0., 0.150442477876106, 0.178861788617886,
0.035778175313059, 0.0421052631578947, 0.0317460317460317, 0.0273972602739726,
1, 0.608938547486033, 0.0284403669724771, 0.0131004366812227,
0.00854700854700855, 0.0402684563758389, 0.041025641025641, 0.128755364806867,
0.127272727272727, 0.0225563909774436, 0.041958041958042, 0.608938547486033,
1, 0.0491379310344828, 0.0133779264214047, 0.0053475935828877,
0.10958904109589, 0.13134328358209, 0.157080523601745, 0.0306691449814126,
0.00383141762452107, 0.00759013282732448, 0.0284403669724771,
0.0491379310344828, 1, 0.288429752066116, 0.11384335154827, 0.111504424778761,
0.0333796940194715, 0.127659574468085, 0.0232558139534884, 0.00546448087431694,
0.00518134715025907, 0.0131004366812227, 0.0133779264214047,
0.288429752066116, 1, 0.527426160337553, 0.0780669144981413,
0.0229885057471264, 0.0452173913043478, 0.00970873786407767,
0.0140845070422535, 0., 0.00854700854700855,
0.0053475935828877, 0.11384335154827, 0.527426160337553, 1, 0.0636942675159236,
0.00947867298578199, 0.591549295774648, 0.6, 0.0970873786407767,
0.150442477876106, 0.0402684563758389, 0.10958904109589, 0.111504424778761,
0.0780669144981413, 0.0636942675159236, 1, 0.625454545454545,
0.32089552238806, 0.970059880239521, 0.0970873786407767, 0.178861788617886,
0.041025641025641, 0.13134328358209, 0.0333796940194715, 0.0229885057471264,
0.00947867298578199, 0.625454545454545, 1), .Dim = c(11L, 11L
), .Dimnames = list(c("109582", "114608", "140837", "140877",
"1474228", "1474244", "162582", "194315", "194840", "76002",
"76005"), c("109582", "114608", "140837", "140877", "1474228",
"1474244", "162582", "194315", "194840", "76002", "76005")))
That is certainly a mistake in my question, but the question itself is about doing this across groups. – Llopis
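For pairwise comparisons of all elements, see the edited answer, where I extend it to pairwise comparisons. You also asked for improvements: the indexing you originally wrote, 'x %in% colnames(mat)', is incorrect and should be 'colnames(mat) %in% x'. The if statements are also unnecessary; you can simply use 'funs <- function(x, y, data) mean(data[colnames(data) %in% x, colnames(data) %in% y])' and get similar results, with NaN instead of NA. – Djork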
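Djork's one-liner can be written out as below. The NaN comes from taking mean() of an empty selection when a group shares no ID with the matrix; the optional wrapper maps it back to NA to match the original funs(). Both function names and the toy matrix are hypothetical, for illustration only.

```r
# Djork's suggestion: the logical indexes keep only the matrix IDs that
# belong to each group, so no if/else branches are needed.
funs_simple <- function(x, y, data) {
  mean(data[colnames(data) %in% x, colnames(data) %in% y])
}

# Hypothetical wrapper: turn the empty-selection NaN back into NA
funs_na <- function(x, y, data) {
  v <- funs_simple(x, y, data)
  if (is.nan(v)) NA_real_ else v
}

# Toy symmetric matrix, for illustration only
m <- matrix(c(1, 0.5, 0.2,
              0.5, 1, 0.4,
              0.2, 0.4, 1), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
funs_simple(c("a", "b"), "c", m)  # mean of m[c("a", "b"), "c"]: 0.3
funs_simple("z", "a", m)          # no overlap, empty selection: NaN
funs_na("z", "a", m)              # NA
```

Either function can be passed to Vectorize() and outer() exactly like the original funs.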
outer already does what I want; how would two nested sapply calls be faster or better? – Llopis
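On the nested sapply() question: Vectorize() is built on mapply(), so outer(l, l, vfuns, data = mat) already runs an R-level loop over all ordered pairs, and two nested sapply() calls over the same pairs would likely do about the same work. Any gain comes from visiting fewer pairs. The sketch below (upper_tri_apply is a hypothetical helper) evaluates a pair function only for i <= j and runs that loop through parallel::mclapply(), which also touches the "parallel outer?" idea. Assumptions: the pair function returns a single number and is symmetric in x and y; cores > 1 forks processes and is not supported on Windows; with only a handful of small groups, forking overhead will probably outweigh any speedup.

```r
library(parallel)  # ships with base R; provides mclapply()

# Hypothetical helper: apply pair_fun(x, y, data) on the upper triangle
# (diagonal included), optionally in parallel, then mirror the result.
upper_tri_apply <- function(groups, data, pair_fun, cores = 1L) {
  n <- length(groups)
  # Two-column matrix of (i, j) index pairs with i <= j
  pairs <- which(upper.tri(matrix(0, n, n), diag = TRUE), arr.ind = TRUE)
  vals <- mclapply(seq_len(nrow(pairs)), function(k) {
    pair_fun(groups[[pairs[k, 1]]], groups[[pairs[k, 2]]], data)
  }, mc.cores = cores)  # mc.cores must be 1 on Windows
  vals <- as.numeric(unlist(vals))
  res <- matrix(NA_real_, n, n, dimnames = list(names(groups), names(groups)))
  res[pairs] <- vals                         # upper triangle
  res[pairs[, 2:1, drop = FALSE]] <- vals    # mirror into the lower triangle
  res
}
```

For example, upper_tri_apply(l, mat, funs) should reproduce outer(l, l, vfuns, data = mat) while calling funs 21 times instead of 36; on a Unix-like system, cores = 2L or more distributes those calls across processes.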