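I asked a question here, Finding the index based on two data frames of strings, and got a perfect answer. Now I am facing another problem that I cannot solve. When my second data set has multiple columns, I can handle it using the approach from Manipulating two data frames based on strings with different length: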
setDT(strs)[, c('colids1','colids2') := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm=TRUE) > 0))), by = 1:nrow(strs)][]
This is fine as long as all columns of my second data set (strs) have the same length, but if the lengths differ it does not work and gives me an error.
Say my first data set is:
lut <- structure(list(V1 = c("O75663", "O95400", "O95433", NA, NA),
V2 = c("O95456", "O95670", NA, NA, NA), V3 = c("O75663",
"O95400", "O95433", "O95456", "O95670"), V4 = c("O95456",
"O95670", "O95801", "P00352", NA), V1 = c("O75663", "O95400",
"O95433", NA, NA), V2 = c("O95456", "O95670", NA, NA, NA),
V3 = c("O75663", "O95400", "O95433", "O95456", "O95670"),
V4 = c("O95456", "O95670", "O95801", "P00352", NA)), .Names = c("V1",
"V2", "V3", "V4", "V1", "V2", "V3", "V4"), row.names = c(NA,
-5L), class = "data.frame")
and my second data set is:
strs <- structure(list(strings = structure(c(2L, 3L, 4L, 5L, 6L, 7L,
1L, 1L), .Label = c("", "O75663", "O95400", "O95433", "O95456",
"O95670", "O95801"), class = "factor"), strings2 = structure(c(4L,
2L, 6L, 5L, 3L, 1L, 1L, 1L), .Label = c("", "O75663", "O95433",
"O95456", "P00352", "P00492"), class = "factor"), strings3 = structure(c(4L,
6L, 7L, 8L, 2L, 3L, 5L, 1L), .Label = c("", "O75663", "O95400",
"O95456", "O95670", "O95801", "P00352", "P00492"), class = "factor"),
strings4 = structure(c(2L, 5L, 3L, 4L, 1L, 1L, 1L, 1L), .Label = c("",
"O95400", "O95456", "O95801", "P00492"), class = "factor"),
strings5 = structure(c(8L, 2L, 7L, 1L, 3L, 6L, 5L, 4L), .Label = c("O75663",
"O95400", "O95433", "O95456", "O95670", "O95801", "P00352",
"P00492"), class = "factor")), .Names = c("strings", "strings2",
"strings3", "strings4", "strings5"), class = "data.frame", row.names = c(NA,
-8L))
This is what I tried:
df <- setDT(strs)[, paste0('colids_', seq_along(strs)) := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm=TRUE) > 0))), by = 1:nrow(strs)][]
It works if the columns of strs have the same length, but it does not work when the lengths vary, as in the example I gave here.
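The error is obvious. Try this: 'strs[c(1:3,5)] <- lapply(strs[c(1:3,5)], as.character)' and then run your 'data.table' statement. Is the resulting 'df' what you expect? – Sumedh
@Sumedh Thanks for your message, but it does not solve the problem. I did what you said and then ran df <- setDT(strs)[, paste0('colids_', seq_along(strs)) := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm = TRUE) > 0))), by = 1:nrow(strs)][] and got the same error. – nik
@Sumedh I have been trying every suggestion I could find on the web, but I don't know why it is not working! – nik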