A similar problem has already been described in our issue tracker: https://github.com/eddelbuettel/digest/issues/33
The current version of digest can read a file to compute its hash.
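Therefore, at least on Linux, we can use a named pipe: the digest package reads from it in one process while another process writes the data into the other end.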
The following snippet shows how we can compute an MD5 hash of 10 numbers by feeding the digester 1:5 first and then 6:10.
library(parallel)
library(digest)
x <- as.character(1:10) # input
fname <- "mystream.fifo" # choose name for your named pipe
close(fifo(fname, "w")) # creates your pipe if it does not exist
producer <- mcparallel({
  mystream <- file(fname, "w")
  writeLines(x[1:5], mystream)
  writeLines(x[6:10], mystream)
  close(mystream) # sends signal to the consumer (digester)
})
digester <- mcparallel({
  digest(fname, file = TRUE, algo = "md5") # just reads the stream till signalled
})
# runs both processes in parallel
mccollect(list(producer, digester))
unlink(fname) # named pipe removed
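As a sanity check for the streamed result, here is a minimal sketch that writes the same ten lines to an ordinary temporary file in one go and hashes that file. It assumes the pipe delivers exactly the same bytes as the file, in which case the two MD5 values should agree. The names `tmp` and `reference` are only illustrative.

```r
library(digest)

x <- as.character(1:10)  # same input as in the snippet above

# Write all ten lines to a regular temporary file in a single call
tmp <- tempfile(fileext = ".txt")
writeLines(x, tmp)

# Hash the file contents (not the file name)
reference <- digest(tmp, file = TRUE, algo = "md5")
print(reference)

# Cross-check against base R's own MD5 implementation
stopifnot(reference == unname(tools::md5sum(tmp)))

unlink(tmp)  # temporary file removed
```

If the hash collected from the digester process differs from `reference`, the two paths most likely did not write the same bytes (for example, different line endings).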
UPDATE: Henrik Bengtsson provided a variant based on futures:
library("future")
plan(multiprocess) # deprecated in newer versions of future; use plan(multisession) or plan(multicore) there
x <- as.character(1:10) # input
fname <- "mystream.fifo" # choose name for your named pipe
close(fifo(fname, open="wb")) # creates your pipe if it does not exist
producer %<-% {
  mystream <- file(fname, open="wb")
  writeBin(x[1:5], endian="little", con=mystream)
  writeBin(x[6:10], endian="little", con=mystream)
  close(mystream) # sends signal to the consumer (digester)
}
# just reads the stream till signalled
md5 <- digest::digest(fname, file = TRUE, algo = "md5")
print(md5)
## [1] "25867862802a623c16928216e2501a39"
# Note: Identical on Linux and Windows
Maybe you could hash one column at a time, e.g. 'DT[, lapply(.SD, digest)]'. You can then either compare the per-column hashes or hash the result: 'digest(dt[, lapply(.SD, digest)])'. – nicola
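@nicola Thank you very much. So simple and yet so powerful! It works perfectly (a small improvement is to call 'gc()' after each call to digest to make sure that unused memory is actually released) –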
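The column-wise idea from the comments could be sketched roughly as follows. This is a variation, not the commenters' exact code: it loops over the columns explicitly so that 'gc()' can be called after each digest, and the small table `dt` and the names `col_hashes` and `table_hash` are made up for illustration. Note that these hashes are computed on serialized R objects, so they are not comparable to hashes of file contents.

```r
library(data.table)
library(digest)

# Hypothetical example table; in practice this would be the large data.table
dt <- data.table(a = 1:5, b = letters[1:5], c = c(0.1, 0.2, 0.3, 0.4, 0.5))

# Hash one column at a time and trigger garbage collection after each digest
col_hashes <- vapply(names(dt), function(col) {
  h <- digest(dt[[col]], algo = "md5")
  gc()  # ask R to release memory that is no longer in use
  h
}, character(1))

# Either compare col_hashes column by column, or reduce them to one hash
table_hash <- digest(col_hashes, algo = "md5")

print(col_hashes)
print(table_hash)
```

Only one column is serialized at a time here, which is the point of the suggestion: peak memory stays closer to the size of the largest column than to the size of the whole table.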