Computing the union of many (more than two) intervals

I would like to get the union of many (more than 2) intervals:

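library(lubridate)

# new_interval() was the constructor when this was written;
# later lubridate releases use interval() instead
df <- data.frame(id=c(1, 2, 3), 
      interval=c(
       new_interval(ymd("2001-01-01"), ymd("2002-01-01")), 
       new_interval(ymd("2001-01-01"), ymd("2004-01-01")), 
       new_interval(ymd("2001-02-01"), ymd("2002-01-01")) 
       )) 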
df 
# id      interval 
# 1 1 2001-01-01 UTC--2002-01-01 UTC 
# 2 2 2001-01-01 UTC--2004-01-01 UTC 
# 3 3 2001-02-01 UTC--2002-01-01 UTC 

lubridate::union(lubridate::union(df$interval[1], df$interval[2]), 
       df$interval[3]) 
# [1] 2001-01-01 UTC--2004-01-01 UTC

This is the correct result.

But why doesn't lubridate::union work with Reduce?

Reduce(lubridate::union, df$interval) 
# [1] 31536000 94608000 28857600

The interval objects seem to be converted to numeric too early (before union is applied).

Source

2015-10-05 user3808394

It would be great if one of the 'lubridate' package maintainers could improve it so that it can be used with 'Reduce'. I have filed a new issue: https://github.com/hadley/lubridate/issues/348 – user3808394

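Just for future reference: if you open a GitHub issue while the question is still unresolved, please note it in the question so that people can easily become aware of it. I answered this question without having seen the link to your GitHub issue, which had already been closed before I submitted my answer. Cheers. –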

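The reason this doesn't work is not Reduce() itself. Rather, it is as.list(), which Reduce() applies to x internally when the supplied x argument is not a list to begin with. The relevant lines are lines 8 and 9 of Reduce(), shown below.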

head(Reduce, 9) 
# ...               
# 8  if (!is.vector(x) || is.object(x))     
# 9   x <- as.list(x)

A quick check of the if() condition confirms this.

!is.vector(df$interval) || is.object(df$interval) 
# [1] TRUE

So as.list() is applied to df$interval in your call to Reduce(), which means df$interval becomes

as.list(df$interval) 
# [[1]] 
# [1] 31536000 
# 
# [[2]] 
# [1] 94608000 
# 
# [[3]] 
# [1] 28857600

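before any important operation takes place in Reduce() (in fact, for our purposes this conversion is the most important operation). This makes the output of Reduce() reasonable: union() is applied to plain numbers, and it returns all three because they are unique.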
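The same effect can be reproduced without lubridate. The following is only a minimal sketch: "Span" is a hypothetical toy S4 class invented for illustration (numeric data plus one extra slot, loosely resembling Interval), not anything from the package.

```r
library(methods)  # setClass() and new()

# "Span" is a hypothetical toy class, not part of lubridate:
# numeric lengths plus an extra slot, loosely like an Interval
setClass("Span", contains = "numeric", slots = c(start = "numeric"))
x <- new("Span", c(10, 20, 30), start = c(1, 2, 3))

# The same test Reduce() performs before touching the data
needs_as_list <- !is.vector(x) || is.object(x)
needs_as_list

# as.list() keeps only the numeric data; the class and the slot are gone
plain <- as.list(x)
class(plain[[1]])

# so Reduce() ends up calling base::union() on bare numbers
reduced <- Reduce(base::union, x)
reduced
```

Because the elements reaching the function are bare numbers, any class-specific method for union never gets a chance to run.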

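If you really need to use Reduce(), you can get around the list check by first building your own list with a for() loop (calling lapply() directly on the intervals will not work either, for the same reason). Then we can supply that list to Reduce() and get the desired output.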

x <- vector("list", length(df$interval)) 
for(i in seq_along(x)) x[[i]] <- df$interval[i] 

Reduce(lubridate::union, x) 
# [1] 2001-01-01 UTC--2004-01-01 UTC
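As a possible variation on the loop, the list can also be built by iterating over positions rather than over the intervals themselves, so that each element is extracted with `[` and should keep its class. This is a sketch only; the names `ints`, `int_list` and `merged` are made up here, and the data is rebuilt with the vectorised interval() constructor.

```r
library(lubridate)

# Example data mirroring the question, built with the vectorised interval()
ints <- interval(
  ymd(c("2001-01-01", "2001-01-01", "2001-02-01")),
  ymd(c("2002-01-01", "2004-01-01", "2002-01-01"))
)

# Iterate over positions, not over the Interval vector itself, so that
# each element is extracted with `[` and remains an Interval
int_list <- lapply(seq_along(ints), function(i) ints[i])

# Reduce() receives a genuine list, so its as.list() step is skipped
merged <- Reduce(lubridate::union, int_list)
merged
```

Since `int_list` is already a plain list, Reduce() passes Interval objects straight to lubridate::union.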

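But it would probably be best to write an as.list() method for the Interval class and put it at the top of your script. We can use the same code as above.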

as.list.Interval <- function(x, ...) { 
    out <- vector("list", length(x)) 
    for(i in seq_along(x)) out[[i]] <- x[i] 
    out 
} 

Reduce(lubridate::union, df$interval) 
# [1] 2001-01-01 UTC--2004-01-01 UTC

Also note that you can do this another way, by grabbing the start slot and using int_end().

interval(min(slot(df$interval, "start")), max(int_end(df$interval))) 
# [1] 2001-01-01 UTC--2004-01-01 UTC
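A variant of the same idea that sticks to lubridate's accessor functions, using int_start() instead of reading the slot directly, might look like the sketch below. The names `ints` and `span` are illustrative, and it assumes all intervals run forward in time.

```r
library(lubridate)

# Example data mirroring the question
ints <- interval(
  ymd(c("2001-01-01", "2001-01-01", "2001-02-01")),
  ymd(c("2002-01-01", "2004-01-01", "2002-01-01"))
)

# Earliest start to latest end; if the intervals did not overlap,
# the gaps between them would be included in the result as well
span <- interval(min(int_start(ints)), max(int_end(ints)))
span
```

This computes the overall span, which coincides with the union here because the three intervals overlap.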

Source

2015-10-06 05:36:39

Thanks a lot, @richard-scriven. – user3808394

I don't know about the Reduce case, but I would do it this way:

library(dplyr) 
library(stringr) 

df %>% 
    mutate(interval = str_trim(str_replace_all(interval, "(--|UTC)", " ")), 
     int_start = word(interval), 
     int_end = word(interval, -1)) %>% 
    summarise(interval = str_c(min(int_start), 
          max(int_end), 
          sep = "--")) 
# result 
#                 interval 
# 1 2001-01-01--2004-01-01

Source

2015-10-05 09:35:00

So you have seven lines of code that give the same result as the one-line 'lubridate::union'? – 2015-10-05 10:05:34

@Pascal You don't have to like my answer. –

In any case, it doesn't answer the question. – 2015-10-05 11:44:33

This has just been fixed in the lubridate package: https://github.com/hadley/lubridate/issues/348

Source

2015-10-06 10:45:06 user3808394
