How to calculate the proportion of data inside and outside an interval in R?

I have the following data. How can I calculate the proportion of it that falls inside and outside an interval in R?

Frequency = 260

[1] -9.326550e-03
[2] -4.422175e-03
[3]  9.003794e-03
[4] -1.778217e-03
[5] -4.676712e-03
[6]  1.242704e-02
[7]  5.759863e-03

and I want to count how many of these values fall between these two bounds:

Frequency = 260

           [,1]         [,2]
[1]          NA           NA
[2] 0.010363147 -0.010363147
[3] 0.010072569 -0.010072569
[4] 0.010018997 -0.010018997
[5] 0.009700522 -0.009700522
[6] 0.009476024 -0.009476024
[7] 0.009748085 -0.009748085

I have to do this in R, but I'm a beginner. Thanks in advance!


2015-05-09 user137425

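The first thing you need to do is present an R object. That output doesn't show that you've done anything in R yet; it isn't what a print operation typically produces. Make an object and show the interval breaks you want in R code. –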

I think they're time series objects. – user137425
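Following the first comment's advice, here is a minimal sketch of how the printed values could be entered as R objects. It assumes the seven rows shown are enough to reproduce the problem (the real series is longer) and that the two objects line up row by row; `d`, `b` and `r` are hypothetical names. `dput()` is base R and prints code that recreates an object, which others can paste and run:

```r
# Hypothetical reconstruction of the seven values printed in the question
d <- ts(c(-9.326550e-03, -4.422175e-03, 9.003794e-03, -1.778217e-03,
          -4.676712e-03, 1.242704e-02, 5.759863e-03), frequency = 260)

# Bounds as printed: column 1 is +b, column 2 is -b; row 1 has no range
b <- c(NA, 0.010363147, 0.010072569, 0.010018997,
       0.009700522, 0.009476024, 0.009748085)
r <- ts(cbind(upper = b, lower = -b), frequency = 260)

# dput() prints R code that recreates each object exactly
dput(d)
dput(r)
```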

Are your ranges always symmetric about zero? In that case, comparing absolute values would be simplest. – Frank
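Assuming, as this comment suggests, that every range has the form `[-b, b]`, the two comparisons collapse into one on absolute values. A minimal sketch using the seven values printed in the question (`d` and `b` are hypothetical names); the `NA` bound in the first row is left out of the proportion by `na.rm = TRUE`:

```r
d <- c(-9.326550e-03, -4.422175e-03, 9.003794e-03, -1.778217e-03,
       -4.676712e-03, 1.242704e-02, 5.759863e-03)
# Half-widths of the symmetric ranges (row 1 is NA in the question)
b <- c(NA, 0.010363147, 0.010072569, 0.010018997,
       0.009700522, 0.009476024, 0.009748085)

inside <- abs(d) < b                      # TRUE where -b < d < b; NA where b is NA
n_inside  <- sum(inside, na.rm = TRUE)    # 5 of the 6 rows that have a range
p_inside  <- mean(inside, na.rm = TRUE)   # 5/6
p_outside <- 1 - p_inside                 # 1/6
```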

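Unless I'm misunderstanding, you want the number of times the jth element of the first object lies between the two elements in the jth row of the second object? If so,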

sum((data1 > data2[,1]) & (data1 < data2[,2]))/length(data1)

will do it (dividing by length(data1) turns the count into a proportion).
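One caveat, judging from the printout in the question: there the first column holds the positive (upper) bound and the second the negative (lower) bound, so the comparison above, which treats column 1 as the lower bound, would not count any row. A minimal sketch that takes the row-wise smaller and larger bound with `pmin()`/`pmax()`, so the column order no longer matters, using the seven printed values (`data1`, `data2` and `b` are hypothetical names):

```r
data1 <- c(-9.326550e-03, -4.422175e-03, 9.003794e-03, -1.778217e-03,
           -4.676712e-03, 1.242704e-02, 5.759863e-03)
b <- c(NA, 0.010363147, 0.010072569, 0.010018997,
       0.009700522, 0.009476024, 0.009748085)
data2 <- cbind(b, -b)  # column order as printed: upper, then lower

# Column 1 taken as the lower bound: no row can qualify with this data
naive <- sum(data1 > data2[, 1] & data1 < data2[, 2], na.rm = TRUE)

# Order-independent bounds for each row
lower <- pmin(data2[, 1], data2[, 2])
upper <- pmax(data2[, 1], data2[, 2])
n_inside <- sum(data1 > lower & data1 < upper, na.rm = TRUE)
```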


2015-05-09 18:19:30

…present it to us together with the output. Well, that's certainly how I read the OP's question; I guess only they can clarify. :) –

Yes, that's what I wanted! Thanks! – user137425

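One thing to note: if your data really does have NA values for the first range (or any other), then @Carl's solution will return NA. You need to add the 'na.rm = T' argument to sum: 'sum(d > r[,1] & d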
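To illustrate the point about `NA` rows: `sum()` returns `NA` unless `na.rm = TRUE` is given. A second detail, assuming the goal is a proportion: `sum(..., na.rm = TRUE) / length(d)` still counts the `NA` row in the denominator, while `mean(..., na.rm = TRUE)` divides only by the rows that have a range. A sketch with a small hypothetical vector; which denominator is right depends on what the proportion is meant to describe:

```r
d <- c(0.5, -0.2, 1.5, 0.1)
r <- cbind(c(NA, -1, -1, -1), c(NA, 1, 1, 1))  # lower, upper; row 1 has no range

hit <- d > r[, 1] & d < r[, 2]        # NA TRUE FALSE TRUE

sum(hit)                              # NA
sum(hit, na.rm = TRUE)                # 2
sum(hit, na.rm = TRUE) / length(d)    # 0.5 (NA row kept in the denominator)
mean(hit, na.rm = TRUE)               # 0.6666667 (NA row excluded)
```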

Here is one approach using foverlaps from the data.table package, with the following toy dataset:

library(data.table) 
## 
set.seed(123) 
ts1 <- data.table(
    ts(rnorm(50, sd = .1), frequency = 260))[ 
    ,V2 := V1] 
## 
ts2 <- cbind(
    ts(rnorm(50,-0.1,.5), frequency=260) 
    ,ts(rnorm(50,0.1,.5), frequency=260)) 
ts2 <- data.table(
    t(apply(ts2, 1, sort)))[ 
    1, c("V1", "V2") := NA] 
setkeyv(ts2, c("V1","V2"))

Since foverlaps needs two columns from each input data.table, we just duplicate the first column of ts1 (as far as I know, this is the usual convention).
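To see what the duplicated column does, here is a tiny hypothetical example, separate from the toy data: each point becomes a zero-width interval `[x, x]`, and `type = "within"` keeps the pairs where that point lies inside a `y` interval (endpoints included). It assumes data.table is installed; `foverlaps()` requires `y` to be keyed on its start and end columns, and the names `pts`, `ivl`, `hits` and `counts` are made up for the illustration:

```r
library(data.table)

# Hypothetical points; the second column duplicates the first so each
# point is a zero-width interval [x, x]
pts <- data.table(V1 = c(-0.5, 0.2, 0.9, 2.0))
pts[, V2 := V1]

# Hypothetical intervals; foverlaps() needs y keyed on (start, end)
ivl <- data.table(V1 = c(-1, 0), V2 = c(0, 1))
setkeyv(ivl, c("V1", "V2"))

# One row per (point, interval) pair with the point inside the interval;
# nomatch = 0L drops the point that falls in no interval (2.0)
hits <- foverlaps(pts, ivl, type = "within", nomatch = 0L)

# Columns V1/V2 come from y (the intervals); count points per interval
counts <- hits[, .N, by = .(V1, V2)]
```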

fts <- foverlaps(
    x = ts1, y = na.omit(ts2) 
    ,type = "within")[ 
    ,list(Freq = .N) 
    ,by = "V1,V2"]

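This joins ts1 to ts2 for every occurrence of a ts1 value falling inside a ts2 [V1, V2] interval, then aggregates by interval to get the counts. Since some ts2 intervals may contain zero ts1 values (which is the case with this example data), you can left-join the aggregated data onto the original ts2 object and derive the corresponding proportions: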

(merge(x = ts2, y = fts, all.x=TRUE)[ 
    is.na(Freq), Freq := 0][ 
    ,Inside := Freq/nrow(ts1)][ 
     ,Outside := 1 - Inside])[1:10,] 
## 
#   V1   V2 Freq Inside Outside 
# 1:   NA   NA 0 0.00 1.00 
# 2: -1.2545844 -0.37373731 0 0.00 1.00 
# 3: -0.9266236 -0.21024328 1 0.02 0.98 
# 4: -0.8743764 -0.29245223 0 0.00 1.00 
# 5: -0.7339710 0.19230687 50 1.00 0.00 
# 6: -0.7103589 0.13898042 50 1.00 0.00 
# 7: -0.7089414 -0.26660369 0 0.00 1.00 
# 8: -0.7007681 0.58032622 50 1.00 0.00 
# 9: -0.6860721 0.01936587 35 0.70 0.30 
# 10: -0.6573338 -0.41395304 0 0.00 1.00
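For comparison, a base R sketch of the same per-interval counts and Inside/Outside proportions, assuming data.table is not available. `outer()` builds a points-by-intervals logical matrix and `colSums()` counts the points in each interval, so empty intervals get 0 without a separate merge. Bounds are compared with `>=`/`<=` to mirror the closed intervals of `type = "within"`; the data and names are hypothetical, and an `NA` interval would give an `NA` count unless `colSums(..., na.rm = TRUE)` is used:

```r
# Hypothetical points and intervals (one interval per row: lower, upper)
x  <- c(-0.5, 0.2, 0.9, 2.0)
iv <- cbind(lower = c(-1, 0, 3), upper = c(0, 1, 4))

# in_iv[i, j] is TRUE when point i lies in interval j (closed bounds)
in_iv <- outer(x, iv[, "lower"], ">=") & outer(x, iv[, "upper"], "<=")

freq    <- colSums(in_iv)     # points per interval; empty intervals give 0
inside  <- freq / length(x)   # share of all points inside each interval
outside <- 1 - inside
res <- data.frame(iv, Freq = freq, Inside = inside, Outside = outside)
```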


2015-05-09 18:08:41 nrussell

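I think @nrussell's answer is great, but you can do this more simply in base R, so I'll document it here since you said you're a beginner. I've commented it to help you understand what's going on: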

## Set a seed so simulated data can be duplicated: 
set.seed(2001) 

## Simulate your data to be counted: 
d <- rnorm(50) 

## Simulate your ranges: 
r <- rnorm(10) 
r <- cbind(r - 0.1, r + 0.1) 

## Sum up the values of d falling inside each row of ranges. The apply 
## function takes each row of r, and compares the values of d to the 
## bounds of your ranges (lower in the first column, upper in the second) 
## and the resulting logical vector is then summed, where TRUEs are equal 
## to 1, thus counting the number of values in d falling between each 
## set of bounds: 
sums <- apply(r, MARGIN=1, FUN=function(x) { sum(d > x[1] & d < x[2]) }) 

## Each item of the sums vector refers to the corresponding 
##  row of ranges in the r object...
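Assuming the proportion wanted is the share of all values of `d` that fall inside (or outside) each range, the counts in `sums` can be divided by `length(d)`. A self-contained sketch that repeats the simulation above; `prop_inside` and `prop_outside` are hypothetical names:

```r
set.seed(2001)
d <- rnorm(50)
r <- rnorm(10)
r <- cbind(r - 0.1, r + 0.1)

# Count of d strictly inside each row of r (lower bound in column 1)
sums <- apply(r, MARGIN = 1, FUN = function(x) sum(d > x[1] & d < x[2]))

# Proportion of d inside / outside each range, one entry per row of r
prop_inside  <- sums / length(d)
prop_outside <- 1 - prop_inside
data.frame(lower = r[, 1], upper = r[, 2],
           inside = prop_inside, outside = prop_outside)
```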


2015-05-09 18:22:38
