对于两种情况,我有以下数据集w
和关键变量x
。二进制搜索像概念在R中创建子集数据
Case 1:
x = 4
w = c(1,2,4,4,4,4,6,7,8,9,10,11,12,14,15)
Case2:
x = 12
w = c(1,2,4,4,4,4,6,7,8,9,10,11,12,14,15)
我想创建这将为x
通过搜索数据集w
,将在w
子集原始数据集大小的数据集下按x
的位置的功能。输出将是具有与搜索关键字相同的上限值的较小大小的数据集。下面是我想中的R写入功能:
create_chunk <- function(val, tab, L=1L, H=length(tab))
{
if(H >= L)
{
mid = L + ((H-L)/2)
## If the element is present within middle length
if(tab[mid] > val)
{
## subset the original data in reduced size and again do mid position value checking
## then subset the data
} else
{
mid = mid + (mid/2)
## Increase the mid position to go for right side checking
}
}
}
在输出我要寻找如下:
Output for Case 1:
Dataset containing: 1,2,4,4,4,4
Output for Case 2:
Dataset containing: 1,2,4,4,4,4,6,7,8,9,10,11,12
Please note:
1. Dataset may contain duplicate values for search key and
all the duplicate values are expected in the output dataset.
2. I have huge size datasets (say around 2M rows) from
where I am trying to subset smaller dataset as per my requirement of search key.
新更新:案例3
输入数据:
date value size stockName
1 2016-08-12 12:44:43 10093.40 4 HWA IS Equity
2 2016-08-12 12:44:38 10093.35 2 HWA IS Equity
3 2016-08-12 12:44:47 10088.00 2 HWA IS Equity
4 2016-08-12 12:44:52 10089.95 1 HWA IS Equity
5 2016-08-12 12:44:53 10089.95 1 HWA IS Equity
6 2016-08-12 12:44:54 10088.95 1 HWA IS Equity
搜索关键字是:10089.95
in value colu MN。
预期成果是:
date value size stockName
1 2016-08-12 12:44:47 10088.00 2 HWA IS Equity
2 2016-08-12 12:44:54 10088.95 1 HWA IS Equity
3 2016-08-12 12:44:52 10089.95 1 HWA IS Equity
4 2016-08-12 12:44:53 10089.95 1 HWA IS Equity
你自己的功能有什么问题? – 989
我没有获得第二个数据集的成功。如果匹配变量存在,我也希望提供关于选择重复值的建议。 – Zico
看起来你正在寻找'?findInterval' - 'w [seq_len(findInterval(4,w))]' –