2016-10-02
0

How can I count the rows in a Hive table whose value is smaller than that of a given row? Consider the following table:

+------+------+ 
| id | res | 
+------+------+ 
| 1 | 55 | 
| 2 | 10 | 
| 3 | 89 | 
| 4 | 100 | 
| 5 | 80 | 
| 6 | 55 | 
| 7 | 70 | 
| 8 | 35 | 
| 9 | 46 | 
| 10 | 51 | 
+------+------+ 

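Now, for each row, I have to count the rows whose res value is smaller than the res value of that row.

For the table above, the output should be: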

+------+------+ 
| id |count | 
+------+------+ 
| 1 | 4 | 
| 2 | 0 | 
| 3 | 8 | 
| 4 | 9 | 
| 5 | 7 | 
| 6 | 4 | 
| 7 | 6 | 
| 8 | 1 | 
| 9 | 2 | 
| 10 | 3 | 
+------+------+ 

Answers

3

You can try the RANK OVER function.

Sample HiveQL:

select 
    id, 
    res, 
    rank() over (ORDER BY res) as rank 
from 
    my_table 
order by 
    res 

More here and here.

+0

??? Your query returns '(2,10,1)(8,35,2)(9,46,3)(10,51,4)(1,55,5)(6,55,5)(7,70,7)(5,80,8)(3,89,9)(4,100,10)', which is not the desired result. Did you run the query? If I am missing something on this topic, please let me know. – ozw1z5rd

+0

@ozw1z5rd The only difference I can see is the starting index. 'rank' returns an index starting from 1. Everything else is the same. – Ambrish

+0

Perfect! I missed that, and it does not need a cross product. – ozw1z5rd

0

Voilà:

+-----+------+ 
| id | _c1 | 
+-----+------+ 
| 1 | 4 | 
| 2 | 0 | 
| 3 | 8 | 
| 4 | 9 | 
| 5 | 7 | 
| 6 | 4 | 
| 7 | 6 | 
| 8 | 1 | 
| 9 | 2 | 
| 10 | 3 | 
+-----+------+ 

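This is easy, although writing the query without a cross product would be crazy. For each row you have to find all the rows with a smaller value, so something like a cross product is implicit.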

SELECT id, SUM(IF(c.res1 > c.res2, 1, 0))
FROM (
    SELECT id, a.res AS res1, b.res AS res2
    FROM test_4 AS a
    INNER JOIN (
        SELECT res
        FROM test_4
    ) b
) c
GROUP BY id;

0

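You can do the following, but remember to subtract 1 from the rank, because we are checking < rather than <= (in other words, the current row must not be counted):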

SELECT
    id,
    res,
    rank() OVER (ORDER BY res) - 1 AS rank
FROM point
ORDER BY id
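Why subtracting 1 works, even with ties, can be checked outside Hive. The following is only a minimal Python sketch, not HiveQL: it hard-codes the sample table from the question and compares a brute-force count of strictly smaller values with the position of each value in sorted order, which is what rank() minus 1 amounts to. The function names are made up for this illustration.

```python
from bisect import bisect_left

# Sample data from the question: id -> res
rows = {1: 55, 2: 10, 3: 89, 4: 100, 5: 80, 6: 55, 7: 70, 8: 35, 9: 46, 10: 51}


def count_smaller_bruteforce(rows):
    """For each id, count the rows with a strictly smaller res (cross-join idea)."""
    return {
        row_id: sum(1 for other in rows.values() if other < res)
        for row_id, res in rows.items()
    }


def count_smaller_rank(rows):
    """Same counts via rank - 1, i.e. the index of the first equal value when sorted."""
    ordered = sorted(rows.values())
    # bisect_left returns how many values are strictly below res.
    # Tied values share that index, just as tied rows share a rank.
    return {row_id: bisect_left(ordered, res) for row_id, res in rows.items()}


# Expected output table from the question
expected = {1: 4, 2: 0, 3: 8, 4: 9, 5: 7, 6: 4, 7: 6, 8: 1, 9: 2, 10: 3}
assert count_smaller_bruteforce(rows) == count_smaller_rank(rows) == expected
```

Both functions agree with the expected table, including ids 1 and 6, which share res = 55 and both get 4.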

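Or the long way:

Older Hive versions do not support CTEs (support was added in Hive 0.13), so this uses a subquery instead.

Assumption: I have named the table containing id and res POINT.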

SELECT id, SUM(comparison) AS count
FROM (
    SELECT
        a.id,
        a.res AS res1,
        b.res AS res2,
        CASE WHEN a.res > b.res THEN 1
             ELSE 0
        END AS comparison
    FROM point a
    CROSS JOIN point b
) c
GROUP BY id

Please test it and let me know.

0

Rank is probably the way to go, but here is an interesting alternative:

SELECT  mt.id    AS id 
      , mt.res   AS res 
      , COUNT(1) OVER (PARTITION BY NULL ORDER BY mt.res ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1 AS cnt 
FROM  my_table mt
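
One caveat that may be worth checking before relying on this variant: a ROWS frame counts physical rows, so rows with tied res values (ids 1 and 6 in the sample) can receive different counts depending on how the engine orders them. The Python sketch below only simulates that frame on the sample data; it is not Hive, the function names are hypothetical, and the order of tied rows follows Python's stable sort rather than anything Hive guarantees.

```python
# Sample data from the question as (id, res) pairs
rows = [(1, 55), (2, 10), (3, 89), (4, 100), (5, 80),
        (6, 55), (7, 70), (8, 35), (9, 46), (10, 51)]


def running_count_minus_one(rows):
    """Mimic COUNT(1) OVER (ORDER BY res ROWS ... CURRENT ROW) - 1.

    Python's sort is stable, so tied rows keep their input order here;
    a real engine may order them differently.
    """
    ordered = sorted(rows, key=lambda r: r[1])
    # The 0-based position equals (rows up to and including this one) - 1.
    return {row_id: position for position, (row_id, _) in enumerate(ordered)}


def strictly_smaller(rows):
    """Reference answer: number of rows with a strictly smaller res."""
    return {
        row_id: sum(1 for _, other in rows if other < res)
        for row_id, res in rows
    }


running = running_count_minus_one(rows)
exact = strictly_smaller(rows)
# ids whose running count disagrees with the exact count, as (running, exact)
mismatches = {i: (running[i], exact[i]) for i in exact if running[i] != exact[i]}
print(mismatches)
```

In this simulation only one of the two tied rows disagrees with the expected output, so the result matches the requested table whenever res values are unique, while the rank-based query handles ties as well.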