2017-09-26

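Merging duplicate records on a Hive table

I have the following table, which receives incremental updates. I need to write a plain Hive query that merges the rows sharing the same key, keeping the most recent non-null value of each column.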

Key | A | B | C | Timestamp 
K1 | X | Null | Null | 2015-05-03 
K1 | Null | Y | Z | 2015-05-02 
K1 | Foo | Bar | Baz | 2015-05-01 

Desired result:

Key | A | B | C | Timestamp 
K1 | X | Y | Z | 2015-05-03 

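My first thought was COALESCE, but I don't think that is right. If there are only a few columns you could try it, since Hive does not support recursive CTE calls.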

To create a new CTE you would have to create a new table or temporary storage. If that is acceptable, I have a solution.

Answer

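Use the first_value() function to get the latest non-null value of each column. The sort keys have to be concatenated into one expression, because last_value works with only one sort key. The prefix ('2' for a non-null value, '1' for a null) makes non-null rows sort first in descending order, and among those the row with the latest Timestamp wins.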

Demo:

select distinct
    key,
    -- per column: non-null rows get prefix 2 and sort ahead of null rows (prefix 1), then the latest Timestamp comes first
    first_value(A) over (partition by Key order by concat(case when A is null then '1' else '2' end, '_', Timestamp) desc) A,
    first_value(B) over (partition by Key order by concat(case when B is null then '1' else '2' end, '_', Timestamp) desc) B,
    first_value(C) over (partition by Key order by concat(case when C is null then '1' else '2' end, '_', Timestamp) desc) C,
    -- the merged row carries the newest timestamp seen for the key
    max(timestamp) over (partition by key) timestamp
from
( -- replace this subquery with your table
    select 'K1' key, 'X' a, Null b, Null c, '2015-05-03' timestamp union all
    select 'K1' key, null a, 'Y' b, 'Z' c, '2015-05-02' timestamp union all
    select 'K1' key, 'Foo' a, 'Bar' b, 'Baz' c, '2015-05-01' timestamp
) s
;

Output:

OK 
key  a  b  c  timestamp 
K1  X  Y  Z  2015-05-03
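
To see why the concatenated sort key picks the right row, it can help to select the key as an ordinary column. The query below is only a sketch for column A: the alias sort_key_a is made up for this illustration, the null is cast to string explicitly, and the backticks around timestamp are a precaution because it may be treated as a reserved word depending on the Hive version and settings.

```sql
-- Sketch: expose the sort key that first_value(A) orders by in the demo.
-- sort_key_a is a hypothetical alias used only for this illustration.
select
    key,
    a,
    `timestamp`,
    concat(case when a is null then '1' else '2' end, '_', `timestamp`) as sort_key_a
from
( -- same sample rows as in the demo (column A only); replace with your table
    select 'K1' key, 'X' a, '2015-05-03' `timestamp` union all
    select 'K1' key, cast(null as string) a, '2015-05-02' `timestamp` union all
    select 'K1' key, 'Foo' a, '2015-05-01' `timestamp`
) s
order by sort_key_a desc;
```

The three keys are 2_2015-05-03 (A = X), 2_2015-05-01 (A = Foo) and 1_2015-05-02 (A is null), in that descending order, so first_value(A) returns X. This relies on the timestamp being a string in yyyy-MM-dd form, which sorts lexicographically in date order. A different format would need converting before it is concatenated.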