2017-03-09

Self-join a table with a condition

I have a table of the following form:

Table dummy:

e_n t_s item 
a  t1 c 
a  t2 c 
a  t3 c 
a  t4 c 
b  p1 c 
b  p2 c 
b  p3 c 
b  p4 c 

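t1, t2, t3, t4, p1, p2, p3, p4 are timestamps in ascending order: t1, t2, t3, t4 are the ascending timestamps for event_name 'a', and p1, p2, p3, p4 are the ascending timestamps for event_name 'b'.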

c is the item number on which these events 'a' and 'b' occurred.

I am trying to write a query whose result should be:

e_n1 e_n2 item t_s_1 t_s_2 
a  b  c  t1 p1 
a  b  c  t2 p2 
a  b  c  t3 p3 
a  b  c  t4 p4 

I tried the following code:

select l.e_n as e_n_1, m.e_n as e_n_2, l.item, l.t_s as t_s_a,
       m.t_s as t_s_b
from (select * from dummy where e_n = 'a') l
join (select * from dummy where e_n = 'b') m
  on l.item = m.item and l.t_s < m.t_s

The join condition l.item = m.item is needed because there are many other items (c1, c2, c3, ...) with the same structure.

The result is:

e_n1 e_n2 item t_s_a t_s_b
a    b    c    t1    p1
a    b    c    t1    p2
a    b    c    t1    p3
a    b    c    t1    p4
a    b    c    t2    p1
a    b    c    t2    p2
a    b    c    t2    p3

and so on.
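A minimal local reproduction of why this blows up, assuming Python's built-in sqlite3 module as a stand-in for Spark SQL and the hypothetical integer timestamps 1 to 4 for t1 to t4 and 5 to 8 for p1 to p4. Because every 'a' timestamp precedes every 'b' timestamp, the t_s < condition filters nothing, so the join returns all 4 × 4 = 16 combinations for the item. In general the output grows with the product of the two event counts per item.

```python
import sqlite3

# Hypothetical stand-in data: t1..t4 -> 1..4 and p1..p4 -> 5..8, so every
# 'a' event happens before every 'b' event, as described in the question.
ROWS = [("a", ts, "c") for ts in (1, 2, 3, 4)] + [("b", ts, "c") for ts in (5, 6, 7, 8)]


def naive_pairs(rows):
    """Run the question's self-join (parentheses balanced) on an in-memory table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dummy (e_n TEXT, t_s INTEGER, item TEXT)")
    con.executemany("INSERT INTO dummy VALUES (?, ?, ?)", rows)
    sql = """
        SELECT l.e_n, m.e_n, l.item, l.t_s, m.t_s
        FROM (SELECT * FROM dummy WHERE e_n = 'a') l
        JOIN (SELECT * FROM dummy WHERE e_n = 'b') m
          ON l.item = m.item AND l.t_s < m.t_s
        ORDER BY l.t_s, m.t_s
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


pairs = naive_pairs(ROWS)
print(len(pairs))  # 16: each of the 4 'a' rows matches all 4 later 'b' rows
```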

How can I get this result efficiently?


Does your apache-spark-sql support ROW_NUMBER() OVER (ORDER BY t_s) rn? If so, simply full outer join tables l and m on l.rn = m.rn.


Is this specifically for Amazon Redshift, or for Spark? Could you clarify your tags accordingly?


This is for apache-spark-sql. – SpaceOddity

Answers

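-- Pivot the k-th 'a' event and the k-th 'b' event of each item into one row
select  min (case when e_n = 'a' then 'a' end) as e_n1
      ,min (case when e_n = 'b' then 'b' end) as e_n2
      ,item
      ,min (case when e_n = 'a' then t_s end) as t_s_1
      ,min (case when e_n = 'b' then t_s end) as t_s_2

from  (select  d.*
         -- number each event type's timestamps separately within each item
         ,row_number() over (partition by item,e_n order by t_s) as rn

      from  dummy as d
      ) d

-- one output row per (item, rn): the rn-th 'a' next to the rn-th 'b'
group by item
      ,rn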

+------+------+------+-------+-------+
| e_n1 | e_n2 | item | t_s_1 | t_s_2 |
+------+------+------+-------+-------+
| a    | b    | c    | t1    | p1    |
| a    | b    | c    | t2    | p2    |
| a    | b    | c    | t3    | p3    |
| a    | b    | c    | t4    | p4    |
+------+------+------+-------+-------+
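A quick local check of this query, again assuming Python's sqlite3 as a stand-in for Spark SQL (window functions need SQLite 3.25 or later) and hypothetical integer timestamps (t1 to t4 as 1 to 4, p1 to p4 as 5 to 8). The second item, c1, and the helper name pivot_pairs are illustrative assumptions; c1 shows that the pairing restarts for every item because rn is partitioned by item and e_n.

```python
import sqlite3

# Hypothetical data: item 'c' as in the question, plus a second item 'c1'.
ROWS = (
    [("a", ts, "c") for ts in (1, 2, 3, 4)]
    + [("b", ts, "c") for ts in (5, 6, 7, 8)]
    + [("a", ts, "c1") for ts in (2, 9)]
    + [("b", ts, "c1") for ts in (3, 10)]
)

# The answer's query; only the final ORDER BY is added, for stable output.
SQL = """
SELECT MIN(CASE WHEN e_n = 'a' THEN 'a' END) AS e_n1,
       MIN(CASE WHEN e_n = 'b' THEN 'b' END) AS e_n2,
       item,
       MIN(CASE WHEN e_n = 'a' THEN t_s END) AS t_s_1,
       MIN(CASE WHEN e_n = 'b' THEN t_s END) AS t_s_2
FROM (SELECT d.*,
             ROW_NUMBER() OVER (PARTITION BY item, e_n ORDER BY t_s) AS rn
      FROM dummy AS d) d
GROUP BY item, rn
ORDER BY item, rn
"""


def pivot_pairs(rows):
    """Load rows into an in-memory 'dummy' table and run the pivot query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dummy (e_n TEXT, t_s INTEGER, item TEXT)")
    con.executemany("INSERT INTO dummy VALUES (?, ?, ?)", rows)
    result = con.execute(SQL).fetchall()
    con.close()
    return result


for row in pivot_pairs(ROWS):
    print(row)
```

The table is scanned once and grouped, so no 'a' row is ever joined against the 'b' rows. When one side has more events than the other for an item, the extra (item, rn) groups come out with NULL in the missing columns.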

A kind reminder to accept the answer (by clicking the ✓ mark next to it).


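First, number the timestamps of each event in order within each item; then join the two numbered tables on item and row number.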

Try the code below:

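-- coalesce, not isnull: in Spark SQL isnull(expr) takes one argument and returns a boolean
select l.e_n as e_n_1, m.e_n as e_n_2, coalesce(l.item, m.item) as item, l.t_s as t_s_a,
    m.t_s as t_s_b from
    -- partition by item so the numbering restarts for every item (c, c1, c2, ...)
    (select *, row_number() over (partition by item order by t_s) as rn from dummy where e_n = 'a') l
    full join
    (select *, row_number() over (partition by item order by t_s) as rn from dummy where e_n = 'b') m
    on l.item = m.item and l.rn = m.rn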
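The number-then-full-join idea, sketched in plain Python rather than SQL so it runs anywhere. The function name pair_events and the (e_n, t_s, item) tuple layout are illustrative assumptions. Sorting each item's 'a' and 'b' timestamps makes the list position play the role of the row number, and itertools.zip_longest plays the role of the full outer join, padding the shorter side with None.

```python
from collections import defaultdict
from itertools import zip_longest


def pair_events(rows, first="a", second="b"):
    """Pair the k-th `first` event with the k-th `second` event of each item.

    rows: iterable of (e_n, t_s, item) tuples (hypothetical layout).
    Mirrors row_number() over (partition by item order by t_s) on each side
    followed by a full join on (item, rn); unmatched events pair with None.
    """
    by_item = defaultdict(lambda: {first: [], second: []})
    for e_n, t_s, item in rows:
        if e_n in (first, second):  # other event names are ignored
            by_item[item][e_n].append(t_s)

    result = []
    for item in sorted(by_item):
        lhs = sorted(by_item[item][first])   # rn = position in this list
        rhs = sorted(by_item[item][second])
        for t_a, t_b in zip_longest(lhs, rhs):
            result.append((first if t_a is not None else None,
                           second if t_b is not None else None,
                           item, t_a, t_b))
    return result


rows = [("a", 1, "c"), ("a", 2, "c"), ("b", 5, "c"), ("b", 6, "c"),
        ("a", 3, "c1"), ("b", 4, "c1"), ("a", 7, "c1")]
for row in pair_events(rows):
    print(row)
```

Item c1 has two 'a' events but only one 'b' event, so its second 'a' comes out paired with None, which is what the full join produces as NULLs.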