2016-12-07

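Count the number of times a value appears consecutively in Hive/SQL

My table has 3 columns. For each userid, with rows ordered by time, I want to find the largest number of consecutive rows in which value equals B, similar to finding the longest sublist of identical values. For example, given the data below: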

time        userid  value
2016-01-01  1       A
2016-01-02  1       B
2016-01-03  1       B
2016-01-04  2       C
2016-01-05  2       B
2016-01-06  2       B
2016-01-07  2       B
2016-01-08  2       C
2016-01-09  2       B

the query should return

userid  times
1       2
2       3

Is this even possible without a Hive user-defined function? I've dug a little into LAG and LEAD, but couldn't find a way. :(

Answer

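-- Longest run of consecutive identical values per (value, userid).
select  value
       ,userid
       ,max (times) as times

from   (-- One row per run: rows of the same run share (value, userid, rn - rn_val).
        select  value
               ,userid
               ,count (*) as times

        from   (select  value
                       ,userid

                        -- position among all of the user's rows
                       ,row_number() over
                        (
                            partition by userid
                            order by     time
                        ) as rn

                        -- position among the user's rows with the same value
                       ,row_number() over
                        (
                            partition by userid, value
                            order by     time
                        ) as rn_val

                from    t
               ) t

        -- Optional filter for runs of B only. It must be applied here, after
        -- rn has been computed over all rows; filtering inside the innermost
        -- query would make rn equal rn_val and merge separate runs.
        -- where value = 'B'

        group by value
                ,userid
                ,rn - rn_val
       ) t

group by value
        ,userid

order by value
        ,userid
;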
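
To see why grouping by rn - rn_val isolates each run, the same bookkeeping can be traced outside the database. The following is a minimal Python sketch, not part of the original answer; the function name longest_runs and the in-memory rows list are assumptions made for illustration, and the data is the sample from the question.

```python
from collections import defaultdict

# Sample rows from the question: (time, userid, value)
rows = [
    ("2016-01-01", 1, "A"),
    ("2016-01-02", 1, "B"),
    ("2016-01-03", 1, "B"),
    ("2016-01-04", 2, "C"),
    ("2016-01-05", 2, "B"),
    ("2016-01-06", 2, "B"),
    ("2016-01-07", 2, "B"),
    ("2016-01-08", 2, "C"),
    ("2016-01-09", 2, "B"),
]


def longest_runs(rows):
    """Return {(userid, value): length of the longest consecutive run}."""
    rn = defaultdict(int)           # row_number() partitioned by userid
    rn_val = defaultdict(int)       # row_number() partitioned by userid, value
    island_size = defaultdict(int)  # count(*) per (userid, value, rn - rn_val)

    # Visit rows per user in time order, like "order by time" in the windows.
    for _time, userid, value in sorted(rows, key=lambda r: (r[1], r[0])):
        rn[userid] += 1
        rn_val[(userid, value)] += 1
        island = rn[userid] - rn_val[(userid, value)]
        island_size[(userid, value, island)] += 1

    # Outer query: max(times) per (userid, value).
    longest = defaultdict(int)
    for (userid, value, _island), size in island_size.items():
        longest[(userid, value)] = max(longest[(userid, value)], size)
    return dict(longest)


runs = longest_runs(rows)
# Keep only the runs of B, as the question asks.
b_runs = {userid: n for (userid, value), n in runs.items() if value == "B"}
print(b_runs)  # {1: 2, 2: 3}
```

Within one user's rows, rn and rn_val both advance by one while the value repeats, so their difference stays constant. As soon as a different value intervenes, rn advances alone and the difference changes, which starts a new group.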