2016-06-01 74 views

Calculate percentiles based on dates in SQL

I have a table of 50k rows with column A (BIGINT, e.g. customer account ID) and column B (DATE, e.g. last purchase date).

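I would like to know what percentage of customers made their last purchase in the top 25% tile, the top 50% tile, and the 75% tile of a given date range, so that I can see, across all those customer account IDs, where most of our last purchases are leaning. Any ideas how to achieve this in SQL?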

Table: alltransations

ACCT_ID   | DATE 
----------------|--------------- 
23748234782947 | 05-15-2016 
28178792839838 | 05-01-2016 
28178092734538 | 02-12-2016 
28347732839867 | 01-15-2016 
28170909362959 | 10-10-2015 
28171334099090 | 11-11-2015 
28109129330023 | 12-25-2014 
28172377859289 | 10-31-2014 

Answers


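I'm not sure I understood you correctly about the tiles, but if you mean splitting the time range into four intervals, it would work like this for the range 2016-02-01 to 2016-06-01. Trade-off: the interval boundaries are worked out by hand; they could probably be derived through date arithmetic as well.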

CREATE TABLE tblA (ACCT_ID INTEGER, PDATE DATE); 

INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1000,'2016-05-21'); 
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1001,'2016-05-11'); 
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1002,'2016-05-24'); 
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1003,'2016-04-21'); 
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1004,'2016-02-12'); 
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1005,'2016-02-21'); 
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1001,'2016-03-22'); 
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1002,'2016-04-01'); 
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1005,'2016-04-01'); 
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1006,'2016-04-01'); 

-- Date literals are day.month.year, so the format string is '%d.%m.%Y'.
-- CNT / TOTALCNT gives the share of rows in each range.
SELECT DISTR.DATE_RANGE, COUNT(DISTR.ACCT_ID) AS CNT, OVRL.TOTALCNT 
FROM (SELECT 'TOP25' AS DATE_RANGE, A.ACCT_ID 
      FROM tblA A 
      WHERE A.PDATE BETWEEN STR_TO_DATE('01.05.2016', '%d.%m.%Y') AND STR_TO_DATE('01.06.2016', '%d.%m.%Y') 
      UNION ALL 
      SELECT 'TOP50' AS DATE_RANGE, B.ACCT_ID 
      FROM tblA B 
      WHERE B.PDATE BETWEEN STR_TO_DATE('01.04.2016', '%d.%m.%Y') AND STR_TO_DATE('01.06.2016', '%d.%m.%Y') 
      UNION ALL 
      SELECT 'TOP75' AS DATE_RANGE, C.ACCT_ID 
      FROM tblA C 
      WHERE C.PDATE BETWEEN STR_TO_DATE('01.03.2016', '%d.%m.%Y') AND STR_TO_DATE('01.06.2016', '%d.%m.%Y') 
      UNION ALL 
      SELECT 'ALL' AS DATE_RANGE, D.ACCT_ID 
      FROM tblA D 
      WHERE D.PDATE BETWEEN STR_TO_DATE('01.02.2016', '%d.%m.%Y') AND STR_TO_DATE('01.06.2016', '%d.%m.%Y')) DISTR 
-- The total is counted over the whole range, 2016-02-01 to 2016-06-01.
, (SELECT COUNT(*) AS TOTALCNT FROM tblA A WHERE A.PDATE BETWEEN STR_TO_DATE('01.02.2016', '%d.%m.%Y') AND STR_TO_DATE('01.06.2016', '%d.%m.%Y')) OVRL 
GROUP BY DISTR.DATE_RANGE, OVRL.TOTALCNT 

This gives (date range, matching rows, total rows):

ALL 10 10 
TOP25 3 10 
TOP50 7 10 
TOP75 8 10 
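
The hand-worked interval boundaries mentioned in this answer could probably be derived from the two endpoints of the range instead. The following is only a minimal sketch, assuming MySQL and the same tblA table; the variable names @range_start, @range_end and @span_days are illustrative and not part of the original answer. It splits the range into four equal spans of days, so the cut-offs may differ by a day or so from calendar-month boundaries.

```sql
-- Endpoints of the range to analyse (assumed values, matching the example range).
SET @range_start = '2016-02-01';
SET @range_end   = '2016-06-01';
-- Length of the range in days.
SET @span_days   = DATEDIFF(@range_end, @range_start);

SELECT
    -- Each comparison yields 1 or 0 in MySQL, so SUM() counts the matching rows.
    SUM(PDATE >= DATE_ADD(@range_start, INTERVAL FLOOR(0.75 * @span_days) DAY)) AS TOP25,
    SUM(PDATE >= DATE_ADD(@range_start, INTERVAL FLOOR(0.50 * @span_days) DAY)) AS TOP50,
    SUM(PDATE >= DATE_ADD(@range_start, INTERVAL FLOOR(0.25 * @span_days) DAY)) AS TOP75,
    COUNT(*) AS TOTALCNT
FROM tblA
WHERE PDATE BETWEEN @range_start AND @range_end;
```

This variant returns one row with a column per range rather than one row per range, which also avoids scanning the table once per interval.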

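This solution dynamically creates date quartiles based on the full date range of the data set, then shows the percentage of IDs that fall in each quartile: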

select unix_timestamp(min(date)) into @start from p; 
select unix_timestamp(max(date)) into @end from p; 
Set @25 = 0.25 *(@end - @start)+@start; 
Set @50 = 0.50 *(@end - @start)+@start; 
Set @75 = 0.75 *(@end - @start)+@start; 

SELECT 
CASE WHEN unix_timestamp(date)>@75 then 4 
WHEN unix_timestamp(date)>@50 then 3 
WHEN unix_timestamp(date)>@25 then 2 
ELSE 1 END as Quartile, 
round(count(id)/(select count(*) from p)*100,2) as Percentage 
FROM p 
GROUP BY Quartile; 

Here is a functional example with more details and formatting.

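Because the quartiles are computed dynamically from the date range, if half of your dates sit at the start of the range and half at the end, you will only see Q1 and Q4.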
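
By contrast, if what is wanted is quartiles by rank, where each quartile holds roughly the same number of rows however the dates cluster, a window function can do that. This is a different technique from the equal-width date buckets used in this answer, and it is only a sketch: it assumes MySQL 8.0 or later (where NTILE is available) and the same table p with columns id and date. The aliases q, first_date, last_date and row_count are illustrative.

```sql
-- Rank-based quartiles: NTILE(4) deals the rows, ordered by date,
-- into four groups of (nearly) equal size.
SELECT
    q.Quartile,
    MIN(q.date) AS first_date,   -- earliest date that landed in this quartile
    MAX(q.date) AS last_date,    -- latest date that landed in this quartile
    COUNT(*)    AS row_count
FROM (
    SELECT id, date, NTILE(4) OVER (ORDER BY date) AS Quartile
    FROM p
) AS q
GROUP BY q.Quartile
ORDER BY q.Quartile;
```

Here the interesting output is the date boundaries of each quartile, since the row counts are equal by construction.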

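First set variables for the start and end of the range, then derive a variable for each quartile boundary, or for whatever other time-period partition you want.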

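The CASE statement cascades from the latest dates to the earliest, falling through to the next quartile each time a comparison fails. Everything is in UNIX_TIMESTAMP format to make the arithmetic easy.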

The same structure can be used to divide the date range into other segments or n-tiles.
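
As a rough sketch of that generalisation, the bucket number can be computed arithmetically instead of writing one CASE branch per boundary. The variable @n and the aliases r, lo, hi, total and Bucket are illustrative assumptions; the table p and its id and date columns are the ones used above.

```sql
SET @n = 10;  -- number of equal-width buckets (assumed; 4 gives quartiles)

SELECT
    -- Scale each date's position inside the range to 0..@n, then number
    -- the buckets 1..@n. LEAST() keeps the very latest date in bucket @n
    -- instead of letting it spill into @n + 1.
    LEAST(@n, FLOOR(@n * (UNIX_TIMESTAMP(p.date) - r.lo) / (r.hi - r.lo)) + 1) AS Bucket,
    ROUND(COUNT(p.id) / r.total * 100, 2) AS Percentage
FROM p
CROSS JOIN (
    -- Range endpoints and row count, computed once for the whole table.
    SELECT UNIX_TIMESTAMP(MIN(date)) AS lo,
           UNIX_TIMESTAMP(MAX(date)) AS hi,
           COUNT(*) AS total
    FROM p
) AS r
GROUP BY Bucket, r.total
ORDER BY Bucket;
```

One difference from the CASE version: a date that falls exactly on a boundary lands in the upper bucket here, whereas the strict > comparisons above put it in the lower one. If every row has the same date, the division by zero makes Bucket NULL.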