2017-04-04 40 views

Context: Create groups in a table based on the group proportions from another table

I have two tables. Table A has data in the following example format (with 12 ordered bands A-L, A = highest, L = lowest):

ID | BAND 
---- | ---- 
1 | A  
2 | B 
3 | A 
4 | C 
5 | D 
6 | F 
7 | D 
8 | H 
... 

Table B has data in the following example format:

ID | SCORE 
---- | ---- 
1 | 0.12  
2 | 0.37 
3 | 0.21 
4 | 0.55 
5 | 0.01 
6 | 0.90 
7 | 0.10 
8 | 0.71  
... 

I have calculated the proportional size of each band in table A using:

CREATE TABLE table_a_group_pct AS 
SELECT band 
, count(*) * 100.0/sum(count(*)) over() AS pct 
FROM table_a 
GROUP BY band; 

With the output:

BAND | PCT 
---- | ---- 
A | 12 
B | 15 
C | 11 
D | 9 
E | 10 
F | 8 
G | 11 
H | 10 
I | 6 
J | 4 
K | 3 
L | 1 

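I want to create 12 ordered (by score) groups for table B that have the same proportional sizes as the groups in table A.

For example, 12% of the rows in table A have group = A, so the top 12% of rows in table B (by score) would be assigned group = A, and so on.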
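The cut-offs this rule implies are the running totals of the band percentages. A minimal sketch, assuming the percentages listed above are inlined as a stand-in for table_a_group_pct (the cum_pct alias is illustrative and not part of the original):

```sql
-- Stand-in for table_a_group_pct, using the percentages shown above.
WITH table_a_group_pct AS (
  SELECT 'A' band, 12 pct FROM dual UNION ALL
  SELECT 'B' band, 15 pct FROM dual UNION ALL
  SELECT 'C' band, 11 pct FROM dual UNION ALL
  SELECT 'D' band,  9 pct FROM dual UNION ALL
  SELECT 'E' band, 10 pct FROM dual UNION ALL
  SELECT 'F' band,  8 pct FROM dual UNION ALL
  SELECT 'G' band, 11 pct FROM dual UNION ALL
  SELECT 'H' band, 10 pct FROM dual UNION ALL
  SELECT 'I' band,  6 pct FROM dual UNION ALL
  SELECT 'J' band,  4 pct FROM dual UNION ALL
  SELECT 'K' band,  3 pct FROM dual UNION ALL
  SELECT 'L' band,  1 pct FROM dual)
SELECT band,
       pct,
       -- Running total from band A down to the current band.
       SUM(pct) OVER (ORDER BY band) AS cum_pct
FROM   table_a_group_pct
ORDER  BY band;
```

With these figures the running total reaches 12 for A, 27 for B and 100 for L, so band B would cover the rows between the top 12% and the top 27% by score.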

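I think I could do this by using the NTILE(100) function to find the percentile position of each score, then using CASE WHEN to build the groups manually from the cumulative % of each band in table A. That is, if band A holds the top 12% of IDs, I find the 88th percentile of table B and do: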

CASE WHEN score_pct > 88 THEN 'A'
     WHEN score_pct BETWEEN 74 AND 88 THEN 'B' ...
END AS band
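One caveat with this plan: NTILE(100) only behaves like a percentile when there are at least 100 rows. When there are fewer rows than buckets, only as many buckets as there are rows get filled. A small sketch on the 8 sample rows of table B, shown next to CUME_DIST as one alternative that returns a true fraction (the aliases score_ntile and top_share are illustrative):

```sql
-- Sample rows of table B, inlined so the query runs on its own.
WITH table_b AS (SELECT 1 id, 0.12 score FROM dual UNION ALL
                 SELECT 2 id, 0.37 score FROM dual UNION ALL
                 SELECT 3 id, 0.21 score FROM dual UNION ALL
                 SELECT 4 id, 0.55 score FROM dual UNION ALL
                 SELECT 5 id, 0.01 score FROM dual UNION ALL
                 SELECT 6 id, 0.90 score FROM dual UNION ALL
                 SELECT 7 id, 0.10 score FROM dual UNION ALL
                 SELECT 8 id, 0.71 score FROM dual)
SELECT id,
       score,
       -- With 8 rows this only numbers the rows 1 to 8, not 1 to 100.
       NTILE(100) OVER (ORDER BY score) AS score_ntile,
       -- Fraction of rows whose score is at least this row's score.
       CUME_DIST() OVER (ORDER BY score DESC) AS top_share
FROM   table_b
ORDER  BY score DESC;
```

On a table with many more than 100 rows the two columns carry similar information, but top_share does not depend on the row count and can be compared directly with cumulative band shares.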

But I'd like to know whether there is a smarter way of solving this.

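Additional info: table A and table B are not the same size and do not have exactly the same IDs; I just want to create groups with similar proportions.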

My expected output is something like this:

ID | SCORE | BAND 
---- | ---- | ---- 
1 | 0.12 | K/11 
2 | 0.37 | G/7 
3 | 0.21 | H/8 
4 | 0.55 | E/5 
5 | 0.01 | L/12 
6 | 0.90 | A/1 
7 | 0.10 | K/11 
8 | 0.71 | B/2 

[Edited my question to add clarity]

Can you include the expected output?

No... edit your question, we can't read your mind.

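Somehow this question doesn't make sense to me. You first say there are *12* ordered bands, then show a table with *8* rows. Perhaps you should ask *another* question that is a simplified version of your problem with a clearer explanation. For example, what is "the proportional size of table A"?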

Answers

1

This can be achieved by using the CUME_DIST analytic function together with some funky joining (in versions before 12c), like so:

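(N.B. I modified the data in table_a so that its 8 rows cover the first 6 bands, A to F; this does not match your sample data, so don't be surprised when my output doesn't match yours.)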

WITH table_a AS (SELECT 1 ID, 'A' band FROM dual UNION ALL 
       SELECT 2 ID, 'B' band FROM dual UNION ALL 
       SELECT 3 ID, 'A' band FROM dual UNION ALL 
       SELECT 4 ID, 'C' band FROM dual UNION ALL 
       SELECT 5 ID, 'D' band FROM dual UNION ALL 
       SELECT 6 ID, 'E' band FROM dual UNION ALL 
       SELECT 7 ID, 'D' band FROM dual UNION ALL 
       SELECT 8 ID, 'F' band FROM dual), 
    table_b AS (SELECT 1 ID, 0.12 score FROM dual UNION ALL 
       SELECT 2 ID, 0.37 score FROM dual UNION ALL 
       SELECT 3 ID, 0.21 score FROM dual UNION ALL 
       SELECT 4 ID, 0.55 score FROM dual UNION ALL 
       SELECT 5 ID, 0.01 score FROM dual UNION ALL 
       SELECT 6 ID, 0.90 score FROM dual UNION ALL 
       SELECT 7 ID, 0.10 score FROM dual UNION ALL 
       SELECT 8 ID, 0.71 score FROM dual), 
-- end of data set-up, see the rest of the query below: 
     a_pc AS (SELECT DISTINCT band, 
         cume_dist() OVER (ORDER BY band) pc_cume_dist 
       FROM table_a), 
     b_pc AS (SELECT id, 
         score, 
         cume_dist() OVER (ORDER BY score DESC) pc_cume_dist 
       FROM table_b) 
SELECT b_pc.id, 
     b_pc.score, 
     b_pc.pc_cume_dist, 
     min(a_pc.band) band 
FROM b_pc 
     INNER JOIN a_pc ON (a_pc.band = CASE WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'A' THEN 'A' 
              WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'B' THEN 'B' 
              WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'C' THEN 'C' 
              WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'D' THEN 'D' 
              WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'E' THEN 'E' 
              WHEN b_pc.pc_cume_dist <= a_pc.pc_cume_dist AND a_pc.band = 'F' THEN 'F' 
             END) 
GROUP BY b_pc.id, b_pc.score, b_pc.pc_cume_dist 
ORDER BY b_pc.score DESC; 

     ID  SCORE PC_CUME_DIST BAND 
---------- ---------- ------------ ---- 
     6  0.9  0.125 A 
     8  0.71   0.25 A 
     4  0.55  0.375 B 
     2  0.37   0.5 C 
     3  0.21  0.625 D 
     1  0.12   0.75 D 
     7  0.1  0.875 E 
     5  0.01   1 F 
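The CASE expression in the join above names only bands A to F, and each branch simply returns the band it tests. For this data the condition therefore appears to reduce to a plain inequality. The following is a sketch under that assumption, not the answer author's own query, and it should extend to all twelve bands A to L without listing them:

```sql
-- Sample data copied from the query above.
WITH table_a AS (SELECT 1 id, 'A' band FROM dual UNION ALL
                 SELECT 2 id, 'B' band FROM dual UNION ALL
                 SELECT 3 id, 'A' band FROM dual UNION ALL
                 SELECT 4 id, 'C' band FROM dual UNION ALL
                 SELECT 5 id, 'D' band FROM dual UNION ALL
                 SELECT 6 id, 'E' band FROM dual UNION ALL
                 SELECT 7 id, 'D' band FROM dual UNION ALL
                 SELECT 8 id, 'F' band FROM dual),
     table_b AS (SELECT 1 id, 0.12 score FROM dual UNION ALL
                 SELECT 2 id, 0.37 score FROM dual UNION ALL
                 SELECT 3 id, 0.21 score FROM dual UNION ALL
                 SELECT 4 id, 0.55 score FROM dual UNION ALL
                 SELECT 5 id, 0.01 score FROM dual UNION ALL
                 SELECT 6 id, 0.90 score FROM dual UNION ALL
                 SELECT 7 id, 0.10 score FROM dual UNION ALL
                 SELECT 8 id, 0.71 score FROM dual),
     -- Cumulative share of table_a rows up to and including each band.
     a_pc AS (SELECT DISTINCT band,
                     CUME_DIST() OVER (ORDER BY band) pc_cume_dist
              FROM   table_a),
     -- Cumulative share of table_b rows, highest score first.
     b_pc AS (SELECT id,
                     score,
                     CUME_DIST() OVER (ORDER BY score DESC) pc_cume_dist
              FROM   table_b)
SELECT b_pc.id,
       b_pc.score,
       b_pc.pc_cume_dist,
       -- The first band whose cumulative share covers this row.
       MIN(a_pc.band) AS band
FROM   b_pc
       INNER JOIN a_pc ON b_pc.pc_cume_dist <= a_pc.pc_cume_dist
GROUP  BY b_pc.id, b_pc.score, b_pc.pc_cume_dist
ORDER  BY b_pc.score DESC;
```

Because the last band always has a cumulative share of 1, every row of table_b matches at least one band, so the inner join does not drop rows.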

Alternatively, in 12c you can use a LATERAL join, like so:

WITH table_a AS (SELECT 1 ID, 'A' band FROM dual UNION ALL 
       SELECT 2 ID, 'B' band FROM dual UNION ALL 
       SELECT 3 ID, 'A' band FROM dual UNION ALL 
       SELECT 4 ID, 'C' band FROM dual UNION ALL 
       SELECT 5 ID, 'D' band FROM dual UNION ALL 
       SELECT 6 ID, 'E' band FROM dual UNION ALL 
       SELECT 7 ID, 'D' band FROM dual UNION ALL 
       SELECT 8 ID, 'F' band FROM dual), 
    table_b AS (SELECT 1 ID, 0.12 score FROM dual UNION ALL 
       SELECT 2 ID, 0.37 score FROM dual UNION ALL 
       SELECT 3 ID, 0.21 score FROM dual UNION ALL 
       SELECT 4 ID, 0.55 score FROM dual UNION ALL 
       SELECT 5 ID, 0.01 score FROM dual UNION ALL 
       SELECT 6 ID, 0.90 score FROM dual UNION ALL 
       SELECT 7 ID, 0.10 score FROM dual UNION ALL 
       SELECT 8 ID, 0.71 score FROM dual), 
     a_pc AS (SELECT DISTINCT band, 
         cume_dist() OVER (ORDER BY band) pc_cume_dist 
       FROM table_a), 
     b_pc AS (SELECT id, 
         score, 
         cume_dist() OVER (ORDER BY score DESC) pc_cume_dist 
       FROM table_b) 
SELECT b_pc.id, 
     b_pc.score, 
     b_pc.pc_cume_dist, 
     a_pc2.band 
FROM b_pc, 
     lateral (SELECT MIN(band) band 
       FROM a_pc 
       WHERE a_pc.pc_cume_dist >= b_pc.pc_cume_dist) a_pc2 
order by b_pc.score desc 

     ID  SCORE PC_CUME_DIST BAND 
---------- ---------- ------------ ---- 
     6  0.9  0.125 A 
     8  0.71   0.25 A 
     4  0.55  0.375 B 
     2  0.37   0.5 C 
     3  0.21  0.625 D 
     1  0.12   0.75 D 
     7  0.1  0.875 E 
     5  0.01   1 F 

Here is an example of it running on Oracle's LiveSQL (which is at version 12.2).

Thanks for suggesting the cume_dist function! – tfcoe
