2014-10-05 60 views
0

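Remove duplicates from a result set based on three columns

I have a complex Oracle view that returns data containing logical duplicates among the returned rows. My goal is to retrieve only one row when duplicates are found based on two columns (a text and a datetime), but to decide which of the duplicates to return based on a third column (a datetime).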

I have simulated the result set below as a table with stub data (also available on SQLFiddle):

CREATE TABLE TimeTable (
    ID number NOT NULL, 
    NAME VARCHAR2(20) NOT NULL,  -- Grouped by this first 
    TARGETVALUE INT,     -- ultimate target value to be returned (no precedence from this value) 
    NOTE VARCHAR2(20) NULL,   -- Just a note for the developer on StackOverflow 
    BEGIN_DATE TIMESTAMP NOT NULL, -- Grouped by this 2nd (down to the minute, not seconds) 
    APPROVAL_DATE TIMESTAMP NOT NULL -- Decides the ties for duplicates 

); 

insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values 
(1, 'Alpha', 5, 'Duplicate First', '08-MAR-14 09.43.00.000000000', 
            '09-MAR-14 09.43.00.000000000'); 

insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values 
(2, 'Alpha', 2, 'Duplicate Middle', '08-MAR-14 09.43.00.000000000', 
            '09-MAR-14 09.43.00.000000000'); 


insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values 
(3, 'Alpha', 3, 'Final Target', '08-MAR-14 09.43.00.000000000', 
           '09-MAR-14 10.00.00.000000000'); 

-- Same time as alpha, but not related. 
insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values 
(4, 'Beta', 4, 'Only Target', '08-MAR-14 09.43.30.000000000', 
           '09-MAR-14 11.00.30.000000000'); 

The result set needed would be these 2 rows:

3, 'Alpha', 3, '08-MAR-14 09.43.00.000000000', '09-MAR-14 10.00.00.000000000' 
4, 'Beta', 4, '08-MAR-14 09.43.30.000000000', '09-MAR-14 11.00.30.000000000' 

A clarifying note on the result set: if I also had this row in the database

5, 'Alpha', 8, '09-MAR-14 09.43.00.000000000', '12-MAR-14 10.00.00.000000000' 

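then that Alpha row would be unique and would also be returned. It is not considered a duplicate because its BEGIN_DATE is different (March 9th rather than the 8th).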


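Here are the rules to follow:

  1. NAME relates the data.
  2. BEGIN_DATE is the second relation: rows with the same time, exact to the minute, are duplicates and need to be removed according to #3.
  3. If there are duplicates per #1 and #2, they are removed based on APPROVAL_DATE: the most recent APPROVAL_DATE wins over earlier dates.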
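
The three rules can be sketched outside the database to make the intended grouping explicit. This is only an illustration in Python, not Oracle SQL: the `dedupe` helper and the tuple layout are hypothetical, and the sample rows mirror the stub data above plus the extra row with ID 5. When two rows tie on APPROVAL_DATE, this sketch simply keeps the first one it sees, which is an assumption the rules do not cover.

```python
from datetime import datetime

# Rows mirror the stub data: (id, name, targetvalue, begin_date, approval_date).
ROWS = [
    (1, "Alpha", 5, datetime(2014, 3, 8, 9, 43, 0), datetime(2014, 3, 9, 9, 43, 0)),
    (2, "Alpha", 2, datetime(2014, 3, 8, 9, 43, 0), datetime(2014, 3, 9, 9, 43, 0)),
    (3, "Alpha", 3, datetime(2014, 3, 8, 9, 43, 0), datetime(2014, 3, 9, 10, 0, 0)),
    (4, "Beta", 4, datetime(2014, 3, 8, 9, 43, 30), datetime(2014, 3, 9, 11, 0, 30)),
    (5, "Alpha", 8, datetime(2014, 3, 9, 9, 43, 0), datetime(2014, 3, 12, 10, 0, 0)),
]


def dedupe(rows):
    """Keep one row per (NAME, BEGIN_DATE to the minute): the latest APPROVAL_DATE."""
    best = {}
    for row in rows:
        _id, name, _target, begin, approval = row
        # Rules 1 and 2: group by name and by begin date truncated to the minute.
        key = (name, begin.replace(second=0, microsecond=0))
        # Rule 3: a later approval date replaces an earlier one.
        if key not in best or approval > best[key][4]:
            best[key] = row
    return sorted(best.values())


kept_ids = [row[0] for row in dedupe(ROWS)]
print(kept_ids)  # [3, 4, 5]
```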

+1

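Unrelated, but: you should not rely on implicit data type conversion. Character literals such as '08-MAR-14 09.43.00.000000000' will not work reliably when the statement is run from a client machine with different NLS settings. Use proper ANSI timestamp literals, or use the to_timestamp() function with month numbers rather than names. – 2014-10-05 11:42:00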

+0

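@a_horse_with_no_name (*hey, we are both answers to 70s trivia questions*) Thanks for the tip. Oracle is not my specialty, but I am learning. I needed a simple example for SO, and the timestamps were just a convenience. But your advice is good. Thanks. – OmegaMan 2014-10-06 14:09:33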

Answers

2

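This should be a simple application of ANALYTIC functions to aggregate the data based on the rules mentioned.

You need the MAX APPROVAL_DATE within each group of NAME, BEGIN_DATE. So, what you need to do is: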

MAX(APPROVAL_DATE) OVER(PARTITION BY NAME, BEGIN_DATE ORDER BY APPROVAL_DATE DESC) max_appr_dt 

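Then, in your outer query, just filter out the DUPLICATES with the PREDICATE WHERE APPROVAL_DATE = max_appr_dt

Note: from a PERFORMANCE point of view, this approach does a TABLE SCAN only once. So it is better than the other approach of joining the table, which scans the table multiple times.

Update: adding a complete test case, as requested in the comments.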

There are two ways to do this with analytic functions:

1. MAX

SQL> SELECT * 
    2 FROM 
    3 (SELECT A.*, 
    4  MAX(APPROVAL_DATE) OVER(PARTITION BY NAME, BEGIN_DATE ORDER BY APPROVAL_DATE DESC) max_appr_dt 
    5 FROM TIMETABLE A 
    6 ) 
    7 WHERE approval_date = max_appr_dt 
    8/

        ID NAME                 TARGETVALUE NOTE                 BEGIN_DATE                     APPROVAL_DATE                  MAX_APPR_DT
---------- -------------------- ----------- -------------------- ------------------------------ ------------------------------ ------------------------------ 
         3 Alpha                          3 Final Target         08-MAR-14 09.43.00.000000 AM   09-MAR-14 10.00.00.000000 AM   09-MAR-14 10.00.00.000000 AM
         4 Beta                           4 Only Target          08-MAR-14 09.43.30.000000 AM   09-MAR-14 11.00.30.000000 AM   09-MAR-14 11.00.30.000000 AM

2. ROW_NUMBER()

SQL> SELECT * 
    2 FROM 
    3 (SELECT a.*, 
    4  row_number() OVER(PARTITION BY NAME, BEGIN_DATE ORDER BY APPROVAL_DATE DESC) AS "RNK" 
    5 FROM TIMETABLE A 
    6 ) 
    7 WHERE rnk =1 
    8/

        ID NAME                 TARGETVALUE NOTE                 BEGIN_DATE                     APPROVAL_DATE                         RNK
---------- -------------------- ----------- -------------------- ------------------------------ ------------------------------ ---------- 
         3 Alpha                          3 Final Target         08-MAR-14 09.43.00.000000 AM   09-MAR-14 10.00.00.000000 AM            1
         4 Beta                           4 Only Target          08-MAR-14 09.43.30.000000 AM   09-MAR-14 11.00.30.000000 AM            1
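
One difference between the two methods is worth keeping in mind: when two rows in a group share the latest APPROVAL_DATE, the MAX filter returns all of them, while ROW_NUMBER() returns exactly one, chosen arbitrarily unless the ORDER BY includes a tie-breaker. The sketch below illustrates this with Python's sqlite3 module standing in for Oracle (an assumption made so the example is self-contained; it needs SQLite 3.25 or later for window functions). The tied row with ID 6 and the `id DESC` tie-breaker are hypothetical additions, not part of the original data, and the ORDER BY inside the MAX window is left out because the partition-wide maximum does not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timetable (id INTEGER, name TEXT, begin_date TEXT, approval_date TEXT)"
)
conn.executemany(
    "INSERT INTO timetable VALUES (?, ?, ?, ?)",
    [
        (1, "Alpha", "2014-03-08 09:43:00", "2014-03-09 09:43:00"),
        (3, "Alpha", "2014-03-08 09:43:00", "2014-03-09 10:00:00"),
        # Hypothetical row that ties with ID 3 on the latest approval date.
        (6, "Alpha", "2014-03-08 09:43:00", "2014-03-09 10:00:00"),
        (4, "Beta", "2014-03-08 09:43:30", "2014-03-09 11:00:30"),
    ],
)

# MAX: every row whose approval date equals the group maximum survives.
max_sql = """
SELECT id FROM (
    SELECT t.*, MAX(approval_date) OVER (PARTITION BY name, begin_date) AS max_appr_dt
    FROM timetable t
)
WHERE approval_date = max_appr_dt
ORDER BY id
"""

# ROW_NUMBER: exactly one row per group; id DESC breaks ties deterministically.
row_number_sql = """
SELECT id FROM (
    SELECT t.*, ROW_NUMBER() OVER (
        PARTITION BY name, begin_date
        ORDER BY approval_date DESC, id DESC
    ) AS rnk
    FROM timetable t
)
WHERE rnk = 1
ORDER BY id
"""

max_ids = [r[0] for r in conn.execute(max_sql)]
row_number_ids = [r[0] for r in conn.execute(row_number_sql)]
print(max_ids)         # [3, 4, 6]
print(row_number_ids)  # [4, 6]
```

If exactly one row per group is required, ROW_NUMBER() with an explicit tie-breaker is the safer of the two.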

Execution plans for the two queries:

SQL> EXPLAIN PLAN FOR 
    2 SELECT * 
    3 FROM 
    4 (SELECT A.*, 
    5  MAX(APPROVAL_DATE) OVER(PARTITION BY NAME, BEGIN_DATE ORDER BY APPROVAL_DATE DESC) max_appr_dt 
    6 FROM TIMETABLE A 
    7 ) 
    8 WHERE approval_date = max_appr_dt 
    9/

Explained. 

SQL> 
SQL> select * from table(dbms_xplan.display) 
    2/

PLAN_TABLE_OUTPUT 
---------------------------------------------------------------------------------------------------- 
Plan hash value: 2691156688 

--------------------------------------------------------------------------------- 
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT    |           |     4 |   356 |     3   (0)| 00:00:01 |
|*  1 |  VIEW               |           |     4 |   356 |     3   (0)| 00:00:01 |
|   2 |   WINDOW SORT       |           |     4 |   304 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TIMETABLE |     4 |   304 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------- 


PLAN_TABLE_OUTPUT 
---------------------------------------------------------------------------------------------------- 
Predicate Information (identified by operation id): 
--------------------------------------------------- 

    1 - filter("APPROVAL_DATE"="MAX_APPR_DT") 

Note 
----- 
    - dynamic statistics used: dynamic sampling (level=2) 

19 rows selected. 

SQL> 
SQL> EXPLAIN PLAN FOR 
    2 SELECT * 
    3 FROM 
    4 (SELECT a.*, 
    5  row_number() OVER(PARTITION BY NAME, BEGIN_DATE ORDER BY APPROVAL_DATE DESC) AS "RNK" 
    6 FROM TIMETABLE A 
    7 ) 
    8 WHERE rnk =1 
    9/

Explained. 

SQL> 
SQL> select * from table(dbms_xplan.display) 
    2/

PLAN_TABLE_OUTPUT 
---------------------------------------------------------------------------------------------------- 
Plan hash value: 3768566268 

-------------------------------------------------------------------------------------- 
| Id  | Operation                | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT         |           |     4 |   356 |     3   (0)| 00:00:01 |
|*  1 |  VIEW                    |           |     4 |   356 |     3   (0)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|           |     4 |   304 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | TIMETABLE |     4 |   304 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------- 


PLAN_TABLE_OUTPUT 
---------------------------------------------------------------------------------------------------- 
Predicate Information (identified by operation id): 
--------------------------------------------------- 

    1 - filter("RNK"=1) 
    2 - filter(ROW_NUMBER() OVER (PARTITION BY "NAME","BEGIN_DATE" ORDER BY 
       INTERNAL_FUNCTION("APPROVAL_DATE") DESC)<=1) 

Note 
----- 
    - dynamic statistics used: dynamic sampling (level=2) 

21 rows selected. 

+0

If you don't mind, could you add the complete query and explain how this scans the table only once? Thanks. – 2014-10-05 16:31:28

+0

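Sure. But for that I need to create two test cases and show the execution plans. I wish the OP had provided the create and insert statements. If time permits, I will try to update with more information about performance. – 2014-10-05 16:36:43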

+0

The question has the create and insert statements, if that is what you want. Thanks. – 2014-10-05 16:38:54

0

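I understand that you are using an Oracle DB. However, I tested this on SQL Server. The SQL should work on all databases, so try my query anyway. I am not sure whether this is the most efficient approach. Let me know if this helps.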

-- Table aliases are written without AS: SQL Server accepts both forms,
-- but Oracle does not allow AS before a table alias.
select t.ID, t.name, t.targetvalue, t.begin_date, t.approval_date 
from 
(
select name, begin_date, max(approval_date) as approval_date 
from timetable 
group by name, begin_date 
) mx 
inner join timetable t 
on mx.name = t.name and 
mx.begin_date = t.begin_date and 
mx.approval_date = t.approval_date 
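
To check that the GROUP BY plus self-join form returns the rows the question asks for, here is a minimal sketch that runs the same query shape against the stub data. It uses Python's sqlite3 module as a stand-in database (an assumption for the sake of a self-contained example, not the Oracle or SQL Server setup discussed here), with timestamps stored as ISO-formatted text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timetable ("
    "id INTEGER, name TEXT, targetvalue INTEGER, begin_date TEXT, approval_date TEXT)"
)
conn.executemany(
    "INSERT INTO timetable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Alpha", 5, "2014-03-08 09:43:00", "2014-03-09 09:43:00"),
        (2, "Alpha", 2, "2014-03-08 09:43:00", "2014-03-09 09:43:00"),
        (3, "Alpha", 3, "2014-03-08 09:43:00", "2014-03-09 10:00:00"),
        (4, "Beta", 4, "2014-03-08 09:43:30", "2014-03-09 11:00:30"),
    ],
)

# Inner query finds the latest approval date per group; the join fetches the full row.
join_sql = """
SELECT t.id, t.name, t.targetvalue
FROM (
    SELECT name, begin_date, MAX(approval_date) AS approval_date
    FROM timetable
    GROUP BY name, begin_date
) mx
INNER JOIN timetable t
    ON  mx.name = t.name
    AND mx.begin_date = t.begin_date
    AND mx.approval_date = t.approval_date
ORDER BY t.id
"""

result = conn.execute(join_sql).fetchall()
print(result)  # [(3, 'Alpha', 3), (4, 'Beta', 4)]
```

Note that the timetable table is referenced twice in this query, once in the grouped subquery and once in the join.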

Additional statements, in case you want to create the table from the question in SQL Server:

CREATE TABLE TimeTable (
    ID int NOT NULL, 
    NAME VARCHAR(20) NOT NULL,  
    TARGETVALUE INT,     
    NOTE VARCHAR(20) NULL,   
    BEGIN_DATE datetime NOT NULL, 
    APPROVAL_DATE datetime NOT NULL 

); 

insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values 
(1, 'Alpha', 5, 'Duplicate First', '08-03-14 09:43:00', 
            '09-03-14 09:43:00'); 

insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values 
(2, 'Alpha', 2, 'Duplicate Middle', '08-03-14 09:43:00', 
            '09-03-14 09:43:00'); 


insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values 
(3, 'Alpha', 3, 'Final Target', '08-03-14 09:43:00', 
           '09-03-14 10:00:00'); 

-- Same time as alpha, but not related: 
insert into TimeTable (ID, NAME, TARGETVALUE, NOTE, BEGIN_DATE, APPROVAL_DATE) values 
(4, 'Beta', 4, 'Only Target', '08-03-14 09:43:30', 
           '09-03-14 11:00:30'); 

+0

This is overkill. It will perform the table scan twice; with ANALYTIC functions you can avoid that. See my answer. – 2014-10-05 08:53:08

+0

@LalitKumarB - Thanks. I have seen these functions in SQL Server, but not in Oracle. Good to know. I will leave my answer up to show how NOT to write a query like this. – 2014-10-05 16:25:54

+0

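I will try to include a performance test in my answer. But that is not exactly relevant to what the OP currently needs; it would be an addition to the answer. I appreciate your SQL Server answer. Good. – 2014-10-05 16:39:32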