2016-03-02

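Delete duplicate rows, but keep all possible values of the second column

Say I have a table with two columns and the following values:

C1 | C2 
------- 
a1 b1 
a1 b2 
a1 b3 
a2 b1 
a2 b2 
a2 b3 
a3 b1 
a3 b2 
a3 b3 

I want to delete all rows with duplicate C1 values, but in such a way that the remaining rows keep all the distinct values of C2. So in this case, the result must be: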

C1 | C2 
------- 
a1 b1 
a2 b2 
a3 b3 

and not something like:

C1 | C2 
------- 
a1 b1 
a2 b1 
a3 b1 
Will it always be like this? Does every C1 have all the values of C2, and vice versa? – Yossi

Does it matter whether you keep the a1-b1 combination, or could it be a2-b1? – Veljko89

@Yossi, no; there could be another row such as a4 | c4, for example, with no other values. – Cantillon

Answers

This is how I would go about it in this case, using T-SQL:

-- Sample data from the question
if object_ID('tempdb..#Temp') is not null drop table #Temp; 

create table #Temp (c1 nvarchar(5), c2 nvarchar(5)); 

insert into #Temp (c1, c2) 
values 
('a1','b1'), 
('a1','b2'), 
('a1','b3'), 
('a2','b1'), 
('a2','b2'), 
('a2','b3'), 
('a3','b1'), 
('a3','b2'), 
('a3','b3'); 

-- Number the rows within each c2 value; which c1 ends up with Num = 1 is arbitrary
if object_ID('tempdb..#Temp2') is not null drop table #Temp2; 
select *, ROW_NUMBER() over (partition by c2 order by c2) [Num] into #Temp2 from #Temp t1; 

-- Keep a single row per c2 value
delete from #Temp2 where Num != 1; 

select * from #Temp2; 

As you don't care about the combinations, you will get each distinct value of c2 once.
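To see what this ROW_NUMBER approach actually returns on the sample data, here is a minimal sketch that runs the same query in SQLite through Python's sqlite3 module. Assumptions: SQLite 3.25 or later (for window functions), a hypothetical table named `t`, and `ORDER BY c1` inside the window instead of `ORDER BY c2`, only so that the output is reproducible. It keeps every c2 value exactly once, but nothing forces the c1 values to be distinct.

```python
import sqlite3

# Question's sample data: every combination of a1..a3 with b1..b3.
ROWS = [(f"a{i}", f"b{j}") for i in (1, 2, 3) for j in (1, 2, 3)]


def one_row_per_c2(rows):
    """Keep the first row (by c1) in each c2 partition, like the T-SQL answer."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (c1 TEXT, c2 TEXT)")  # hypothetical table name
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT c1, c2 FROM (
            SELECT c1, c2,
                   ROW_NUMBER() OVER (PARTITION BY c2 ORDER BY c1) AS num
            FROM t
        ) WHERE num = 1
        ORDER BY c2
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(one_row_per_c2(ROWS))
# [('a1', 'b1'), ('a1', 'b2'), ('a1', 'b3')]
```

With a deterministic tie-break, every c2 partition picks a1, which shows that this query alone does not remove the duplicate C1 values.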

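I don't think there is a fully reliable way to do what you want in SQL. I suspect the actual problem may be equivalent to an NP or NP-complete graph problem.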
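As an aside on the graph formulation (a swapped-in technique, not part of this answer): if the requirement is read as "exactly one row per distinct C1 value, and every distinct C2 value must still appear", then a valid selection exists exactly when the bipartite graph of C1 and C2 values has a matching that covers every C2 value. Augmenting-path search (Kuhn's algorithm) finds such a matching in polynomial time; the remaining C1 values can then take any C2 they appear with. A minimal sketch in Python under that assumed reading, with a hypothetical function name `pick_rows`:

```python
from collections import defaultdict


def pick_rows(rows):
    """Keep one (c1, c2) row per distinct c1 so that every c2 value survives.

    Returns None when no such selection exists.
    """
    by_c2 = defaultdict(list)  # c2 -> c1 values it is paired with
    by_c1 = defaultdict(list)  # c1 -> c2 values it is paired with
    for c1, c2 in rows:
        by_c2[c2].append(c1)
        by_c1[c1].append(c2)

    owner = {}  # c1 -> the c2 value it is matched to

    def augment(c2, seen):
        # Try to give c2 its own c1, re-routing earlier matches if needed.
        for c1 in by_c2[c2]:
            if c1 in seen:
                continue
            seen.add(c1)
            if c1 not in owner or augment(owner[c1], seen):
                owner[c1] = c2
                return True
        return False

    for c2 in by_c2:
        if not augment(c2, set()):
            return None  # this c2 value cannot get a c1 of its own

    # Unmatched c1 values keep any row they appear in (here: their first one).
    return sorted((c1, owner.get(c1, c2s[0])) for c1, c2s in by_c1.items())


sample = [(f"a{i}", f"b{j}") for i in (1, 2, 3) for j in (1, 2, 3)]
print(pick_rows(sample))
print(pick_rows([("a1", "b1"), ("a1", "b2")]))  # None: one c1 cannot keep two c2 values
```

The sketch only checks the two conditions above; it makes no attempt to reproduce a specific expected pairing such as a1-b1, a2-b2, a3-b3.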

One approximation is to choose a random row for each c1 value:

select t.* 
from (select t.*, 
      row_number() over (partition by c1 order by dbms_random.random) as seqnum 
     from t 
    ) t 
where seqnum = 1; 

This, of course, comes with no guarantee. But it at least opens up the possibility of getting the rows you want.
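Since a single random pick may miss some c2 values, one way to use it is to re-run the query until every c2 value is present. A minimal sketch in Python with SQLite, assuming a hypothetical table `t` and using SQLite's `random()` in place of Oracle's `dbms_random.random`; the retry loop and its `max_tries` limit are illustrative additions, not part of the answer:

```python
import sqlite3

# Hypothetical table "t" holding the question's sample data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (c1 TEXT, c2 TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(f"a{i}", f"b{j}") for i in (1, 2, 3) for j in (1, 2, 3)],
)

# One random row per c1 value.
QUERY = """
    SELECT c1, c2 FROM (
        SELECT c1, c2,
               ROW_NUMBER() OVER (PARTITION BY c1 ORDER BY random()) AS seqnum
        FROM t
    ) WHERE seqnum = 1
    ORDER BY c1
"""


def random_pick(con, max_tries=200):
    """Re-run the random query until every c2 value survives; None if it never does."""
    all_c2 = {row[0] for row in con.execute("SELECT DISTINCT c2 FROM t")}
    for _ in range(max_tries):
        rows = con.execute(QUERY).fetchall()
        if {c2 for _, c2 in rows} == all_c2:
            return rows
    return None


print(random_pick(con))
```

On the 3 x 3 sample, only 6 of the 27 equally likely picks cover all of b1, b2 and b3, so each attempt succeeds with probability 6/27; on other data the loop can fail even when a valid selection exists.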

A second approach works if you have all combinations (as in your example). In that case, you can construct the rows from the values:

select t1.c1, t2.c2 
from (select least(count(distinct c1), count(distinct c2)) as cd from t) x cross join 
    (select c1, rownum as rn from (select distinct c1 from t)) t1 join 
    (select c2, rownum as rn from (select distinct c2 from t)) t2 
    on mod(t1.rn, x.cd) = mod(t2.rn, x.cd); 

However, this assumes that the resulting pairs actually exist as rows in the table.
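The pairing idea can be tried out in SQLite from Python. This is a minimal sketch, not the Oracle query itself: it assumes a hypothetical table `t`, uses `ROW_NUMBER()` over the distinct values in place of `rownum`, SQLite's two-argument `min()` in place of `least()`, and `%` in place of `mod()`. It also checks the assumption above by returning None when a generated pair is not a real row.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database with a hypothetical table t(c1, c2)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (c1 TEXT, c2 TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return con


def pair_distinct_values(con):
    """Pair the n-th distinct c1 with the n-th distinct c2, cycling the shorter list.

    Returns None if any generated pair is not actually a row of t.
    """
    query = """
        WITH d1 AS (SELECT c1, ROW_NUMBER() OVER (ORDER BY c1) AS rn
                    FROM (SELECT DISTINCT c1 FROM t)),
             d2 AS (SELECT c2, ROW_NUMBER() OVER (ORDER BY c2) AS rn
                    FROM (SELECT DISTINCT c2 FROM t)),
             cd AS (SELECT min((SELECT count(*) FROM d1),
                               (SELECT count(*) FROM d2)) AS n)
        SELECT d1.c1, d2.c2
        FROM cd CROSS JOIN d1 JOIN d2 ON d1.rn % cd.n = d2.rn % cd.n
        ORDER BY d1.c1, d2.c2
    """
    pairs = con.execute(query).fetchall()
    existing = set(con.execute("SELECT c1, c2 FROM t").fetchall())
    return pairs if all(p in existing for p in pairs) else None


con = make_db([(f"a{i}", f"b{j}") for i in (1, 2, 3) for j in (1, 2, 3)])
print(pair_distinct_values(con))
# [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]
```

If C1 has more distinct values than C2, the modulo makes the extra C1 values wrap around to the first C2 values, so each C1 still gets exactly one row.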

This answer is ridiculous, but I believe it does the trick! It could be slow for large data sets...

with selector as 
(select rownum-1 as setnum 
    from dual 
    connect by level <= power(2,(select count(*) from my_table)) 
), /* This generates the integers 0..(2^n)-1 where n is number of rows in table */ 
data as 
(select c1, c2, row_number() over (order by c1, c2) as rn 
    from my_table 
), /* This assigns each row in the table a row number 1..n */ 
cj as 
(select setnum, c1, c2 
    from selector cross join data 
    where bitand(setnum, power(2,rn-1)) = power(2,rn-1) 
), /* This generates all the possible sets of 1-n rows. 
     The rows in the set are determined by the bits of the setnum value 
     e.g. setnum 5 (101 in binary) contains rows 1 and 3 */ 
set_sizes as 
(select setnum, count(*) cnt from cj 
    group by setnum 
    having count(distinct c1) = (select count(distinct c1) from my_table) 
    and count(distinct c2) = (select count(distinct c2) from my_table) 
), /* This determines the number of rows in each set AND excludes sets that 
     don't include all the c1 and c2 values */ 
one_set as 
(select min(setnum) minsetnum from set_sizes 
    where cnt = (select min(cnt) from set_sizes) 
) /* This selects one of the sets that has the smallest number of rows */ 
select c1, c2 from cj 
where setnum = (select minsetnum from one_set) 
order by 1 

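What it does is:

  1. Generate all possible sets of rows from the table
  2. Filter out the sets that don't contain all C1 values and all C2 values
  3. Find the smallest of the remaining sets
  4. Arbitrarily pick one of these smallest sets and return its data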
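The same search can be sketched in Python, trying subsets in order of increasing size so that it stops at the first size that covers everything instead of materializing all 2^n sets. This is an illustration of the idea, not a translation of the query: the function name is hypothetical, and the tie-break (first subset in sorted order) differs from the query's `min(setnum)`, so it may return a different set among the smallest ones. The worst case is still exponential.

```python
from itertools import combinations


def smallest_covering_set(rows):
    """Return the first smallest subset of rows that covers every c1 and every c2 value."""
    all_c1 = {c1 for c1, _ in rows}
    all_c2 = {c2 for _, c2 in rows}
    rows = sorted(rows)  # fixed order, similar to numbering the rows by c1, c2
    for size in range(1, len(rows) + 1):
        for subset in combinations(rows, size):
            if ({c1 for c1, _ in subset} == all_c1
                    and {c2 for _, c2 in subset} == all_c2):
                return list(subset)
    return []


sample = [(f"a{i}", f"b{j}") for i in (1, 2, 3) for j in (1, 2, 3)]
print(smallest_covering_set(sample))
# [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]
```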

If anyone can suggest better (more meaningful) names for my WITH-clause subqueries, please do!