2017-07-08 34 views
-3

The table 'mytable' has a column named 'Description'. Match words across all rows of that column

+----+-------------------------------+
| ID | Description                   |
+----+-------------------------------+
| 1  | My NAME is Sajid KHAN         |
| 2  | My Name is Ahmed Khan         |
| 3  | MY friend name is Salman Khan |
+----+-------------------------------+

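I need to write an Oracle SQL query/procedure/function that lists the distinct words in the column, together with how often each one occurs.

The output should be: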

+------------------+-------+
| Word             | Count |
+------------------+-------+
| MY               |     3 |
| NAME             |     3 |
| IS               |     3 |
| SAJID            |     1 |
| KHAN             |     3 |
| AHMED            |     1 |
| FRIEND           |     1 |
| SALMAN           |     1 |
+------------------+-------+

Word matching should be case-insensitive.

I am using Oracle 12.1.

+0

What have you tried so far? –

Answers

1

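Let's assume we have somehow managed to split every description into its words. So instead of a single row with ID = 1 and Description = 'My NAME is Sajid KHAN', we would have 5 rows like this:

ID | Description 
--- | ------------ 
1 | My 
1 | NAME 
1 | is 
1 | Sajid 
1 | KHAN 

With the data in this form the task would be trivial, something like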

select Description, count(*) from data_in_new_form group by Description 

So we use a recursive query to do the splitting.

create table mytable 
as 
select 1 as ID, 'My NAME is Sajid KHAN' as Description from dual 
union all 
select 2, 'My Name is Ahmed Khan' from dual 
union all 
select 3, 'MY friend name is Salman Khan' from dual 
union all 
select 4, 'test, punctuation! it is' from dual 
; 


with 
rec (id, str, depth, element_value) as 
(
    -- Anchor member: the first space-delimited word of each upper-cased description. 
    select id, upper(Description) as str, 1 as depth, REGEXP_SUBSTR(upper(Description), '(.*?)( |$)', 1, 1, NULL, 1) AS element_value 
    from mytable 
    UNION ALL 
    -- Recursive member: the next word, until depth reaches (number of spaces + 1). 
    select id, str, depth + 1, REGEXP_SUBSTR(str, '(.*?)( |$)', 1, depth+1, NULL, 1) AS element_value 
    from rec 
    where depth < regexp_count(str, ' ')+1 
) 
, data as (
select * from rec 
--order by id, depth 
) 
select element_value, count(*) from data 
group by element_value 
order by element_value 
; 

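Note that this version does nothing about punctuation; it assumes that words are separated by single spaces.

The same approach using a hierarchical query: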

with rec as 
(
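    -- One output row per word: LEVEL selects the n-th space-delimited word. 
    SELECT id, LEVEL AS depth, 
    REGEXP_SUBSTR(upper(description), '(.*?)( |$)', 1, LEVEL, NULL, 1) AS element_value 
    FROM mytable 
    CONNECT BY LEVEL <= regexp_count(description, ' ')+1 
    and prior id = id                -- stay within the same source row 
    and prior SYS_GUID() is not null -- non-deterministic call avoids a false CONNECT BY loop error 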
) 
, data as (
select * from rec 
--order by id, depth 
) 
select element_value, count(*) from data 
group by element_value 
order by 2 desc 
; 
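
Both queries above split on single spaces only, so row 4 of the sample data ('test, punctuation! it is') yields tokens such as 'TEST,' with the punctuation still attached. The following is a minimal sketch of one way to drop punctuation, not part of the original answer: it matches runs of letters with the POSIX class `[[:alpha:]]` instead of splitting on a delimiter. It assumes that a "word" consists of letters only, so digits, apostrophes and hyphens would also act as separators.

```sql
-- Sketch: count words while ignoring punctuation.
-- Assumption: a word is a run of alphabetic characters.
SELECT word, COUNT(*) AS cnt
FROM (
    -- LEVEL selects the n-th run of letters in each description.
    SELECT REGEXP_SUBSTR(UPPER(description), '[[:alpha:]]+', 1, LEVEL) AS word
    FROM mytable
    CONNECT BY LEVEL <= REGEXP_COUNT(description, '[[:alpha:]]+')
           AND PRIOR id = id                 -- stay within the same source row
           AND PRIOR SYS_GUID() IS NOT NULL  -- avoids a false CONNECT BY loop error
)
WHERE word IS NOT NULL   -- skips rows whose description contains no letters
GROUP BY word
ORDER BY cnt DESC;
```

With this pattern, row 4 should contribute TEST, PUNCTUATION, IT and IS. If digits or apostrophes need to count as part of a word, the character class would have to be widened accordingly.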
+0

Thank you very much for the quick response. I tried this query and it works fine. –

+0

I have Oracle 10g, 11g and 12c. This query works only on 12c, not on 10g or 11g. Is there an equivalent query for 10g and 11g? –

+0

That's strange: I just tested the recursive query on an 11g database and it works fine. Try the hierarchical version. –

0

This query will work.

UPDATE: another way. The order of the words may differ from yours, but the most frequent words come first, as in your listing.

SELECT word, 
       COUNT(*) 
FROM 
    -- Pick the n-th space-delimited word out of the single concatenated string. 
    (SELECT TRIM(REGEXP_SUBSTR(Description, '[^ ]+', 1, ROWNUM)) AS Word 
     FROM 
        -- Concatenate all upper-cased descriptions into one string. 
        (SELECT LISTAGG(UPPER(Description), ' ') WITHIN GROUP (ORDER BY ROWNUM) AS Description 
         FROM mytable 
        ) 
     CONNECT BY LEVEL <= REGEXP_COUNT(Description, '[^ ]+') 
    ) 
GROUP BY word 
ORDER BY 2 DESC; 
+0

The 'LISTAGG' can raise an 'ORA-01489: result of string concatenation is too long' exception. –

+0

Thank you, this works for me too. My column is varchar(100), so my strings will not be too long. –

+0

But how many such values are being concatenated? 40 rows like this may already be too long. –
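
One way to sidestep the ORA-01489 concern raised in the comments is to split each row on its own instead of concatenating every description first. The following is a minimal sketch under that idea, not something taken from the answers: it assumes Oracle 12.1 or later, because `CROSS APPLY` is not available in earlier releases, and it keeps the same space-only splitting as the `LISTAGG` query.

```sql
-- Sketch: per-row word splitting without LISTAGG.
-- Assumption: Oracle 12.1+ (CROSS APPLY); words are separated by spaces.
SELECT w.word, COUNT(*) AS cnt
FROM mytable t
CROSS APPLY (
    -- For the current row only, generate one output row per word.
    SELECT REGEXP_SUBSTR(UPPER(t.description), '[^ ]+', 1, LEVEL) AS word
    FROM dual
    CONNECT BY LEVEL <= REGEXP_COUNT(t.description, '[^ ]+')
) w
WHERE w.word IS NOT NULL   -- skips empty or NULL descriptions
GROUP BY w.word
ORDER BY cnt DESC;
```

Because no intermediate string longer than a single description is ever built, the total number of rows in the table should not matter here. On 10g or 11g the hierarchical version with `PRIOR id = id` is the closer fit, since it does not rely on `CROSS APPLY`.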
