2016-09-07
0

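Splitting a CSV field into separate rows in SQL

While working on a COBOL program, a colleague of mine ran into this problem and eventually solved it at the application level. I am still curious whether it can be solved at the data-access level, in SQL. This is somewhat related to this other question, but I would like to use only ANSI SQL.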

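I am looking for a single SQL SELECT query that operates on a VARCHAR field containing variable-length CSV rows. The query should put each CSV field into its own row of the result set.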

Here is an example of the schema and data (and here is the fiddle):

CREATE TABLE table1 (`field` varchar(100)); 

INSERT INTO table1 (`field`) 
     VALUES 
      ('Hello,world,!') , 
      ('Haloa,!')   , 
      ('Have,a,nice,day,!'); 

Here is the output I would like to get from the query:

Hello 
world 
! 
Haloa 
! 
Have 
a 
nice 
day 
! 

The delimiter used in the CSV is a comma, and for now I am not worried about escaping.
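For reference, the intended transformation can be stated as a minimal Python sketch. This is only an illustration of the expected result under the stated rules (comma delimiter, no escaping); the function name `split_fields` is hypothetical, and it is not the COBOL solution mentioned above, which is not shown in this question.

```python
def split_fields(rows):
    """Return every comma-separated field of every row, in row order."""
    # No escaping or quoting is handled, matching the assumption in the question.
    return [word for row in rows for word in row.split(",")]


rows = ["Hello,world,!", "Haloa,!", "Have,a,nice,day,!"]
for word in split_fields(rows):
    print(word)
```

Any SQL answer should return the same words in the same order, one per result row.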

+0

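It depends on your DBMS. Some provide a split function (much like the one in many programming languages) that you would call for each table record (field) with ',' as the delimiter. If your DBMS does not have one, you can write a simple function that returns an array/cursor/result set. – FDavidov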

+0

Tag the DBMS you are using. – jarlh

+1

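Not storing comma-separated values in a SQL table in the first place prevents many problems. You appear to control the database: design it properly instead of wasting time on workarounds for a completely avoidable problem. – Tomalak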

Answers

2

As far as I know, this is ANSI SQL:

with recursive word_list (field, word, rest, field_id, level) as (    
    select field, 
     substring(field from 1 for position(',' in field) - 1) as word, 
     substring(field from position(',' in field) + 1) as rest, 
     row_number() over() as field_id, 
     1 
    from table1 
    union all 
    select c.field, 
     case 
      when position(',' in p.rest) = 0 then p.rest 
      else substring(p.rest from 1 for position(',' in p.rest) - 1) 
     end as word, 
     case 
      when position(',' in p.rest) = 0 then null 
      else substring(p.rest from position(',' in p.rest) + 1) 
     end as rest, 
     p.field_id, 
     p.level + 1 
    from table1 as c 
    join word_list p on c.field = p.field and position(',' in p.rest) >= 0 
) 
select word 
from word_list 
order by field_id, level; 

This assumes that the values in field are unique.

Here is a running example: http://rextester.com/NARS7464
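The uniqueness assumption comes from joining the recursive step back to table1 on field. As a hedged variant, the sketch below carries a row id through the recursion and reads only from word_list, so duplicate values should not matter. It is run through Python's sqlite3 module purely to make it executable, so it uses SQLite's `instr`/`substr` and `rowid` rather than the standard `position`/`substring` and `row_number()`. The function name `split_csv_rows` is hypothetical, and the anchor member also handles a value with no comma at all, which is an addition not present in the answer's query.

```python
import sqlite3

# SQLite dialect: instr/substr stand in for the standard position/substring,
# and rowid stands in for row_number() over ().
QUERY = """
WITH RECURSIVE word_list (field_id, word, rest, level) AS (
    -- Anchor: first word of every row, plus whatever follows the first comma.
    SELECT rowid,
           CASE WHEN instr(field, ',') = 0 THEN field
                ELSE substr(field, 1, instr(field, ',') - 1) END,
           CASE WHEN instr(field, ',') = 0 THEN NULL
                ELSE substr(field, instr(field, ',') + 1) END,
           1
    FROM table1
    UNION ALL
    -- Recursive step: peel the next word off "rest" until nothing is left.
    SELECT field_id,
           CASE WHEN instr(rest, ',') = 0 THEN rest
                ELSE substr(rest, 1, instr(rest, ',') - 1) END,
           CASE WHEN instr(rest, ',') = 0 THEN NULL
                ELSE substr(rest, instr(rest, ',') + 1) END,
           level + 1
    FROM word_list
    WHERE rest IS NOT NULL
)
SELECT word FROM word_list ORDER BY field_id, level
"""


def split_csv_rows(rows):
    """Load rows into an in-memory table1 and return the split words in order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (field VARCHAR(100))")
    conn.executemany("INSERT INTO table1 (field) VALUES (?)", [(r,) for r in rows])
    words = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return words


if __name__ == "__main__":
    for word in split_csv_rows(["Hello,world,!", "Haloa,!", "Have,a,nice,day,!"]):
        print(word)
```

Stopping on `rest IS NOT NULL` makes the termination condition explicit instead of relying on a comparison with NULL evaluating to unknown.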

+0

This is really amazing. Here is an overview of the databases it can work in: https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL#Common_table_expression –

0

In Oracle you can use something like this (perhaps not the most elegant, but it gives the result you want). Simply replace tab with your_table_name:

WITH 
tab2 AS (
SELECT t.field, 
     CASE WHEN INSTR(t.field, ',', 1, 1) > 0 AND regexp_count(t.field,',') >= 1 THEN INSTR(t.field, ',', 1, 1) ELSE NULL END AS pos1, 
     CASE WHEN INSTR(t.field, ',', 1, 2) > 0 AND regexp_count(t.field,',') >= 2 THEN INSTR(t.field, ',', 1, 2) ELSE NULL END AS pos2, 
     CASE WHEN INSTR(t.field, ',', 1, 3) > 0 AND regexp_count(t.field,',') >= 3 THEN INSTR(t.field, ',', 1, 3) ELSE NULL END AS pos3, 
     CASE WHEN INSTR(t.field, ',', 1, 4) > 0 AND regexp_count(t.field,',') >= 4 THEN INSTR(t.field, ',', 1, 4) ELSE NULL END AS pos4, 
     CASE WHEN INSTR(t.field, ',', 1, 5) > 0 AND regexp_count(t.field,',') >= 5 THEN INSTR(t.field, ',', 1, 5) ELSE NULL END AS pos5, 
     CASE WHEN INSTR(t.field, ',', 1, 6) > 0 AND regexp_count(t.field,',') >= 6 THEN INSTR(t.field, ',', 1, 6) ELSE NULL END AS pos6 
FROM tab t 
), 
tab3 AS (
SELECT SUBSTR(tt.field,1,tt.pos1-1) AS col1, 
     SUBSTR(tt.field,tt.pos1+1, CASE WHEN tt.pos2 IS NULL THEN LENGTH(tt.field) - tt.pos1 ELSE tt.pos2 - tt.pos1 - 1 END) AS col2, 
     SUBSTR(tt.field,tt.pos2+1, CASE WHEN tt.pos3 IS NULL THEN LENGTH(tt.field) - tt.pos2 ELSE tt.pos3 - tt.pos2 - 1 END) AS col3, 
     SUBSTR(tt.field,tt.pos3+1, CASE WHEN tt.pos4 IS NULL THEN LENGTH(tt.field) - tt.pos3 ELSE tt.pos4 - tt.pos3 - 1 END) AS col4, 
     SUBSTR(tt.field,tt.pos4+1, CASE WHEN tt.pos5 IS NULL THEN LENGTH(tt.field) - tt.pos4 ELSE tt.pos5 - tt.pos4 - 1 END) AS col5, 
     SUBSTR(tt.field,tt.pos5+1, CASE WHEN tt.pos6 IS NULL THEN LENGTH(tt.field) - tt.pos5 ELSE tt.pos6 - tt.pos5 - 1 END) AS col6 
     ,ROWNUM AS r 
FROM tab2 tt 
), 
tab4 AS (
SELECT ttt.col1 AS col FROM tab3 ttt WHERE r = 1 
UNION ALL SELECT ttt.col2 FROM tab3 ttt WHERE r = 1 
UNION ALL SELECT ttt.col3 FROM tab3 ttt WHERE r = 1 
UNION ALL SELECT ttt.col4 FROM tab3 ttt WHERE r = 1 
UNION ALL SELECT ttt.col5 FROM tab3 ttt WHERE r = 1 
UNION ALL SELECT ttt.col6 FROM tab3 ttt WHERE r = 1 
UNION ALL 
SELECT ttt.col1 FROM tab3 ttt WHERE r = 2 
UNION ALL SELECT ttt.col2 FROM tab3 ttt WHERE r = 2 
UNION ALL SELECT ttt.col3 FROM tab3 ttt WHERE r = 2 
UNION ALL SELECT ttt.col4 FROM tab3 ttt WHERE r = 2 
UNION ALL SELECT ttt.col5 FROM tab3 ttt WHERE r = 2 
UNION ALL SELECT ttt.col6 FROM tab3 ttt WHERE r = 2 
UNION ALL 
SELECT ttt.col1 FROM tab3 ttt WHERE r = 3 
UNION ALL SELECT ttt.col2 FROM tab3 ttt WHERE r = 3 
UNION ALL SELECT ttt.col3 FROM tab3 ttt WHERE r = 3 
UNION ALL SELECT ttt.col4 FROM tab3 ttt WHERE r = 3 
UNION ALL SELECT ttt.col5 FROM tab3 ttt WHERE r = 3 
UNION ALL SELECT ttt.col6 FROM tab3 ttt WHERE r = 3 
UNION ALL 
SELECT ttt.col1 FROM tab3 ttt WHERE r = 4 
UNION ALL SELECT ttt.col2 FROM tab3 ttt WHERE r = 4 
UNION ALL SELECT ttt.col3 FROM tab3 ttt WHERE r = 4 
UNION ALL SELECT ttt.col4 FROM tab3 ttt WHERE r = 4 
UNION ALL SELECT ttt.col5 FROM tab3 ttt WHERE r = 4 
UNION ALL SELECT ttt.col6 FROM tab3 ttt WHERE r = 4 
UNION ALL 
SELECT ttt.col1 FROM tab3 ttt WHERE r = 5 
UNION ALL SELECT ttt.col2 FROM tab3 ttt WHERE r = 5 
UNION ALL SELECT ttt.col3 FROM tab3 ttt WHERE r = 5 
UNION ALL SELECT ttt.col4 FROM tab3 ttt WHERE r = 5 
UNION ALL SELECT ttt.col5 FROM tab3 ttt WHERE r = 5 
UNION ALL SELECT ttt.col6 FROM tab3 ttt WHERE r = 5 
) 
SELECT col 
FROM tab4 
WHERE col IS NOT NULL 

It gives me this result:

1 Hello 
2 world 
3 ! 
4 Haloa 
5 ! 
6 Have 
7 a 
8 nice 
9 day 
10 ! 
0

FWIW, here is another Oracle-specific approach. Maybe it will at least give someone an idea or help a future searcher.

SQL> with tbl(rownbr, col1) as (
      select 1, 'Hello,world,!'  from dual union 
      select 2, 'Haloa,!'   from dual union 
      select 3, 'Have,a,nice,day,!' from dual 
    ) 
    SELECT rownbr, column_value substring_nbr, 
     regexp_substr(col1, '(.*?)(,|$)', 1, column_value, null, 1) 
    FROM tbl, 
       TABLE(
        CAST(
        MULTISET(SELECT LEVEL 
           FROM dual 
           CONNECT BY LEVEL <= REGEXP_COUNT(col1, ',')+1 
          ) AS sys.OdciNumberList 
       ) 
       ) 
     order by rownbr, substring_nbr; 

    ROWNBR SUBSTRING_NBR REGEXP_SUBSTR(COL 
---------- ------------- ----------------- 
     1    1 Hello 
     1    2 world 
     1    3 ! 
     2    1 Haloa 
     2    2 ! 
     3    1 Have 
     3    2 a 
     3    3 nice 
     3    4 day 
     3    5 ! 

10 rows selected. 

SQL>