Writing partially tab-separated data to a MySQL database

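I have a MySQL database table with 7 columns (chr, pos, num, iA, iB, iC, iD) and a file of 40,000,000 rows, each holding one dataset. Each row has 4 tab-separated columns: the first three always contain data, and the fourth can contain up to four different key=value pairs separated by semicolons: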

chr pos num info 
1  10203 3  iA=0.34;iB=nerv;iC=45;iD=dskf12586 
1  10203 4  iA=0.44;iC=45;iD=dsf12586;iB=nerv 
1  10203 5  
1  10213 1  iB=nerv;iC=49;iA=0.14;iD=dskf12586 
1  10213 2  iA=0.34;iB=nerv;iD=cap1486 
1  10225 1  iD=dscf12586

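The key=value pairs in the info column come in no particular order. I am also not sure whether a key can appear twice (I hope not).

I want to write the data into the database. The first three columns are no problem, but extracting the values from the info column puzzles me, because the key=value pairs are unordered and not every key has to be present in a row. For a similar dataset (with an ordered info column) I used a Java program with regular expressions, which let me (1) check and (2) extract the data, but now I am stuck.

How can I solve this task, preferably with a bash script or directly in MySQL?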

Asked 2013-05-14 by R_User

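What? – HamZa 2013-05-14 08:03:11

Sorry, this can be done in almost any language :p What I would do is the following: loop over each line and split it on '\t+' (one or more tabs). Split the last tab-separated field on ';', then split each piece on '='. Now you have the values from *info*; you just need to write the logic around them, build a query, and execute it. – HamZa 2013-05-14 08:08:30

@R_User, did you get an answer? – svante 2013-09-10 13:11:34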

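You did not mention how you want to write the data, but the sample awk script below shows how to get each individual key and value in every row. Instead of the printf, you can plug in your own logic to write the data.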

$ cat test.sh; echo "###########"; awk -f test.sh log
{
    # Only rows with a non-empty fourth (info) column carry key=value pairs.
    if (length($4)) {
        split($4, array, ";");    # one array element per key=value pair
        print "In " $1, $2, $3;
        # Note: "for (x in array)" visits the elements in an unspecified order.
        for (element in array) {
            eq = index(array[element], "=");
            key = substr(array[element], 1, eq - 1);    # text before the "="
            value = substr(array[element], eq + 1);     # text after the "="
            printf("found %s key and %s value for %d line from %s\n", key, value, NR, array[element]);
        }
    }
}
###########
In 1 10203 3
found iD key and dskf12586 value for 1 line from iD=dskf12586
found iA key and 0.34 value for 1 line from iA=0.34
found iB key and nerv value for 1 line from iB=nerv
found iC key and 45 value for 1 line from iC=45
In 1 10203 4
found iB key and nerv value for 2 line from iB=nerv
found iA key and 0.44 value for 2 line from iA=0.44
found iC key and 45 value for 2 line from iC=45
found iD key and dsf12586 value for 2 line from iD=dsf12586
In 1 10213 1
found iD key and dskf12586 value for 4 line from iD=dskf12586
found iB key and nerv value for 4 line from iB=nerv
found iC key and 49 value for 4 line from iC=49
found iA key and 0.14 value for 4 line from iA=0.14
In 1 10213 2
found iA key and 0.34 value for 5 line from iA=0.34
found iB key and nerv value for 5 line from iB=nerv
found iD key and cap1486 value for 5 line from iD=cap1486
In 1 10225 1
found iD key and dscf12586 value for 6 line from iD=dscf12586
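
As for the worry that a key might appear twice: a quick pass over the file can settle that before anything is loaded. The following is only a minimal sketch, not a tested tool. It assumes the file starts with the header row shown in the question, uses real tab characters between columns, and that iA, iB, iC and iD are the only valid keys. The function name check_info is made up for illustration.

```shell
# check_info: hypothetical helper (not from the thread). Reports rows whose
# info column repeats a key or uses a key other than iA, iB, iC, iD.
# Assumes a header row on line 1 and tab-separated columns.
check_info() {
  awk -F'\t' '
    BEGIN {
      split("iA iB iC iD", names, " ")
      for (i in names) known[names[i]] = 1
    }
    NR > 1 && length($4) {
      split("", seen)                  # portable way to empty an array
      n = split($4, pair, ";")
      for (i = 1; i <= n; i++) {
        if (pair[i] == "") continue    # tolerate a trailing semicolon
        eq = index(pair[i], "=")
        key = substr(pair[i], 1, eq - 1)
        if (!(key in known)) printf "line %d: unknown key \"%s\"\n", NR, key
        if (key in seen)     printf "line %d: duplicate key \"%s\"\n", NR, key
        seen[key] = 1
      }
    }
  ' "$@"
}
```

Called as check_info file, it prints nothing when every row is clean, so an empty result would suggest the data can be loaded without special handling for repeated keys.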

Answered 2013-05-14 08:37:32 by abasu

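Building on the awk solution from @abasu, this version generates INSERT statements and also takes care of the unordered key=value pairs.

parse.awk: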

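NR>1 {    # skip the header row
    # Start every row with all four info columns set to SQL NULL.
    col["iA"]=col["iB"]=col["iC"]=col["iD"]="null";

    if(length($4)) {
        split($4,array,";");    # one element per key=value pair
        for(element in array) {
            split(array[element],keyval,"=");
            gsub(/'/,"''",keyval[2]);    # double any single quote so the SQL stays valid
            # The key picks the column, so the order of the pairs does not matter.
            col[keyval[1]] = "'" keyval[2] "'";
        }
    }
    print "INSERT INTO tbl VALUES (" $1 "," $2 "," $3 "," col["iA"] "," col["iB"] "," col["iC"] "," col["iD"] ");";
}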

Test/run:

$ awk -f parse.awk file 
INSERT INTO tbl VALUES (1,10203,3,'0.34','nerv','45','dskf12586'); 
INSERT INTO tbl VALUES (1,10203,4,'0.44','nerv','45','dsf12586'); 
INSERT INTO tbl VALUES (1,10203,5,null,null,null,null); 
INSERT INTO tbl VALUES (1,10213,1,'0.14','nerv','49','dskf12586'); 
INSERT INTO tbl VALUES (1,10213,2,'0.34','nerv',null,'cap1486'); 
INSERT INTO tbl VALUES (1,10225,1,null,null,null,'dscf12586');
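
With 40 million rows, it may be worth skipping the INSERT statements and writing a flat tab-separated file instead, since MySQL's LOAD DATA is usually considerably faster than row-by-row inserts. The following is a rough sketch of that variant, not something from the thread. It assumes a header row, real tab characters between columns, and values that contain no tabs or backslashes. The names info_to_tsv, data.tsv and dbname are placeholders.

```shell
# info_to_tsv: hypothetical helper (not from the thread). Flattens the info
# column into four tab-separated fields in the fixed order iA, iB, iC, iD.
# A missing key is written as \N, which LOAD DATA reads as NULL when the
# default field escaping is in effect.
info_to_tsv() {
  awk -F'\t' -v OFS='\t' '
    NR > 1 {                                   # skip the header row
      col["iA"] = col["iB"] = col["iC"] = col["iD"] = "\\N"
      n = split($4, pair, ";")
      for (i = 1; i <= n; i++) {
        eq = index(pair[i], "=")
        if (eq > 1) {
          key = substr(pair[i], 1, eq - 1)
          # Keys other than the four known ones are silently ignored.
          if (key in col) col[key] = substr(pair[i], eq + 1)
        }
      }
      print $1, $2, $3, col["iA"], col["iB"], col["iC"], col["iD"]
    }
  ' "$@"
}
```

The result could then be loaded with something like the following, provided local_infile is enabled on the server and the table columns are in the order chr, pos, num, iA, iB, iC, iD:

    info_to_tsv file > data.tsv
    mysql --local-infile=1 dbname -e "LOAD DATA LOCAL INFILE 'data.tsv' INTO TABLE tbl"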

Answered 2013-05-14 10:21:00 by svante
