Incomplete CSV file (missing columns)


I have a CSV file in which some rows are missing entries for certain columns. Where a column is not set, I would like to add an empty value.

Here is the structure of the CSV file:

ID;LON;LAT;IMAGE;HISTORIC;ADDRESS;TEXT;TYPE;NAME;NETWORK;DATE_OF_BIRTH;DATE_OF_DEATH;START_DATE

A complete data row looks like this:

n3329319394;4.369872;50.866430;historic=memorial;image=File:Schaerbeek_40_rue_Vondel_Les_pavés_de_la_mémoire.jpg;memorial:addr=40, Rue Vondel - Vondelstraat, Schaerbeek;memorial:text=ICI HABITAIT ELISABETH ORCHER-KAROLINSKI NÉE 1912 RÉSISTANTE ARRÊTÉE 15.8.1942 INTERNÉE MALINES DÉPORTÉE 18.8.1942 AUSCHWITZ ASSASSINÉE 20.8.1942;memorial:type=stolperstein;name=Elisabeth Orcher-Karolinski;network=Stolpersteine Brussels;person:date_of_birth=1912-00-00;person:date_of_death=1942-08-20

But from time to time a data row looks like this:

n4208925477;5.041860;52.141352;historic=memorial;memorial:addr=Langegracht 27;memorial:type=stolperstein;name=Lucas & Clara IJzerman

Any idea how to convert this data easily? A good hint might be the qualifiers: "image=..." etc.

Thanks, Björn

Source

2017-08-03 BGam

[Rant]

You did not show any effort of your own.
Your description of the problem is imprecise.

[End of rant]

But you raised my curiosity, so I tried to solve it. (Using Stolperstein sample data in particular was clever. Now I feel like I am on a "mission to help good people"...)

I simplified your problem a bit:

I assume the fields id, lon, and lat are mandatory.
I treat name, historic, and image as the optional named fields.

My test data, test-complete-lines.txt:

n3329319394;4.369872;50.866430;name=Klaus Mustermann;historic=memorial;image=j.doe-de.png 
n3329319395;4.369872;50.866430;name=Gabi Mustermann 
n4208925477;5.041860;52.141352;historic=memorial 
n4208925477;5.041860;52.141352;image=the-image.png 
n3329319395;4.369872;50.866430;name=Gabi Mustermann;historic=memorial 
n3329319395;4.369872;50.866430;name=Gabi Mustermann;image=j.doe.female-de.png

My script, test-complete-lines.awk:

BEGIN { FS=";" } 
# get mandatory fields id, lon, lat 
{ id = $1 ; lon = $2 ; lat = $3 } 
# set optional fields empty 
{ name=";name=" ; historic=";historic=" ; image=";image=" } 
# replace found fields with values 
/;name=/ { name = gensub(/^.*(;name=[^;]*).*$/, "\\1", "g", $0) } 
/;historic=/ { historic = gensub(/^.*(;historic=[^;]*).*$/, "\\1", "g", $0) } 
/;image=/ { image = gensub(/^.*(;image=[^;]*).*$/, "\\1", "g", $0) } 
# print processed line 
{ print id";"lon";"lat""name""historic""image }

Tested with gawk (in bash on Cygwin, Windows 10 (64 bit)):

$ awk --version 
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p10, GNU MP 6.1.2) 
Copyright (C) 1989, 1991-2016 Free Software Foundation. 

$ awk -f test-complete-lines.awk <test-complete-lines.txt 
n3329319394;4.369872;50.866430;name=Klaus Mustermann;historic=memorial;image=j.doe-de.png 
n3329319395;4.369872;50.866430;name=Gabi Mustermann;historic=;image= 
n4208925477;5.041860;52.141352;name=;historic=memorial;image= 
n4208925477;5.041860;52.141352;name=;historic=;image=the-image.png 
n3329319395;4.369872;50.866430;name=Gabi Mustermann;historic=memorial;image= 
n3329319395;4.369872;50.866430;name=Gabi Mustermann;historic=;image=j.doe.female-de.png 

$
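The script relies on gensub(), which is a gawk extension and not part of POSIX awk. As a rough sketch of the same idea under the same assumptions (id, lon, lat come first; name, historic, image are optional; no ; inside values), the optional fields can also be collected with a loop over fields 4..NF. The shell function name complete_lines is only a hypothetical wrapper for illustration:

```shell
# Hypothetical helper: completes each line read from stdin without gensub().
complete_lines() {
  awk '
    BEGIN { FS = ";" }
    {
      # reset the optional values for every line
      name = ""; historic = ""; image = ""
      # fields 4..NF are optional key=value pairs in any order
      for (i = 4; i <= NF; i++) {
        eq = index($i, "=")
        if (eq == 0) continue            # not a key=value field, ignore it
        key = substr($i, 1, eq - 1)
        value = substr($i, eq + 1)
        if (key == "name") name = value
        else if (key == "historic") historic = value
        else if (key == "image") image = value
      }
      print $1 ";" $2 ";" $3 ";name=" name ";historic=" historic ";image=" image
    }
  '
}

printf '%s\n' 'n3329319395;4.369872;50.866430;name=Gabi Mustermann' | complete_lines
# prints: n3329319395;4.369872;50.866430;name=Gabi Mustermann;historic=;image=
```

Because each key is compared against a whole field, a key name that merely occurs inside another field's value is not mistaken for a field of its own.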

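Remarks:

Replacing the found fields assumes that no ; appears in the content. I suggest you check your real data for a counter-example (a row where ; occurs inside a value). Such data would probably use some kind of quoting or escaping, so extra handling might be necessary for that case.
I only handled a few of the named fields. You will have to add the rest following the same scheme.
Btw. an empty line accidentally slipped into my sample text. This produced:
;;;name=;historic=;image=
If empty lines have to be handled, another rule can be inserted (after BEGIN { }):
/^[ \t]*$/ { next }
In my first version, I had a typo in the sample data: a forgotten ;. As a result, image= became part of the content of name= but was also recognized as a field of its own. Assuming that named fields never occur as the first field, I fixed this by including the preceding ; in the pattern for the field names.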
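To illustrate how the remaining fields could be added without writing one rule per field, here is a minimal sketch that keeps the optional key names in a single list and skips empty lines. It is not part of the tested script above: the function name fill_columns is hypothetical, the key list is an assumption to be extended with the real key names, and the choice to print bare values (one CSV column per key, empty if the key is missing) instead of key=value pairs is also an assumption. It still assumes that no ; occurs inside a value.

```shell
# Hypothetical helper: one output column per key listed in the split() call.
fill_columns() {
  awk '
    BEGIN {
      FS = ";"; OFS = ";"
      # output column order; extend this list with the remaining key names
      n = split("name historic image", keys, " ")
    }
    /^[ \t]*$/ { next }                  # skip empty lines
    {
      split("", val)                     # forget values of the previous line
      for (i = 4; i <= NF; i++) {
        eq = index($i, "=")
        if (eq > 0) val[substr($i, 1, eq - 1)] = substr($i, eq + 1)
      }
      line = $1 OFS $2 OFS $3
      for (j = 1; j <= n; j++) line = line OFS val[keys[j]]
      print line
    }
  '
}

printf '%s\n' 'n4208925477;5.041860;52.141352;historic=memorial' | fill_columns
# prints: n4208925477;5.041860;52.141352;;memorial;
```

Keys that appear in the input but are not in the list are silently dropped, so the list has to cover every column that should survive.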

Source

2017-08-05 08:19:18 Scheff
