Fields defined by whitespace, quotes, or parentheses in gawk

I have a text file in the following format:

RANDOM-WORD1 ==> "string with whitespaces" (string with whitespaces) 
RANDOM-WORD2 ==> "another string" (and another) 
RANDOM-WORD3 ==> "yet another string" (and another)

I want gawk to treat the following as field delimiters:

whitespace
quotes
parentheses

For example, for line 1:

$1: RANDOM-WORD1 
$2: ==> 
$3: "string with whitespaces" 
$4: (string with whitespaces)

I have read the FPAT section of the gawk manual and wrote this:

FPAT = "([^[:blank:]]*)|(\"[^\"]+\")|(\([^)]+\))"

However, it does not work for the parentheses, because I get:

$1: RANDOM-WORD1 
$2: ==> 
$3: "string with whitespaces" 
$4: (string

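I tried escaping the parentheses in the third clause, but that did not work either. I want to ignore any character that is not inside (...). I know for a fact that there will be no nested parentheses.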

Note: how can I keep the quotes/parentheses out of the field data? For example:

$1: RANDOM-WORD1 
$2: ==> 
$3: string with whitespaces 
$4: string with whitespaces


2016-04-23 Adama

As for the parentheses, you need to escape them twice:

FPAT = "([^[:blank:]]*)|(\"[^\"]+\")|(\\([^\\)]+\\))"

To get rid of the parentheses and quotes, use substr:

$3 = substr($3, 2, length($3) - 2); 
$4 = substr($4, 2, length($4) - 2);


2016-04-23 20:20:22 Guido

Thanks, this works, and I also found the reason behind it: https://stackoverflow.com/questions/11383643/groovy-why-do-i-need-to-double-escape-square-brackets ; is this a similar situation? – Adama

@Adama see http://stackoverflow.com/a/36806066/1745001 for why you need to escape them twice. btw you can replace '[^[:blank:]]' with '\\S', since you are using gawk anyway. –

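@Adama As I understand it, awk interprets the string before it invokes the regex engine. According to the specification (http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html#tag_04_06_13_04), '\(' is not one of the valid escape sequences listed in the table there, which is also why gawk prints the warning "warning: escape sequence '\)' treated as plain ')'". The '\(' never makes it to the regex engine. To get it there, you need to escape the backslash as \\, so that it survives string processing and the regex engine sees '\('. – Guido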

This FPAT = "([^ ]+)|([(][^)]+[)])|(\"[^\"]+\")" works for me. It uses the trick that ( and ) do not need to be escaped inside a bracket expression [ ].

As for your second question about how to strip the quotes or parentheses, I have no better idea than adding an action like this:

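{
    for (i = 1; i <= NF; i++) {
        b = substr($i, 1, 1)            # first character of the field
        e = substr($i, length($i), 1)   # last character of the field
        # Strip the delimiters only when they form a matching pair: "..." or (...)
        if ((b == "\"" && e == "\"") || (b == "(" && e == ")")) {
            $i = substr($i, 2, length($i) - 2)
        }
    }
}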


2016-04-23 20:20:19

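Thanks for your feedback. :) Double escaping is more consistent with the other two clauses, so I will stick with it, but I will definitely keep this trick in mind. – Adama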

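I would not use FPAT for this, since your fields have an order, not just a pattern. I would use the third argument to match(), since it is simpler and more robust: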

match($0,/(\S+)\s(\S+)\s"([^"]+)"\s\(([^)]+).*/,a)

For example:

$ awk 'match($0,/(\S+)\s(\S+)\s"([^"]+)"\s\(([^)]+).*/,a) { print; for (i=1; i in a; i++) printf "a[%d]: %s\n", i, a[i] }' file 
RANDOM-WORD1 ==> "string with whitespaces" (string with whitespaces) 
a[1]: RANDOM-WORD1 
a[2]: ==> 
a[3]: string with whitespaces 
a[4]: string with whitespaces 
RANDOM-WORD2 ==> "another string" (and another) 
a[1]: RANDOM-WORD2 
a[2]: ==> 
a[3]: another string 
a[4]: and another 
RANDOM-WORD3 ==> "yet another string" (and another) 
a[1]: RANDOM-WORD3 
a[2]: ==> 
a[3]: yet another string 
a[4]: and another


2016-04-23 21:37:27
