Multiple multiline regex matches in Bash

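I'd like to do some fairly simple string parsing in a bash script. Basically, I have a file made up of multiple multi-line fields. Each field is surrounded by a known header and footer.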

I'd like to extract each field separately into an array or similar, something like this:

FILE=`cat file`
REGEX="@#@#@#[\s\S]+?@#@#@"

if [[ $FILE =~ $REGEX ]]; then
    echo $BASH_REMATCH
fi

file:

@#@#@################################# 
this is field one 
@#@#@# 
@#@#@################################# 
this is field two 
they can be any number of lines 
@#@#@#

Now, I'm pretty sure the problem is that bash doesn't match newlines with ".".
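That suspicion is worth checking. As far as I can tell, `[[ =~ ]]` uses POSIX extended regular expressions, where "." does match a newline. The more likely culprits are `[\s\S]` (inside a bracket expression the backslash is literal) and the lazy `+?`, which ERE does not offer. A minimal sketch, assuming a made-up two-field string where `H` stands for the header and `F` for the footer:

```bash
#!/usr/bin/env bash
# Hypothetical sample: H = header, F = footer, two fields.
text=$'H\none\nF\nH\ntwo\nF'

# "." matches the newlines here, and ".*" is greedy, so the capture
# runs from the first header all the way to the last footer.
if [[ $text =~ H(.*)F ]]; then
    captured=${BASH_REMATCH[1]}
    printf '%q\n' "$captured"   # print with escapes so newlines are visible
fi
```

The match succeeds, but it swallows both fields at once, which is why a single regex is awkward for pulling out one field at a time.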

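I can match it with "pcregrep -M", but then of course the whole file is going to match. Can I get one match at a time from pcregrep?

I'm not opposed to using some inline perl or similar.

Thanks in advance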

Source

2010-01-22 prestomation

If you have gawk:

awk 'BEGIN{ RS="@#*#" } 
NF{ 
    gsub("\n"," ") # remove this if you want to retain newlines
    print "-->"$0 
    # put to array 
    arr[++d]=$0 
} ' file

Output

$ ./shell.sh 
--> this is field one 
--> this is field two they can be any number of lines
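
The `arr` array above lives only inside awk. One way to get the fields into a bash array is to print one field per line and read the lines with `mapfile`. This is a sketch, not part of the original answer: it assumes bash 4 or later, an awk that accepts a regex `RS` (gawk or mawk), and a temporary sample file created just for the demo. The `fields` and `datafile` names are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample input, written to a temporary file for the demo.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
@#@#@#################################
this is field one
@#@#@#
@#@#@#################################
this is field two
they can be any number of lines
@#@#@#
EOF

# RS is a regex, as in the answer above. Assigning $1 = $1 rebuilds each
# record with single spaces, which trims it and folds embedded newlines.
mapfile -t fields < <(awk 'BEGIN { RS = "@#*#" } NF { $1 = $1; print }' "$datafile")
rm -f "$datafile"

printf '%d fields\n' "${#fields[@]}"
printf '[%s]\n' "${fields[@]}"
```

Because each field is flattened to one line, this only fits the case where the newlines inside a field do not need to be preserved.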

Source

2010-01-22 16:09:27 ghostdog74

Modified this to do what I wanted. Awk is something I never learned. Thanks! – prestomation 2010-01-22 19:42:18

I would build something around awk. Here is a first proof of concept:

awk ' 
    BEGIN{ f=0; fi="" } 
    /^@#@#@#################################$/{ f=1 } 
    /^@#@#@#$/{ f=0; print"Field:"fi; fi="" } 
    { if(f==2)fi=fi"-"$0; if(f==1)f++ } 
' file

Source

2010-01-22 16:06:02 mouviciel

begin="@#@#@#################################" 
end="@#@#@#" 
i=0 
flag=0 

while read -r line 
do 
    case $line in 
     $begin) 
      flag=1;; 
     $end) 
      ((i++)) 
      flag=0;; 
     *) 
      if [[ $flag == 1 ]] 
      then 
       array[i]+="$line"$'\n' # retain the newline 
      fi;; 
    esac 
done < datafile

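If you want to keep the marker lines in the array elements, move the assignment statement (with its flag test) to the top of the while loop, before the case.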
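
A sketch of that variant follows. One caveat: if the append is moved above the `case` unchanged, it runs before `flag` is raised on the begin line, so only the end marker would be kept. The version below raises the flag first so that both markers land in the element. That reordering is an assumption about the intent, and the temporary sample file is made up for the demo.

```bash
#!/usr/bin/env bash
begin="@#@#@#################################"
end="@#@#@#"

# Hypothetical sample input, written to a temporary file for the demo.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
@#@#@#################################
this is field one
@#@#@#
@#@#@#################################
this is field two
they can be any number of lines
@#@#@#
EOF

i=0
flag=0
array=()

while IFS= read -r line; do
    # Raise the flag before appending so the begin marker is kept too.
    [[ $line == "$begin" ]] && flag=1
    if [[ $flag == 1 ]]; then
        array[i]+="$line"$'\n'   # retain the newline
    fi
    # The end marker has already been appended; close the current field.
    if [[ $line == "$end" ]]; then
        i=$((i + 1))
        flag=0
    fi
done < "$datafile"
rm -f "$datafile"

printf '%s' "${array[0]}"
```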

Source

2010-01-22 16:06:11

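The TXR language performs whole-document multi-line matching, binds variables, and (with the -B "dump bindings" option) emits properly escaped shell variable assignments that can be eval-ed. Arrays are supported.

The @ character is special, so it has to be doubled to match literally.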

$ cat fields.txr 
@(collect) 
@@#@@#@@################################# 
@ (collect) 
@field 
@ (until) 
@@#@@#@@# 
@ (end) 
@ (cat field)@# <- catenate the fields together with a space separator by default 
@(end) 

$ txr -B fields.txr data 
field[0]="this is field one" 
field[1]="this is field two they can be any number of lines" 

$ eval $(txr -B fields.txr data) 
$ echo ${field[0]} 
this is field one 
$ echo ${field[1]} 
this is field two they can be any number of lines

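The @field syntax matches an entire line. These lines are collected into a list because @field sits inside a @(collect), and the lists are collected into a list of lists because that @(collect) is nested inside another one. The inner @(cat field), however, reduces each inner list to a single string, so we end up with a list of strings.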

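This is "classic TXR": how it was originally designed and used, sparked by the idea:

Why don't we make here-documents work backwards and parse reams of text into variables?

This implicit emission of the matched variables, in shell syntax by default, continues to be supported behavior even though the language has grown much more powerful, so there is less need to integrate with shell scripts.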

Source

2014-01-07 01:01:31 Kaz
