Regex: ignore lines that start with these characters

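How can I write a regular expression that ignores lines starting with whitespace, "#", or a letter? Below is a sample of my data; the only lines I need to keep are those that start with a number (positive or negative):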

0.000000 1.2712052472 0.8899021956 22.2458 265.2511402076 322.1539247218 -13.6281 -130.986 0.155342 0.889755 phaet_000227 
0.000000 1.2712052462 0.8899021922 22.2458 265.2511430964 322.1539209801 -13.6281 -130.986 0.155342 0.889755 phaet_000090 
0.000000 1.2712052476 0.8899022047 22.2458 265.2511396341 322.1539260295 -13.6281 -130.986 0.155342 0.889755 phaet_000111 
0.000000 1.2712052465 0.8899022229 22.2458 265.2511497521 322.1539197205 -13.6281 -130.986 0.155342 0.889755 phaet_000059 
Nplanets 9 Nparticles 500: alive 509/509 ejected 0 rmin 0 rmax 0 
Full close app checks 0/0 (0.000000%) BS fails 0 
Close apps 1 bounces 0 accretions 0 Max n/step 0 
Simulation time 0 going to -100000. 
Real time 1 s Force 0 s (0.00 %) Coll 0 s (0.00 %) 
       E&L 0 s (0.00 %) Kep 0 s (0.00 %) 
CPU time 0.037627 s Force 0 s (0.00 %) Coll 0 s (0.00 %) 
       E&L  0 s (0.00 %) Kep 0 s (0.00 %) 
# Nplanets 9 Nparticles 500: alive 509/509 ejected 0 rmin 0 rmax 0 
# Full close app checks 0/0 (0.000000%) BS fails 0 
# Close apps 1 bounces 0 accretions 0 Max n/step 0 
# Simulation time 0 going to -100000. 
# Real time 1 s Force 0 s (0.00 %) Coll 0 s (0.00 %) 
#    E&L 0 s (0.00 %) Kep 0 s (0.00 %) 
# CPU time 0.037627 s Force 0 s (0.00 %) Coll 0 s (0.00 %) 
#    E&L  0 s (0.00 %) Kep 0 s (0.00 %) 
Output step 1 at t=-10 going to -100000 
-10.000000 1.2713031501 0.8900442847 22.1802 265.4033924020 322.0041354013 -5.32091 -102.357 0.155286 0.88482 phaet_000065 
-10.000000 1.2713031508 0.8900443093 22.1802 265.4033954804 322.0041360861 -5.32091 -102.357 0.155286 0.88482 phaet_000299 
-10.000000 1.2713031483 0.8900442977 22.1802 265.4033839221 322.0041469420 -5.32092 -102.357 0.155286 0.88482 phaet_000102 
-10.000000 1.2713031486 0.8900442931 22.1802 265.4033724632 322.0041581369 -5.32092 -102.357 0.155286 0.884821 phaet_000371 
-10.000000 1.2713031463 0.8900442910 22.1802 265.4033772870 322.0041532421 -5.32093 -102.357 0.155286 0.884821 phaet_000019

I would like to end up with:

0.000000 1.2712052472 0.8899021956 22.2458 265.2511402076 322.1539247218 -13.6281 -130.986 0.155342 0.889755 phaet_000227 
0.000000 1.2712052462 0.8899021922 22.2458 265.2511430964 322.1539209801 -13.6281 -130.986 0.155342 0.889755 phaet_000090 
0.000000 1.2712052476 0.8899022047 22.2458 265.2511396341 322.1539260295 -13.6281 -130.986 0.155342 0.889755 phaet_000111 
0.000000 1.2712052465 0.8899022229 22.2458 265.2511497521 322.1539197205 -13.6281 -130.986 0.155342 0.889755 phaet_000059 
-10.000000 1.2713031501 0.8900442847 22.1802 265.4033924020 322.0041354013 -5.32091 -102.357 0.155286 0.88482 phaet_000065 
-10.000000 1.2713031508 0.8900443093 22.1802 265.4033954804 322.0041360861 -5.32091 -102.357 0.155286 0.88482 phaet_000299 
-10.000000 1.2713031483 0.8900442977 22.1802 265.4033839221 322.0041469420 -5.32092 -102.357 0.155286 0.88482 phaet_000102 
-10.000000 1.2713031486 0.8900442931 22.1802 265.4033724632 322.0041581369 -5.32092 -102.357 0.155286 0.884821 phaet_000371 
-10.000000 1.2713031463 0.8900442910 22.1802 265.4033772870 322.0041532421 -5.32093 -102.357 0.155286 0.884821 phaet_000019

So I tried grep as follows:

grep -v '^[a-z,A-Z,\s,\#]' file1.dat > file2.dat

It gets rid of the lines starting with a letter or "#", but lines that begin with whitespace remain, i.e. I cannot remove:

 E&L 0 s (0.00 %) Kep 0 s (0.00 %) 
     E&L  0 s (0.00 %) Kep 0 s (0.00 %)

Note that there is whitespace before "E&L".

Any ideas on how to get rid of these?

Source

2016-11-15 user3578925

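In grep, [\s,\#] matches a backslash, the letter s, a comma, or a hash sign. (A backslash has no special meaning inside a bracket expression, and a comma is not special either.) The simplest way to match whitespace is the [:space:] character class. So your regex would be: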

^[a-zA-Z#[:space:]]
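
As a rough illustration of the difference, the sketch below runs both patterns over a small sample. The `sample_lines` helper and its abbreviated rows are hypothetical stand-ins for file1.dat, not the real data.

```shell
# Hypothetical five-line sample standing in for file1.dat (rows abbreviated)
sample_lines() {
  printf '%s\n' \
    '0.000000 1.2712052472 0.8899021956 phaet_000227' \
    'Nplanets 9 Nparticles 500: alive 509/509' \
    '       E&L 0 s (0.00 %) Kep 0 s (0.00 %)' \
    '# CPU time 0.037627 s Force 0 s (0.00 %)' \
    '-10.000000 1.2713031501 0.8900442847 phaet_000065'
}

# Original pattern: the indented E&L line survives, because \s is not
# a whitespace class inside a bracket expression
sample_lines | grep -v '^[a-z,A-Z,\s,\#]'

# Corrected pattern: only the two numeric lines remain
sample_lines | grep -v '^[a-zA-Z#[:space:]]'
```

On the real data the second command would presumably be run as `grep -v '^[a-zA-Z#[:space:]]' file1.dat > file2.dat`. Empty lines, if the file has any, are not matched by the pattern and so pass through `grep -v`.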

You could also do a positive search for lines that start with a number:

^-\?[[:digit:]]\+\.[[:digit:]]\+
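
The `\?` and `\+` operators in that pattern are GNU extensions to basic regular expressions. A minimal sketch of what should be an equivalent search using `grep -E` (extended syntax) follows; the `sample_lines` helper and its abbreviated rows are hypothetical, made up for the demonstration.

```shell
# Hypothetical sample standing in for file1.dat (rows abbreviated)
sample_lines() {
  printf '%s\n' \
    '0.000000 1.2712052472 0.8899021956 phaet_000227' \
    'Output step 1 at t=-10 going to -100000' \
    '       E&L 0 s (0.00 %) Kep 0 s (0.00 %)' \
    '# Simulation time 0 going to -100000.' \
    '-10.000000 1.2713031501 0.8900442847 phaet_000065'
}

# -E selects extended regular expressions, so ? and + need no backslashes.
# Keeps lines that begin with an optional minus sign, digits, a dot, digits.
sample_lines | grep -E '^-?[[:digit:]]+\.[[:digit:]]+'
```

A positive match like this states what a data line looks like instead of listing everything to exclude, so header lines with unexpected leading characters are dropped as well.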

Source

2016-11-15 14:03:58

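I think this solution is the best. Thanks. – user3578925

Those two lines are not removed because of their leading whitespace. You can strip the whitespace first.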

sed "s/^[ \t]*//" file1.dat > file3.dat

Then filter the resulting file with grep.

grep -v '^[a-z,A-Z,\s,\#]' file3.dat > file2.dat
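
The two steps can also be chained in a pipeline, which avoids the intermediate file. This is only a sketch: the `sample_lines` helper and its abbreviated rows are hypothetical, and `[[:space:]]` is used in place of `[ \t]` on the assumption that a sed without GNU extensions may not treat `\t` as a tab inside a bracket expression.

```shell
# Hypothetical sample standing in for file1.dat (rows abbreviated)
sample_lines() {
  printf '%s\n' \
    '0.000000 1.2712052472 0.8899021956 phaet_000227' \
    'Real time 1 s Force 0 s (0.00 %)' \
    '       E&L 0 s (0.00 %) Kep 0 s (0.00 %)' \
    '#    E&L 0 s (0.00 %) Kep 0 s (0.00 %)' \
    '-10.000000 1.2713031501 0.8900442847 phaet_000065'
}

# Step 1: strip leading whitespace, so the E&L line now starts with a letter.
# Step 2: drop lines that start with a letter or '#'.
sample_lines | sed 's/^[[:space:]]*//' | grep -v '^[a-zA-Z#]'
```

One side effect to keep in mind: any data line that happened to be indented would be kept, but with its indentation removed.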

Source

2016-11-15 12:26:07 amin

Thanks so much. – user3578925
