Combining two regex patterns

I have a text file containing something like this (this is only an excerpt):

Third Doctor 
Season 7 
051 Spearhead from Space 4 3—24 January 1970 
052 Doctor Who and the Silurians 7 31 January—14 March 1970 
053 The Ambassadors of Death 7 21 March—2 May 1970 
054 Inferno 7 9 May—20 June 1970 

Season 8 
055 Terror of the Autons 4 2—23 January 1971 
056 The Mind of Evil 6 30 January—6 March 1971 
057 The Claws of Axos 4 13 March—3 April 1971 
058 Colony in Space 6 10 April—15 May 1971 
059 The Dæmons 5 22 May—19 June 1971

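Note that the basic line pattern is ^###\t.*\t?\t.*$ (i.e. almost every line contains 3 tabs, \t).

I want to delete everything after the episode title, so that it looks like this: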

Third Doctor 
Season 7 
051 Spearhead from Space 
052 Doctor Who and the Silurians 
053 The Ambassadors of Death 
054 Inferno 

Season 8 
055 Terror of the Autons 
056 The Mind of Evil 
057 The Claws of Axos 
058 Colony in Space 
059 The Dæmons

Currently I am testing the following pattern in gedit:

([^\t]*)$ # replaces not only everything after the last `\t', 
      # including that `\t', but also lines that *do not* contain any `\t'

Then I tried to 'select' the lines, which should be (?=(?=^(?:(?!Season).)*$)(?=^(?:(?!Series).)*$)(?=^(?:(?!Doctor$).)*$)(?=^(?:(?!Title).)*$)(?=^(?:(?!Specials$).)*$)(?=^(?:(?!Mini).)*$)(?=^(?:(?!^\t).)*$)(?=^(?:(?!Anim).)*$)).*$ - this works fine, but I do not know how to combine it with ([^\t]*)$.

2014-11-02 tukusejssirs

Which language? – vks 2014-11-02 20:34:30

@vks: I would say bash, but I really do not know what kind of regex gedit 3.10.4 uses... A bash (sed) regex would be enough, though :) – tukusejssirs 2014-11-02 21:47:30

^(\d{3}\s+.*?)(?=\s*\d).*$

Try this. Replace with $1. Use the m or MULTILINE flag, depending on your regex flavor. See the demo.

http://regex101.com/r/jI8lV7/8

2014-11-02 20:41:43 vks

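While this does work in the demo, it does not work in gedit, nor in the Ubuntu GNOME 14.10 gnome-terminal (default settings) with `sed 's/^(\d{3}\s+.*?)(?=\s*\d).*$//g' file`. ... As for 'replace with $1' and 'use the m or MULTILINE flag', I need more detailed instructions, because I do not understand :) – tukusejssirs 2014-11-02 21:45:33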

@tukusejssirs You can use Python or Perl. The Python code would be something like this: `import re` `p = re.compile(ur'^(\d{3}\s+.*?)(?=\s*\d).*$', re.MULTILINE | re.IGNORECASE)` `test_str = <your test string>` `subst = u"$1"` `result = re.sub(p, subst, test_str)` – vks 2014-11-03 04:55:59
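For readers who want to try the Python route suggested in the comment above, here is a minimal Python 3 sketch, not a definitive implementation. It assumes tab-separated input as described in the question, and the `sample` string is only an illustrative excerpt. The replacement is written as `\1` because Python's `re.sub` refers to capture groups with a backslash rather than `$1`, and `re.MULTILINE` makes `^` and `$` match at every line boundary instead of only at the start and end of the whole string.

```python
import re

# Same pattern as in the answer: capture the 3-digit number and the title,
# then discard everything from the first following digit to the end of the line.
pattern = re.compile(r'^(\d{3}\s+.*?)(?=\s*\d).*$', re.MULTILINE)

# Illustrative input: a few lines taken from the question's excerpt,
# with tabs between the fields (an assumption based on the question).
sample = (
    "Third Doctor\n"
    "Season 7\n"
    "051\tSpearhead from Space\t4\t3—24 January 1970\n"
    "054\tInferno\t7\t9 May—20 June 1970\n"
)

# Lines that do not start with three digits are left untouched.
result = pattern.sub(r'\1', sample)
print(result)
```

One caveat of this pattern: the lazy match stops at the first digit that follows the number, so a title that itself contains a digit would be cut short.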

Since the fields are tab-delimited, you only need cut to get the first two fields:

cut -f1,2 drwho.txt

For the record, the same with awk:

awk -F"\t" '$3{print $1"\t"$2}!$3{print $0}' drwho.txt

Explanation: awk works line by line, and the -F parameter defines the field separator.

$3 {     # if field3 exists 
    print $1"\t"$2  # display field1, a tab, field2 
} 
!$3 {     # if field3 doesn't exist 
    print $0   # display the whole record (the line) 
}
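The same per-line logic can be sketched in Python for anyone who prefers it to awk. This is a rough equivalent under stated assumptions, not a drop-in replacement: the function name `keep_first_two_fields` and the sample lines are hypothetical, and the check is slightly simpler than awk's, which also treats a third field equal to 0 as false.

```python
def keep_first_two_fields(line: str) -> str:
    """Return the first two tab-separated fields, or the line unchanged."""
    fields = line.split("\t")
    # Only trim when a non-empty third field is present,
    # mirroring the `$3 { ... }` / `!$3 { ... }` pair above.
    if len(fields) >= 3 and fields[2]:
        return "\t".join(fields[:2])
    return line


# Illustrative lines from the question's excerpt (tabs assumed between fields).
lines = [
    "Season 8",
    "055\tTerror of the Autons\t4\t2—23 January 1971",
    "059\tThe Dæmons\t5\t22 May—19 June 1971",
]
for line in lines:
    print(keep_first_two_fields(line))
```

Heading lines such as "Season 8" contain no tab, so they pass through unchanged, which is the behaviour the question asks for.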

2014-11-02 21:08:30

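@Casimir_et_Hippolyte: I see you know Doctor Who :) ... but the first code you provided does not work for me, neither in gedit nor with `sed`. I did try it at [regex101.com](http://regex101.com/r/vG0aK9/1), but I was left with some extra characters... – tukusejssirs 2014-11-02 21:58:58

@tukusejssirs: I wrote a sed version. – 2014-11-02 22:17:50

@Casimir_et_Hippolyte: I do not know why, but this does not work either. FYI, I am using sed (GNU sed) 4.2.2. – tukusejssirs 2014-11-02 23:49:59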
