2016-11-24 82 views
1

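How to split a single txt file into multiple txt files with Python

I have a single txt file, and I want to split it into many files based on the *TEXT ID. For example, the txt file looks like this: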

*TEXT 017 01/04/63 PAGE 020 
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST 
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE..... 
*TEXT 018 01/04/63 PAGE 021 
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA 
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE..... 
*TEXT 019 01/04/63 PAGE 021 
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO 
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE 
AGAINST HIM, FOR WEIDNER, 40, WAS A.... 

How can I split it into multiple txt files, named as follows?

filename: 
TEXT017.txt 

filename: 
TEXT018.txt 

filename: 
TEXT019.txt 

Take a look at the 're.split()' method. – n1c9


What have you tried? At what point are you running into trouble? Splitting the text? Writing the files? Reading the file? –


@SonofaBeach I don't know how to save the text into the corresponding multiple txt files. – dd90p

Answers

2

Inspired by @n1c9's answer, I modified it and added a few things to make it complete.

import re 

raw_string = """*TEXT 017 01/04/63 PAGE 020 
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST 
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE..... 
*TEXT 018 01/04/63 PAGE 021 
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA 
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE..... 
*TEXT 019 01/04/63 PAGE 021 
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO 
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE 
AGAINST HIM, FOR WEIDNER, 40, WAS A....""" 

# the capture group keeps each `*TEXT ...` header line in the result
split_strings = re.split(r'\n?(\*TEXT .*)\n', raw_string)
blocks = [s for s in split_strings if s]  # filter out blank strings

# blocks alternates: header, content, header, content, ...
for i in range(0, len(blocks), 2):
    # extract `019` from `*TEXT 019 01/04/63 PAGE 021`
    num = re.search(r'TEXT (\d+)', blocks[i]).group(1)

    # save content to `TEXT019.txt`
    filename = 'TEXT%s.txt' % num
    content = blocks[i + 1]
    with open(filename, 'w') as fp:
        fp.write(content)
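
The snippet above works on a string that is already in memory. Since the question starts from a file on disk, here is a rough line-by-line variant that reads the source file directly. It is only a sketch: the function name `split_text_file` and its `src_path` and `out_dir` parameters are made up for illustration, and it assumes every header line starts with `*TEXT` followed by the numeric ID.

```python
import os
import re

# assumption: a header line begins with "*TEXT " and a numeric ID
HEADER = re.compile(r'^\*TEXT (\d+)')


def split_text_file(src_path, out_dir='.'):
    """Write each *TEXT block to out_dir/TEXT<id>.txt and return the paths."""
    written = []
    out = None
    try:
        with open(src_path) as src:
            for line in src:
                match = HEADER.match(line)
                if match:
                    # a header line starts a new block: switch output files
                    if out:
                        out.close()
                    path = os.path.join(out_dir, 'TEXT%s.txt' % match.group(1))
                    out = open(path, 'w')
                    written.append(path)
                elif out:
                    # body line: belongs to the most recent header
                    out.write(line)
    finally:
        if out:
            out.close()
    return written
```

Calling `split_text_file('input.txt')` (a hypothetical file name) would create `TEXT017.txt`, `TEXT018.txt`, and so on in the current directory. Lines that appear before the first header are skipped, and the header lines themselves are not copied into the output files.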

Thank you very much. I have accepted your answer. – dd90p

2

Split the text on the lines that mark the beginning of a new text ID:

import re 

raw_string = """*TEXT 017 01/04/63 PAGE 020 
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST 
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE..... 
*TEXT 018 01/04/63 PAGE 021 
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA 
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE..... 
*TEXT 019 01/04/63 PAGE 021 
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO 
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE 
AGAINST HIM, FOR WEIDNER, 40, WAS A....""" 

split_strings = re.split(r'(.*TEXT .*PAGE \d+)', raw_string)
for item in split_strings:
    if not item:
        continue  # skip the empty string that precedes the first header
    print('------')
    print(item)

------ 
*TEXT 017 01/04/63 PAGE 020 
------ 

THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST 
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE..... 

------ 
*TEXT 018 01/04/63 PAGE 021 
------ 

RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA 
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE..... 

------ 
*TEXT 019 01/04/63 PAGE 021 
------ 

BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO 
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE 
AGAINST HIM, FOR WEIDNER, 40, WAS A.... 
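
To go from the printed pieces to the files the question asks for, the headers and bodies can be paired up. With a single capture group, `re.split` returns the text before the first match, then alternating header and body items, so slicing with a step of 2 separates them. The following is only a sketch; the helper names `blocks_by_id` and `write_blocks` are made up for illustration.

```python
import re


def blocks_by_id(raw_string):
    """Map each text ID (e.g. '017') to the body that follows its header."""
    parts = re.split(r'(.*TEXT .*PAGE \d+)', raw_string)
    headers = parts[1::2]  # the captured header lines
    bodies = parts[2::2]   # the text between headers
    result = {}
    for header, body in zip(headers, bodies):
        text_id = re.search(r'TEXT (\d+)', header).group(1)
        # strip the whitespace left around each body by the split
        result[text_id] = body.strip() + '\n'
    return result


def write_blocks(raw_string):
    """Save each body to TEXT<id>.txt in the current directory."""
    for text_id, body in blocks_by_id(raw_string).items():
        with open('TEXT%s.txt' % text_id, 'w') as fp:
            fp.write(body)
```

Calling `write_blocks(raw_string)` with the string above would produce `TEXT017.txt`, `TEXT018.txt` and `TEXT019.txt`. Note that `strip()` also removes any leading or trailing whitespace belonging to the body itself, which may or may not be what you want.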

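What I mean is saving "THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE....." into a file named "TEXT017.txt". – dd90p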