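You can do this without using a regular expression. Split the string on the pipe character, use a generator expression with the built-in str.isalpha() method
to keep only the words that consist entirely of alphabetic characters, and join them to form the final output: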
old_fruits = 'apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear'
words = (word for word in old_fruits.split('|') if word.isalpha())
new_fruits = '\n'.join(words)
print(new_fruits)
The output is
apple
kiwi
banana
pear
as required (this does not write to a file, but I assume you can handle that part).
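If it helps, here is a minimal sketch of the file-writing step. The helper name write_fruits and the output path are my own placeholders, not something from the question, so adjust them to your setup:

```python
import os
import tempfile

old_fruits = 'apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear'

def write_fruits(text, path):
    # Hypothetical helper: keep only the purely alphabetic fields
    # and write them to `path`, one per line.
    words = (word for word in text.split('|') if word.isalpha())
    with open(path, 'w') as handle:
        handle.write('\n'.join(words) + '\n')

# Assumed location: a file in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), 'new_fruits.txt')
write_fruits(old_fruits, out_path)
```

The with block closes the file even if the write fails part way through.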
Edit: I knocked up a quick script to provide some timing comparisons between the regex and non-regex versions:
import timeit
# Setup - not counted in the timing so it doesn't matter we include regex for both tests
setup = r"""old_fruits = 'apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear'
import re
fruit_re=re.compile(r'[^\W\d]+')
"""
no_re = r"""words = (word for word in old_fruits.split('|') if word.isalpha())
new_fruits = '\n'.join(words)"""
with_re = r"""new_fruits = '\n'.join(fruit_re.findall(old_fruits))"""
num = 10000
print("Short input")
t = timeit.timeit(no_re, setup, number=num)
print("No regex: {0:.2f} microseconds to run".format((t*1e6)/num))
t = timeit.timeit(with_re, setup, number=num)
print("With regex: {0:.2f} microseconds to run".format((t*1e6)/num))
print("")
print("100 times longer input")
setup = r"""old_fruits = 'apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear'*100
import re
fruit_re=re.compile(r'[^\W\d]+')"""
t = timeit.timeit(no_re, setup, number=num)
print("No regex: {0:.2f} microseconds to run".format((t*1e6)/num))
t = timeit.timeit(with_re, setup, number=num)
print("With regex: {0:.2f} microseconds to run".format((t*1e6)/num))
Results on my computer:
Short input
No regex: 18.31 microseconds to run
With regex: 15.37 microseconds to run
100 times longer input
No regex: 793.79 microseconds to run
With regex: 999.08 microseconds to run
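So with a precompiled regular expression, the regex is faster for the shorter input string, and the generator expression is faster for the longer input string (at least on my computer, running Ubuntu Linux and Python 2.7; your results may vary).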
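One caveat when choosing between the two: they only agree on clean input like the sample string. The pattern [^\W\d]+ also matches underscores and picks the letters out of mixed fields, while isalpha() rejects such fields entirely. A small sketch, using a made-up input string to show the difference:

```python
import re

fruit_re = re.compile(r'[^\W\d]+')

# Made-up input containing a mixed field and an underscore.
messy = 'apple|kiwi2|red_apple|0.5'

# Generator approach: keeps only fields that are entirely alphabetic.
by_isalpha = [word for word in messy.split('|') if word.isalpha()]

# Regex approach: keeps every run of letters and underscores.
by_regex = fruit_re.findall(messy)

print(by_isalpha)  # ['apple']
print(by_regex)    # ['apple', 'kiwi', 'red_apple']
```

So if your real data can contain fields like these, pick the approach whose behaviour you actually want rather than just the faster one.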
Thanks! I tried it and it works the way I hoped. – Levar 2012-08-13 03:49:29