Based on the code example you added, the question you are trying to answer is how to replace ' ' with ', ' in every row of a pandas DataFrame.
Here is one way to do it:
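import pandas as pd
# header=None: treat the first line as data, not as column names
sampletxt = pd.read_csv('teste.csv', header=None)
# replace every run of whitespace with ", " in each cell
output = sampletxt.replace(r'\s+', ', ', regex=True)
print(output)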
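Since teste.csv itself isn't shown, here is a minimal self-contained sketch that feeds the same kind of input through io.StringIO instead of a file. The three sample lines are an assumption, borrowed from the example that follows. Note that read_csv splits on commas by default, so this yields a single column only as long as the lines themselves contain no commas.

```python
import io

import pandas as pd

# Assumed file contents: one phrase per line, no commas (stand-in for teste.csv)
text = (
    "input phrase of the file to exemplify\n"
    "input phrase of the file to exemplify 2\n"
    "input phrase of the file to exemplify 4\n"
)

# header=None: the first line is data, and the single column is labelled 0
sampletxt = pd.read_csv(io.StringIO(text), header=None)

# Replace each run of whitespace with ", " in every cell of the frame
output = sampletxt.replace(r"\s+", ", ", regex=True)
print(output)
```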
Example:
In [24]: l
Out[24]:
['input phrase of the file to exemplify',
'input phrase of the file to exemplify 2',
'input phrase of the file to exemplify 4']
In [25]: sampletxt = pd.DataFrame(l)
In [26]: sampletxt
Out[26]:
0
0 input phrase of the file to exemplify
1 input phrase of the file to exemplify 2
2 input phrase of the file to exemplify 4
In [27]: output = sampletxt.replace(r'\s+', ', ', regex=True)
In [28]: output
Out[28]:
0
0 input, phrase, of, the, file, to, exemplify
1 input, phrase, of, the, file, to, exemplify, 2
2 input, phrase, of, the, file, to, exemplify, 4
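If only one column needs changing, the Series.str accessor is an alternative to DataFrame.replace (this is a swapped-in technique, not what the answer above uses). The column label 0 matches the header=None frame in the session above. regex=True is passed explicitly because Series.str.replace treats the pattern as a literal string by default in pandas 2.0 and later.

```python
import pandas as pd

lines = [
    "input phrase of the file to exemplify",
    "input phrase of the file to exemplify 2",
    "input phrase of the file to exemplify 4",
]
sampletxt = pd.DataFrame(lines)

# Operate on column 0 only; any other columns would be left untouched
result = sampletxt[0].str.replace(r"\s+", ", ", regex=True)
print(result.tolist())
```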
Old answer
You can also use re.sub(..), as follows:
In [3]: import re
In [4]: st = "input phrase of the file to exemplify"
In [5]: re.sub(' ',', ', st)
Out[5]: 'input, phrase, of, the, file, to, exemplify'
However, re.sub(...) is slower than str.replace(..):
In [6]: timeit re.sub(' ',', ', st)
100000 loops, best of 3: 1.74 µs per loop
In [7]: timeit st.replace(' ',', ')
1000000 loops, best of 3: 257 ns per loop
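Outside IPython, where the %timeit magic isn't available, the same comparison can be sketched with the standard-library timeit module. Absolute numbers depend on the machine and Python version, so none are claimed here; for a fixed single-character pattern, str.replace skips the regex engine entirely, which is consistent with it coming out ahead in the timings above.

```python
import re
import timeit

st = "input phrase of the file to exemplify"

# Both calls produce the same string for single-spaced input
assert re.sub(" ", ", ", st) == st.replace(" ", ", ")

n = 100_000
t_re = timeit.timeit(lambda: re.sub(" ", ", ", st), number=n)
t_str = timeit.timeit(lambda: st.replace(" ", ", "), number=n)

# Report the average time per call in nanoseconds
print(f"re.sub:      {t_re / n * 1e9:.0f} ns per call")
print(f"str.replace: {t_str / n * 1e9:.0f} ns per call")
```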
If two words are separated by more than one space, str.replace(' ', ', ') gives the wrong output, and so does every other answer here. For example:
In [15]: st
Out[15]: 'input phrase of the file to  exemplify'
In [16]: re.sub(' ',', ', st)
Out[16]: 'input, phrase, of, the, file, to, , exemplify'
In [17]: st.replace(' ',', ')
Out[17]: 'input, phrase, of, the, file, to, , exemplify'
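A runnable version of the failure case above: the double space between "to" and "exemplify" is what produces the empty field, because each single space is replaced independently. Splitting the result back on ", " makes the empty token visible.

```python
import re

st = "input phrase of the file to  exemplify"  # two spaces before "exemplify"

literal_re = re.sub(" ", ", ", st)
literal_str = st.replace(" ", ", ")

# Two consecutive spaces become ", , ", i.e. an empty item between the commas
print(literal_re)
print(literal_re.split(", "))
```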
To fix this, use a regular expression that matches one or more whitespace characters:
In [22]: st
Out[22]: 'input phrase of the file to  exemplify'
In [23]: re.sub(r'\s+', ', ', st)
Out[23]: 'input, phrase, of, the, file, to, exemplify'
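A sketch of the \s+ fix as a reusable helper (the name join_words is hypothetical), shown alongside ", ".join(s.split()), a common alternative that is not part of the answer. The two agree on interior whitespace but differ at the edges: split() with no arguments also drops leading and trailing whitespace, whereas re.sub turns it into a stray ", ".

```python
import re

# Compile once; \s+ matches any run of spaces, tabs or newlines
WHITESPACE_RUN = re.compile(r"\s+")


def join_words(s: str) -> str:
    """Hypothetical helper: replace each whitespace run with ', '."""
    return WHITESPACE_RUN.sub(", ", s)


st = "input phrase of the file to  exemplify"
print(join_words(st))          # regex version
print(", ".join(st.split()))   # split/join alternative

padded = " a  b "
print(repr(join_words(padded)))         # edge whitespace becomes ", "
print(repr(", ".join(padded.split())))  # edge whitespace is dropped
```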
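The replace() function, shown in all the answers, is what you are looking for. Note, however, that you may get undesirable results if there are multiple spaces between words. For example, 'a  b c'.replace(' ', ',') returns 'a,,b,c'. If that isn't a problem for you, you're fine. – Reti43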