Adding a comma before each word

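I have a text file with more than a thousand lines, and for a particular process I need the words separated by commas. I'd like help developing this algorithm in Python, since I'm just getting started with the language.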

INPUT

input phrase of the file to exemplify

OUTPUT

input, phrase, of, the, file, to, exemplify

I think it would be something like this:

import pandas as pd 

sampletxt = pd.read_csv('teste.csv' , header = None) 
output = sampletxt.replace(" ", ", ") 

print output


2017-10-11 Rivaldo Hater

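The replace() function, shown in all the answers, is what you're looking for. Note, however, that you may get undesirable results if there is more than one space between words. For example, 'a  b c'.replace(' ', ', ') returns 'a, , b, c'. If that isn't a problem for you, you're fine. – Reti43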

Based on the code sample you added, the question you're really asking is how to replace ' ' with ', ' in every row of a pandas DataFrame.

Here is one way to do it:

import pandas as pd 

sampletxt = pd.read_csv('teste.csv' , header = None) 
output = sampletxt.replace(r'\s+', ', ', regex=True)  # raw string so \s is not treated as a string escape
print(output)

Example:

In [24]: l 
Out[24]: 
['input phrase of the file to exemplify', 
'input phrase of the file to exemplify 2', 
'input phrase of the file to exemplify 4'] 

In [25]: sampletxt = pd.DataFrame(l) 

In [26]: sampletxt 
Out[26]: 
             0 
0 input phrase of the file to exemplify 
1 input phrase of the file to exemplify 2 
2 input phrase of the file to exemplify 4 

In [27]: output = sampletxt.replace('\s+', ', ', regex=True) 

In [28]: output 
Out[28]: 
               0 
0  input, phrase, of, the, file, to, exemplify 
1 input, phrase, of, the, file, to, exemplify, 2 
2 input, phrase, of, the, file, to, exemplify, 4

OLD ANSWER

You can also use re.sub(..), as follows:

In [3]: import re 

In [4]: st = "input phrase of the file to exemplify" 

In [5]: re.sub(' ',', ', st) 
Out[5]: 'input, phrase, of, the, file, to, exemplify'

Note that str.replace(..) is faster than re.sub(...) here (257 ns versus 1.74 µs per loop):

In [6]: timeit re.sub(' ',', ', st) 
100000 loops, best of 3: 1.74 µs per loop 

In [7]: timeit st.replace(' ',', ') 
1000000 loops, best of 3: 257 ns per loop
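The timings above come from IPython's timeit magic. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The loop count n is an arbitrary choice for illustration, and the absolute numbers will differ from machine to machine:

```python
import re
import timeit

st = "input phrase of the file to exemplify"

# Number of calls per measurement (arbitrary, chosen for illustration).
n = 100_000

# Time each approach over the same number of calls.
t_sub = timeit.timeit(lambda: re.sub(" ", ", ", st), number=n)
t_replace = timeit.timeit(lambda: st.replace(" ", ", "), number=n)

print(f"re.sub:      {t_sub / n * 1e9:.0f} ns per call")
print(f"str.replace: {t_replace / n * 1e9:.0f} ns per call")
```

The lambda wrappers add a little overhead to both measurements, so treat the output as a relative comparison only.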

If two words are separated by more than one space, str.replace(' ', ', ') gives the wrong output, and so do all the answers based on it. For example:

In [15]: st 
Out[15]: 'input phrase of the file to  exemplify' 

In [16]: re.sub(' ',', ', st) 
Out[16]: 'input, phrase, of, the, file, to, , exemplify' 

In [17]: st.replace(' ',', ') 
Out[17]: 'input, phrase, of, the, file, to, , exemplify'

To fix this, use a regular expression (regex) that matches one or more whitespace characters, as follows:

In [22]: st 
Out[22]: 'input phrase of the file to  exemplify' 

In [23]: re.sub('\s+', ', ', st) 
Out[23]: 'input, phrase, of, the, file, to, exemplify'
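One more thing to watch for: lines read from a file usually end in a newline, and \s+ matches that too, so a trailing ', ' can sneak in. Here is a minimal sketch of a helper that strips the line first. The name comma_separate is just something made up for illustration, not part of any library:

```python
import re

# Compile once so the same pattern is reused for every line.
_WHITESPACE_RUN = re.compile(r"\s+")

def comma_separate(line):
    """Return the words of `line` joined by ', ' (hypothetical helper)."""
    # strip() removes leading/trailing whitespace, including the newline,
    # which would otherwise be turned into a stray ', ' at the ends.
    return _WHITESPACE_RUN.sub(", ", line.strip())

print(comma_separate("input phrase of the file to  exemplify\n"))
# input, phrase, of, the, file, to, exemplify
```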


2017-10-11 21:16:33 MedAli

Great explanation, thanks. –

the_list = entrada.split(" ")  # take the input and make a list of all values, separated by " "
saida = ", ".join(the_list)  # join all elements with ", " (join is a str method, not a list method)


2017-10-11 21:08:33 Eqomatic

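split() splits on whitespace by default. However, split() and split(' ') behave differently, and the former is probably preferable. – Reti43

With a few thousand lines, I think splitting and joining every line would be a bit slow.. :/ – peyo

I'd like to adapt this for use with my text file. –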
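To make the split() versus split(' ') point from the comments concrete, here is a small sketch. The sample string is made up for illustration and has two spaces before "of":

```python
entrada = "input phrase  of the file"  # note the two spaces before "of"

# split(" ") splits on every single space, so a run of spaces yields empty strings.
print(entrada.split(" "))   # ['input', 'phrase', '', 'of', 'the', 'file']

# split() with no argument splits on runs of whitespace and drops empty strings.
print(entrada.split())      # ['input', 'phrase', 'of', 'the', 'file']

saida = ", ".join(entrada.split())
print(saida)                # input, phrase, of, the, file
```

With split(" ") the empty string in the list would become an extra ", " after joining, which is the same double-comma problem that a plain replace has.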

Your line is probably just a string, so you can use:

line.replace(" ",", ")


2017-10-11 21:09:26

Complexity-wise, you should replace the spaces with commas directly rather than traversing the phrase several times.

the_list = entrada.replace(' ', ', ')


2017-10-11 21:11:07

First, you need to read your input one line at a time. Then you just use str.replace() like this:

sampletxt = "input phrase of the file to exemplify" 
output = sampletxt.replace(" ", ", ")

And you're done.
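For the reading-one-line-at-a-time part, a minimal sketch could look like the following. The function name add_commas, the file names in the usage note, and the UTF-8 encoding are all assumptions for illustration, so adjust them to your actual files:

```python
def add_commas(src_path, dst_path):
    """Copy src_path to dst_path, replacing each space with ', ' (hypothetical helper)."""
    # encoding="utf-8" is an assumption about the input file.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:  # iterating a file object yields one line at a time
            # Drop the newline before replacing, then write it back.
            dst.write(line.rstrip("\n").replace(" ", ", ") + "\n")
```

You would call it as add_commas('teste.txt', 'saida.txt'), where both names are placeholders. Since it only holds one line in memory at a time, a file with a few thousand lines is no problem.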


2017-10-11 21:12:22 peyo
