How to combine CSV files with pandas (and add an identifying column)

How can I append multiple CSV files together and add a column that indicates which file each row came from?

So far I have:

import os 
import pandas as pd 
import glob 

os.chdir('C:\...') # path to folder where all CSVs are stored 
for f, i in zip(glob.glob('*.csv'), short_list): 
    df = pd.read_csv(f, header = None) 
    df.index = i * len(df) 
    dfs.append(df) 

all_data = pd.concat(dfs, ignore_index=True)

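This all works fine except for the identifying column. i comes from a list of strings that I want to put in column A of all_data, one string for every row of each file. Instead it returns a lot of numbers and raises TypeError: Index(....) must be called with a collection of some kind.

Expected output: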

str1 file1entry1 
str1 file1entry2 
str1 file1entry3 
str2 file2entry1 
str2 file2entry2 
str2 file2entry3

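Where short_list = ['str1', 'str2', 'str3'], and file1entry1, file2entry2, etc. come from the CSV files, which I already have.

Solution: I couldn't get everything working exactly as the answer suggested, but it pointed me in the right direction.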

dfs = []
for f in glob.glob('*.csv'):
    df = pd.read_csv(f, header=None)
    df = df.assign(id=os.path.basename(f))  # simpler than pulling from the list; adds the file name to every row
    dfs.append(df)

all_data = pd.concat(dfs)


2016-09-20 R.M.

There is no need for '* len(df)'. When you assign a scalar to a new column, the value is applied to every row. – Parfait
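To illustrate the comment above, here is a minimal sketch of scalar assignment. The DataFrame contents are hypothetical stand-ins for one parsed CSV file, not data from the question.

```python
import pandas as pd

# Hypothetical stand-in for one parsed CSV file with three rows.
df = pd.DataFrame({0: ['file1entry1', 'file1entry2', 'file1entry3']})

# Assigning a scalar to a new column repeats it on every row.
df['id'] = 'str1'

# By contrast, multiplying a string by an integer is string repetition:
# it builds one long string rather than one value per row.
repeated = 'str1' * len(df)
```

This is probably why the original `df.index = i * len(df)` line failed: the right-hand side is a single string, not a collection of row labels.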

Note that you don't actually need pandas here. You could simply use the 'csv' module.
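A rough sketch of what the csv-module approach might look like follows. The function name `combine_csvs` and its parameters are made up for illustration, and using the file name as the identifier is an assumption borrowed from the asker's own solution.

```python
import csv
import glob
import os

def combine_csvs(pattern, out_path):
    """Copy every row of each matching CSV to out_path, prefixed by its file name."""
    out_abs = os.path.abspath(out_path)
    with open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for path in sorted(glob.glob(pattern)):
            if os.path.abspath(path) == out_abs:
                continue  # skip the output file if it matches the pattern
            with open(path, newline='') as in_file:
                for row in csv.reader(in_file):
                    # The first output column identifies the source file.
                    writer.writerow([os.path.basename(path)] + row)
```

Calling something like `combine_csvs('*.csv', 'combined.csv')` would stream rows one at a time, so it avoids loading all files into memory at once.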

You can use the .assign(id=i) method, which adds an id column to each parsed CSV and fills it with the corresponding i value:

df = pd.concat([pd.read_csv(f, header=None).assign(id=i)
                for f, i in zip(glob.glob('*.csv'), short_list)],
               ignore_index=True)
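One caveat with pairing files and labels through zip: glob.glob returns files in arbitrary order, and zip silently stops at the shorter input. A slightly more defensive sketch follows; the helper name `combine_with_labels` is hypothetical, and sorting by file name is an assumption about how the labels are meant to line up with the files.

```python
import glob
import pandas as pd

def combine_with_labels(pattern, labels):
    """Concatenate matching CSVs, tagging each file's rows with its label."""
    # Sorting makes the file-to-label pairing deterministic.
    files = sorted(glob.glob(pattern))
    if len(files) != len(labels):
        raise ValueError(
            f'{len(files)} files matched but {len(labels)} labels were given'
        )
    frames = [
        pd.read_csv(f, header=None).assign(id=label)
        for f, label in zip(files, labels)
    ]
    return pd.concat(frames, ignore_index=True)
```

With this version a mismatch between the number of files and the length of the label list raises an error instead of quietly dropping data.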


2016-09-20 21:40:03 MaxU

I wanted to reply to your comment. 'str1, str2, str3' are stored in 'short_list'. That was a typo.
