Reading a CSV with pandas for this kind of dataset

I am having trouble reading a dataset whose records look like this:

# title 
# description 
# link (could be not still active) 
# id 
# date 
# source (nyt|us|reuters) 
# category

For example:

court agrees to expedite n.f.l.'s appeal\n 
the decision means a ruling could be made nearly two months before the regular season begins, time for the sides to work out a deal without delaying the 
season.\n 
http://feeds1.nytimes.com/~r/nyt/rss/sports/~3/nbjo7ygxwpc/04nfl.html\n 
0\n 
04 May 2011 07:39:03\n 
nyt\n 
sport\n

I tried:

columns = ['title', 'description', 'link', 'id', 'date', 'source', 'category'] 
df = pd.read_csv('news', delimiter = "\n", names = columns,error_bad_lines=False)

But it puts all of the information into the title column.

Does anyone know a way to solve this?

Thanks!

Source

2017-06-16 Nico2rdj

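You can't use \n as the CSV delimiter. What you can do is read the file with one field per row, set the index equal to the column names, and then transpose. For a file that holds a single record and contains no tab characters, that would be something like:

df = pd.read_csv('news', sep='\t', header=None).set_index(pd.Index(columns)).transpose()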
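If the file holds several records back to back, the same read-then-reshape idea can be sketched without read_csv. This is only a sketch, assuming each field sits on its own physical line and every record has exactly seven fields. The helper name read_records is hypothetical, and the second record in the sample is made up for illustration.

```python
import io
import pandas as pd

columns = ['title', 'description', 'link', 'id', 'date', 'source', 'category']

# Stand-in for the 'news' file. The first record is the example from the
# question; the second is made up for illustration.
sample = io.StringIO(
    "court agrees to expedite n.f.l.'s appeal\n"
    "the decision means a ruling could be made nearly two months before "
    "the regular season begins, time for the sides to work out a deal "
    "without delaying the season.\n"
    "http://feeds1.nytimes.com/~r/nyt/rss/sports/~3/nbjo7ygxwpc/04nfl.html\n"
    "0\n"
    "04 May 2011 07:39:03\n"
    "nyt\n"
    "sport\n"
    "a second headline\n"
    "a second description\n"
    "http://example.com/story.html\n"
    "1\n"
    "05 May 2011 08:00:00\n"
    "reuters\n"
    "business\n"
)

def read_records(handle, columns):
    """Group consecutive lines into rows of len(columns) fields each."""
    lines = [line.rstrip("\r\n") for line in handle]
    # Ignore blank lines at the end of the file.
    while lines and not lines[-1].strip():
        lines.pop()
    width = len(columns)
    if len(lines) % width != 0:
        raise ValueError("line count is not a multiple of the field count")
    rows = [lines[i:i + width] for i in range(0, len(lines), width)]
    return pd.DataFrame(rows, columns=columns)

df = read_records(sample, columns)
print(df[['title', 'source', 'category']])
```

For the real file, the handle would come from open('news') instead of the in-memory sample. Every value is kept as a string, so id and date would still need converting afterwards.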

Source

2017-06-16 01:05:34 maxymoo

There are a few points to note here:

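1) Any delimiter longer than one character (other than '\s+') is interpreted by pandas as a regular expression.
2) Because the 'c' engine does not support regular expressions, I explicitly set the engine to 'python' to avoid a warning.
3) I had to add a dummy column because there is a '\n' at the end of the file; I later removed that column with drop.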

So these lines should hopefully give you the result you want.

import pandas as pd
columns = ['title', 'description', 'link', 'id', 'date', 'source', 'category', 'dummy']
# "\\\\n" is the regex \\n: a literal backslash followed by 'n'
df = pd.read_csv('news', names=columns, delimiter="\\\\n", engine='python').drop('dummy', axis=1)
df
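To try this without the original file, here is a small self-contained sketch. It assumes each record is one physical line whose fields are separated by the two literal characters backslash and n, which is the layout the regex above matches. The sample rows and example.com links are made up for illustration.

```python
import io
import pandas as pd

columns = ['title', 'description', 'link', 'id', 'date', 'source', 'category', 'dummy']

# Hypothetical file content: one record per physical line, fields separated by
# the two characters backslash + n (written "\\n" in Python source).
sample = io.StringIO(
    "first title\\nfirst description, with a comma\\nhttp://example.com/a\\n"
    "0\\n04 May 2011 07:39:03\\nnyt\\nsport\\n\n"
    "second title\\nsecond description\\nhttp://example.com/b\\n"
    "1\\n05 May 2011 08:00:00\\nreuters\\nbusiness\\n\n"
)

# r"\\n" is the regex for a literal backslash followed by 'n'.
# The trailing separator on each line yields an empty last field ('dummy').
df = pd.read_csv(sample, names=columns, delimiter=r"\\n", engine='python')
df = df.drop('dummy', axis=1)
print(df)
```

The raw string r"\\n" is the same pattern as "\\\\n", just easier to read. Commas inside a field are left alone because the comma is no longer the separator.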

I hope this helps :)

Source

2017-06-16 02:52:53
