Problem importing a dataset (txt file) in Python with numpy's genfromtxt function

I'm working hard on learning Python, but I'm trying to import a dataset and can't get it to work properly...

This dataset contains 16 columns and 16,320 rows and is saved as a txt file. I used the genfromtxt function as follows:

import numpy as np 
dt=np.dtype([('name', np.str_, 16),('platform', np.str_, 16),('year', np.float_, (2,)),('genre', np.str_, 16),('publisher', np.str_, 16),('na_sales', np.float_, (2,)), ('eu_sales', np.float64, (2,)), ('jp_sales', np.float64, (2,)), ('other_sales', np.float64, (2,)), ('global_sales', np.float64, (2,)), ('critic_scores', np.float64, (2,)),('critic_count', np.float64, (2,)),('user_scores', np.float64, (2,)),('user_count', np.float64, (2,)),('developer', np.str_, 16),('rating', np.str_, 16)]) 
data=np.genfromtxt('D:\\data3.txt',delimiter=',',names=True,dtype=dt)

I get this error:

ValueError: size of tuple must match number of fields.

But my dt variable contains 16 types, one for each column. I specified the data types because otherwise the strings get replaced by nan.
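For reference, the nan replacement described here comes from genfromtxt's default dtype, which is float: with the default loose conversion, a value that cannot be parsed as a float comes back as nan. A minimal sketch on a made-up two-column sample (not the real data3.txt):

```python
import io

import numpy as np

# Hypothetical two-column sample: one text column, one numeric column.
sample = "name,year\nGame A,2010\nGame B,2015\n"

# No dtype given, so every field is read as float.
data = np.genfromtxt(io.StringIO(sample), delimiter=',', names=True)
print(data['name'])  # [nan nan]  (text could not be parsed as float)
print(data['year'])  # [2010. 2015.]
```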

Any help would be greatly appreciated.


2017-03-04 Ben_its

Suggestion: post the first few lines of your data3.txt file. Are you sure it has 16 columns? – payne
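One way to check the column count before calling genfromtxt is to split the first few lines yourself. This is a rough sketch: first_line_widths is a hypothetical helper, and it splits naively on every comma. As far as I know genfromtxt(delimiter=',') also has no quote handling, so a comma inside a game title would show up as an extra column in both.

```python
import io


def first_line_widths(f, n=5):
    """Return the number of comma-separated fields in each of the first n lines.

    Hypothetical helper: splits naively on ',' with no quote handling.
    """
    widths = []
    for i, line in enumerate(f):
        if i >= n:
            break
        widths.append(len(line.rstrip('\r\n').split(',')))
    return widths


# Made-up sample; the last row has one extra field.
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6,7\n")
print(first_line_widths(sample))  # [3, 3, 4]

# On the real file (path from the question):
# with open('D:\\data3.txt') as f:
#     print(first_line_widths(f))
```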

Why all the (2,) in the dtype? You defined 16 fields, but all the floats are doubled. Have you tried loading with dtype=None? That lets it infer the best dtype. – hpaulj

Look at an array made with your dt:

In [78]: np.ones((1,),dt) 
Out[78]: 
array([ ('1', '1', [ 1., 1.], '1', '1', [ 1., 1.], [ 1., 1.], [ 1., 1.], 
     [ 1., 1.], [ 1., 1.], [ 1., 1.], [ 1., 1.], [ 1., 1.], 
     [ 1., 1.], '1', '1')], 
     dtype=[('name', '<U16'), ('platform', '<U16'), ('year', '<f8', (2,)), ('genre', '<U16'), ('publisher', '<U16'), ('na_sales', '<f8', (2,)), ('eu_sales', '<f8', (2,)), ('jp_sales', '<f8', (2,)), ('other_sales', '<f8', (2,)), ('global_sales', '<f8', (2,)), ('critic_scores', '<f8', (2,)), ('critic_count', '<f8', (2,)), ('user_scores', '<f8', (2,)), ('user_count', '<f8', (2,)), ('developer', '<U16'), ('rating', '<U16')])

I count 26 1s (strings and floats), not the 16 you need. Did you think (2,) means double? It means a 2-element subfield.
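The same count can be done without loading any file by summing how many scalar values one record of each dtype holds. This sketch rebuilds the question's 16-field dtype, using np.float64 in place of np.float_ (an alias for it in older NumPy releases); make_dtype and leaf_count are hypothetical helper names, not NumPy functions.

```python
import numpy as np

# Field names in the order used in the question.
NAMES = ['name', 'platform', 'year', 'genre', 'publisher',
         'na_sales', 'eu_sales', 'jp_sales', 'other_sales', 'global_sales',
         'critic_scores', 'critic_count', 'user_scores', 'user_count',
         'developer', 'rating']
STRING_FIELDS = {'name', 'platform', 'genre', 'publisher', 'developer', 'rating'}


def make_dtype(float_shape=None):
    """Build the 16-field dtype; float_shape=(2,) reproduces the question's dt."""
    fields = []
    for n in NAMES:
        if n in STRING_FIELDS:
            fields.append((n, np.str_, 16))
        elif float_shape is None:
            fields.append((n, np.float64))               # one float per field
        else:
            fields.append((n, np.float64, float_shape))  # a subarray per field
    return np.dtype(fields)


def leaf_count(dt):
    """Scalar values in one record: a (2,) subfield counts as 2."""
    return sum(int(np.prod(dt[n].shape)) for n in dt.names)


bad, good = make_dtype((2,)), make_dtype()
print(len(bad.names), leaf_count(bad))    # 16 26
print(len(good.names), leaf_count(good))  # 16 16
```

Both dtypes have 16 named fields, but only the second one maps one value to one column.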

Take out all those (2,):

In [80]: np.ones((1,),dt) 
Out[80]: 
array([ ('1', '1', 1., '1', '1', 1., 1., 1., 1., 1., 1., 1., 1., 1., '1', '1')], 
     dtype=[('name', '<U16'), ('platform', '<U16'), ('year', '<f8'), ('genre', '<U16'), ('publisher', '<U16'), ('na_sales', '<f8'), ('eu_sales', '<f8'), ('jp_sales', '<f8'), ('other_sales', '<f8'), ('global_sales', '<f8'), ('critic_scores', '<f8'), ('critic_count', '<f8'), ('user_scores', '<f8'), ('user_count', '<f8'), ('developer', '<U16'), ('rating', '<U16')])

Now I have 16 fields, which should parse your 16 columns just right.
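A minimal sketch of the corrected dtype in use, assuming the file has a header line whose names match the dtype. The two data rows are invented stand-ins for D:\data3.txt, and encoding=None is an addition not in the original call, used here so text columns come back as str.

```python
import io

import numpy as np

NAMES = ['name', 'platform', 'year', 'genre', 'publisher',
         'na_sales', 'eu_sales', 'jp_sales', 'other_sales', 'global_sales',
         'critic_scores', 'critic_count', 'user_scores', 'user_count',
         'developer', 'rating']
STRING_FIELDS = {'name', 'platform', 'genre', 'publisher', 'developer', 'rating'}

# One float per numeric field: no (2,) subarray shapes.
dt = np.dtype([(n, np.str_, 16) if n in STRING_FIELDS else (n, np.float64)
               for n in NAMES])

# Made-up sample: header line plus two 16-column rows.
sample = "\n".join([
    ",".join(NAMES),
    "Game A,PC,2010,Action,Pub X,1.0,0.5,0.1,0.2,1.8,80,40,7.5,100,Dev Y,T",
    "Game B,PS4,2015,Sports,Pub Z,2.0,1.5,0.3,0.4,4.2,75,30,6.9,250,Dev W,E",
])

data = np.genfromtxt(io.StringIO(sample), delimiter=',', names=True,
                     dtype=dt, encoding=None)
print(data.shape)            # (2,)
print(data['name'])          # ['Game A' 'Game B']
print(data['global_sales'])  # [1.8 4.2]
```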

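But dtype=None usually works as well. It lets genfromtxt deduce the best dtype for each field. In this case it takes the field names from the column header line (your names=True argument).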
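A minimal sketch of the dtype=None route on a made-up three-column sample (a subset of the real columns, invented values). encoding=None is again an addition: with the older default of encoding='bytes', inferred text columns come back as bytes rather than str.

```python
import io

import numpy as np

# Hypothetical three-column sample.
sample = "name,year,na_sales\nGame A,2010,1.0\nGame B,2015,2.5\n"

# dtype=None: genfromtxt guesses a type per column;
# names=True takes the field names from the header line.
data = np.genfromtxt(io.StringIO(sample), delimiter=',', names=True,
                     dtype=None, encoding=None)
print(data.dtype.names)  # ('name', 'year', 'na_sales')
print(data['name'])      # ['Game A' 'Game B']
```

Note that the inferred types may differ from a hand-written dtype: here year comes back as an integer column, not float.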

It's a good idea to test complicated lines of code on their own before putting them into a larger script, especially while you're learning.


2017-03-04 17:18:09 hpaulj
