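Short question:
I am trying to get the mean of a column (a Series) after grouping a MultiIndex pandas DataFrame that I build in two different ways. The only difference is how the DataFrame is constructed. One way gives me the result I expect; the other raises DataError: No numeric types to aggregate when I compute the mean on what looks like the same MultiIndex DataFrame.
Description:
Common data used for construction: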
import pandas as pd
import numpy as np
indexTuples = [('a', 1), ('b', 3), ('a', 2), ('c', 2), ('c', 3), ('b', 8)]
multiIndex = pd.MultiIndex.from_tuples(indexTuples, names = ['x', 'y'])
Constructing the DataFrame via method 1:
columns = ['alpha', 'beta', 'gamma']
df = pd.DataFrame(index=multiIndex, columns=columns)
alpha = pd.Series(index=multiIndex)
beta = pd.Series(index=multiIndex)
gamma = pd.Series(index=multiIndex)
for tup in indexTuples:
    alpha[tup[0], tup[1]] = np.random.randint(400)
    beta[tup[0], tup[1]] = np.random.randint(400)
    gamma[tup[0], tup[1]] = np.random.randint(400)
df.alpha = alpha
df.beta = beta
df.gamma = gamma
df.alpha['a'] = np.nan
df
The resulting DataFrame looks like this:
     alpha   beta  gamma
x y
a 1    NaN  136.0  224.0
b 3  375.0  227.0  191.0
a 2    NaN  367.0  195.0
c 2  247.0   61.0   78.0
  3  238.0  187.0  366.0
b 8  302.0   14.0  272.0
If I do the following, I get the expected result:
df.groupby(level='x').alpha.mean()
Result:
x
a      NaN
b    148.0
c    244.5
Name: alpha, dtype: float64
Constructing the DataFrame via method 2:
columns = ['alpha', 'beta', 'gamma']
_df = pd.DataFrame(index=multiIndex, columns=columns)
for tup in indexTuples:
    _df.alpha[tup[0], tup[1]] = np.random.randint(400)
    _df.beta[tup[0], tup[1]] = np.random.randint(400)
    _df.gamma[tup[0], tup[1]] = np.random.randint(400)
_df.alpha['a'] = np.nan
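This gives a similar-looking DataFrame, with NaN values in the same places as in the previous method.
But now, when I try to compute the mean after grouping by level: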
_df.groupby(level='x').alpha.mean()
I get the following error:
---------------------------------------------------------------------------
DataError Traceback (most recent call last)
<ipython-input-192-ad2de6450fab> in <module>()
----> 1 _df.groupby(level='x').alpha.mean()
/film/tools/packages/pandas/0.18.0/CentOS-6.2_thru_7/python-2.7/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in mean(self)
933 """
934 try:
--> 935 return self._cython_agg_general('mean')
936 except GroupByError:
937 raise
/film/tools/packages/pandas/0.18.0/CentOS-6.2_thru_7/python-2.7/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
750
751 if len(output) == 0:
--> 752 raise DataError('No numeric types to aggregate')
753
754 return self._wrap_aggregated_output(output, names)
DataError: No numeric types to aggregate
Why does this work in the first case but not in the second?
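Somehow **dtype** doesn't work on my DataFrame, but your solution works! You correctly identified it as a dtype problem. `_df.dtype` gives AttributeError: 'DataFrame' object has no attribute 'dtype' – narenandu
That was my typo. It should be dtypes (plural, for a DataFrame) – piRSquared
Thanks... that works – narenandu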
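The comments above point to a dtype problem and to `dtypes` as the attribute to inspect. Here is a minimal sketch of that check, assuming the cause is that a DataFrame created from only an index and column labels holds object-dtype columns, and that filling it cell by cell leaves them as object. Converting with `astype(float)` is one common fix, not necessarily the one the answer proposed. The names `frame`, `numeric`, `dtypes_before` and `result` are made up for this illustration, and the cells are filled with `.loc` rather than the chained assignment used in the question.

```python
import numpy as np
import pandas as pd

index_tuples = [('a', 1), ('b', 3), ('a', 2), ('c', 2), ('c', 3), ('b', 8)]
multi_index = pd.MultiIndex.from_tuples(index_tuples, names=['x', 'y'])
columns = ['alpha', 'beta', 'gamma']

# A frame created from an index and column labels alone starts with
# object-dtype columns (assumption checked via dtypes below).
frame = pd.DataFrame(index=multi_index, columns=columns)
dtypes_before = frame.dtypes.copy()

# Fill cell by cell, as in method 2, but with .loc instead of chained assignment.
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
for tup in index_tuples:
    for col in columns:
        frame.loc[tup, col] = int(rng.integers(400))

# Set alpha to NaN for every row in group 'a', using an explicit boolean mask.
frame.loc[frame.index.get_level_values('x') == 'a', 'alpha'] = np.nan

# Convert to a numeric dtype before aggregating.
numeric = frame.astype(float)
result = numeric.groupby(level='x')['alpha'].mean()

print(dtypes_before)
print(numeric.dtypes)
print(result)
```

If `dtypes_before` shows `object` for every column, that would be consistent with the "No numeric types to aggregate" error on pandas 0.18.0. Newer pandas versions may handle object columns differently, so the error itself may not reproduce there.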