
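How do I compute a histogram with histogramdd on a NumPy array whose dtype is object?

I want to compute a histogram of an (N, 3) NumPy array whose three columns hold longitude, latitude, and a timestamp, respectively, like this: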

array([[116.45565032958984, 39.889976501464844, 
     datetime.datetime(2012, 10, 1, 6, 32, 39)], 
     [116.45565032958984, 39.889984130859375, 
     datetime.datetime(2012, 10, 1, 6, 33, 31)], 
     [116.45565032958984, 39.889984130859375, 
     datetime.datetime(2012, 10, 1, 6, 33, 33)], 
     [116.45565032958984, 39.889984130859375, 
     datetime.datetime(2012, 10, 1, 6, 33, 37)], 
     [116.45561981201172, 39.89040756225586, 
     datetime.datetime(2012, 10, 1, 6, 34, 42)], 
     [116.45561981201172, 39.890411376953125, 
     datetime.datetime(2012, 10, 1, 6, 36, 40)], 
     [116.45549774169922, 39.8941650390625, 
     datetime.datetime(2012, 10, 1, 6, 37, 54)], 
     [116.45556640625, 39.92431640625, 
     datetime.datetime(2012, 10, 1, 6, 38, 57)], 
     [116.45578002929688, 39.93780517578125, 
     datetime.datetime(2012, 10, 1, 6, 42, 10)], 
     [116.44468688964844, 39.93989944458008, 
     datetime.datetime(2012, 10, 1, 6, 43, 21)]], dtype=object) 

I tried to use np.histogramdd like this:

import numpy as np 
np.histogramdd(my_data, bins = (lon_bin_num, lat_bin_num, time_bin_num), 
       range = [[lon_min, lon_max], [lat_min, lat_max], 
       [start_datetime, end_datetime]]) 

But it raises a TypeError:

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-271-58c94eecf21d> in <module>() 
     1 np.histogramdd(tmp2, bins = (lon_bin_num, lat_bin_num, time_bin_num), 
----> 2    range = [[lon_min, lon_max], [lat_min, lat_max], [start_datetime, end_datetime]]) 

/*/*/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights) 
    318   smax = zeros(D) 
    319   for i in arange(D): 
--> 320    smin[i], smax[i] = range[i] 
    321 
    322  # Make sure the bins have a finite width. 

TypeError: float() argument must be a string or a number 

I know the datetime objects are what cause the error, but I would like to know how to fix it, or how to compute a histogram on a NumPy ndarray whose dtype is object.

Answer


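Many NumPy functions do not work with arrays of dtype object. To use np.histogramdd you need an array of shape (N, D), so a structured array will not help here either (a structured array would remove the D dimension). You need a homogeneous array with a non-object dtype. Since the first two columns are floats, let's try to represent the third column as floats too: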

You can convert the dates to NumPy's native datetime64[s] dtype:

In [102]: dates = np.array(my_data[:, 2],dtype='<M8[s]') 

In [103]: dates 
Out[103]: 
array(['2012-10-01T02:32:39-0400', '2012-10-01T02:33:31-0400', 
     '2012-10-01T02:33:33-0400', '2012-10-01T02:33:37-0400', 
     '2012-10-01T02:34:42-0400', '2012-10-01T02:36:40-0400', 
     '2012-10-01T02:37:54-0400', '2012-10-01T02:38:57-0400', 
     '2012-10-01T02:42:10-0400', '2012-10-01T02:43:21-0400'], dtype='datetime64[s]') 

and then use astype to convert those datetime64[s] values to floats:

In [104]: float_dates = dates.astype('float') 

In [105]: float_dates 
Out[105]: 
array([ 1.34907316e+09, 1.34907321e+09, 1.34907321e+09, 
     1.34907322e+09, 1.34907328e+09, 1.34907340e+09, 
     1.34907347e+09, 1.34907354e+09, 1.34907373e+09, 
     1.34907380e+09]) 
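The traceback in the question comes from the range argument, so the time bounds need the same conversion as the data. Here is a minimal sketch, assuming the bounds are naive datetime.datetime objects with whole seconds; to_epoch_seconds is a hypothetical helper name, and the two bound values are made up for illustration:

```python
import datetime
import numpy as np

def to_epoch_seconds(dt):
    """Return a naive datetime.datetime as float seconds since 1970-01-01."""
    # datetime64[s] stores whole seconds since the epoch as a 64-bit integer.
    return float(np.datetime64(dt, 's').astype('int64'))

# Hypothetical bounds standing in for start_datetime and end_datetime.
start_datetime = datetime.datetime(2012, 10, 1, 6, 30, 0)
end_datetime = datetime.datetime(2012, 10, 1, 6, 45, 0)

# This pair can be passed as the third entry of histogramdd's range argument.
time_range = [to_epoch_seconds(start_datetime), to_epoch_seconds(end_datetime)]
```

Going through int64 first makes it explicit that the floats are seconds since the epoch, which is the same scale as float_dates above.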

Now form a new array with dtype float:

arr = np.empty_like(my_data, dtype='float') 
arr[:, 0:2] = my_data[:, 0:2] 
arr[:, 2] = float_dates 

hist, edges = np.histogramdd(arr, bins=(xedges, yedges, zedges)) 
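Putting the steps together, the whole conversion can be sketched end to end. This is only an illustration: the three rows are a subset of the sample data, and the bin counts and range bounds are assumptions picked so that every point falls inside the histogram.

```python
import datetime
import numpy as np

# Three sample rows in the same (lon, lat, datetime) layout as my_data.
my_data = np.array([
    [116.45565032958984, 39.889976501464844, datetime.datetime(2012, 10, 1, 6, 32, 39)],
    [116.45561981201172, 39.89040756225586, datetime.datetime(2012, 10, 1, 6, 34, 42)],
    [116.44468688964844, 39.93989944458008, datetime.datetime(2012, 10, 1, 6, 43, 21)],
], dtype=object)

# Convert the datetime column to datetime64[s], then to seconds since the epoch.
dates = np.array(my_data[:, 2], dtype='M8[s]')
seconds = dates.astype('int64')

# Build a homogeneous float array of shape (N, 3).
arr = np.empty(my_data.shape, dtype='float64')
arr[:, 0:2] = my_data[:, 0:2].astype('float64')
arr[:, 2] = seconds

# Assumed time bounds, expressed on the same float scale as the third column.
t_min = float(np.datetime64('2012-10-01T06:30:00', 's').astype('int64'))
t_max = float(np.datetime64('2012-10-01T06:45:00', 's').astype('int64'))

hist, edges = np.histogramdd(
    arr,
    bins=(2, 2, 3),  # assumed bin counts for lon, lat, time
    range=[[116.44, 116.46], [39.88, 39.94], [t_min, t_max]],
)
```

Here hist has one axis per column, and edges[2] holds the time-bin edges as float seconds.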

While this will give you a histogram, you may also need to reinterpret the floats as dates. You can do this with astype. To get datetime64[s]:

In [99]: float_dates.astype('<M8[s]') 
Out[99]: 
array(['2012-10-01T02:32:39-0400', '2012-10-01T02:33:31-0400', 
     '2012-10-01T02:33:33-0400', '2012-10-01T02:33:37-0400', 
     '2012-10-01T02:34:42-0400', '2012-10-01T02:36:40-0400', 
     '2012-10-01T02:37:54-0400', '2012-10-01T02:38:57-0400', 
     '2012-10-01T02:42:10-0400', '2012-10-01T02:43:21-0400'], dtype='datetime64[s]') 

To get Python datetime.datetime objects:

In [116]: float_dates.astype('<M8[s]').tolist() 
Out[116]: 
[datetime.datetime(2012, 10, 1, 6, 32, 39), 
datetime.datetime(2012, 10, 1, 6, 33, 31), 
datetime.datetime(2012, 10, 1, 6, 33, 33), 
datetime.datetime(2012, 10, 1, 6, 33, 37), 
datetime.datetime(2012, 10, 1, 6, 34, 42), 
datetime.datetime(2012, 10, 1, 6, 36, 40), 
datetime.datetime(2012, 10, 1, 6, 37, 54), 
datetime.datetime(2012, 10, 1, 6, 38, 57), 
datetime.datetime(2012, 10, 1, 6, 42, 10), 
datetime.datetime(2012, 10, 1, 6, 43, 21)] 
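The same reinterpretation is useful for the time-bin edges that np.histogramdd returns. A small sketch, assuming the edges are float seconds since the epoch; time_edges here is a made-up stand-in for edges[2], and the values are rounded to whole seconds and cast through int64 before becoming datetime64[s]:

```python
import numpy as np

# Hypothetical time-bin edges in float seconds since the epoch,
# in the form histogramdd would return for the third dimension.
time_edges = np.linspace(1349073000.0, 1349073900.0, 4)

# Round to whole seconds, then reinterpret as datetime64[s].
edge_dates = np.round(time_edges).astype('int64').astype('M8[s]')

# Plain Python datetime.datetime objects, if those are more convenient.
edge_datetimes = edge_dates.tolist()
```

Rounding first avoids truncating an edge that lands a hair below a whole second because of floating-point error.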

That really helped, thanks :) – AnnabellChan 2014-12-09 14:35:38