Filter a numpy array of tuples

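The scikit-learn library has a nice data clustering example: stock market structure. It works well with US stocks. But when tickers from other markets are added, numpy raises an error saying the arrays should have the same size; German stocks, for example, have a different trading calendar.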

OK, so after downloading the quotes I want to add a preparation step that finds the shared dates:

quotes = [quotes_historical_yahoo_ochl(symbol, d1, d2, asobject=True) 
      for symbol in symbols] 


def intersect(list_1, list_2): 
    return list(set(list_1) & set(list_2)) 

dates_all = quotes[0].date 
for q in quotes: 
    dates_symbol = q.date 
    dates_all = intersect(dates_all, dates_symbol)
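
For reference, the same intersection can be written in one pass. This is only a sketch: the three date lists are made-up stand-ins for each q.date, and the result is sorted because a set has no guaranteed order.

```python
import datetime as dt
from functools import reduce

# Hypothetical per-symbol trading calendars (stand-ins for each q.date).
dates_us = [dt.date(2017, 5, 1), dt.date(2017, 5, 2), dt.date(2017, 5, 3)]
dates_de = [dt.date(2017, 5, 2), dt.date(2017, 5, 3), dt.date(2017, 5, 4)]
dates_jp = [dt.date(2017, 5, 2), dt.date(2017, 5, 3)]

calendars = [dates_us, dates_de, dates_jp]

# Intersect all calendars at once, then sort to get a stable order.
dates_all = sorted(reduce(set.intersection, map(set, calendars)))
print(dates_all)
```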

Then I got stuck filtering the numpy array of tuples. Here are some attempts:

# for index, q in enumerate(quotes): 
#  filtered = [i for i in q if i.date in dates_all] 

#  quotes[index] = np.rec.array(filtered, dtype=q.dtype) 
#  quotes[index] = np.asanyarray(filtered, dtype=q.dtype) 
# 
#  quotes[index] = np.where(a.date in dates_all for a in q) 
# 
#  quotes[index] = np.where(q[0].date in dates_all)

How do I apply a filter to a numpy array, or how do I convert the list of (filtered) records back to a numpy recarray?

quotes[0].dtype:

'(numpy.record, [('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), ('d', '<f8'), ('open', '<f8'), ('close', '<f8'), ('high', '<f8'), ('low', '<f8'), ('volume', '<f8'), ('aclose', '<f8')])'

quotes[0].shape:

<class 'tuple'>: (261,)

Source

2017-05-01 Maxim Korobov

By 'array of tuples' I suspect you mean a 'structured array' (or 'recarray'). If so, we'd want to know the array's 'shape' and 'dtype'. – hpaulj

Thanks for the comment. Added! –

So quotes is a list of recarrays, and in dates_all you collect the intersection of all the values in the date field.

I can reconstruct one such array:

In [286]: dt=np.dtype([('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'),
     ...:              ('d', '<f8'), ('open', '<f8'), ('close', '<f8'), ('high', '<f8'),
     ...:              ('low', '<f8'), ('volume', '<f8'), ('aclose', '<f8')])
In [287]: arr=np.ones((2,), dtype=dt) # 2 element structured array 
In [288]: arr 
Out[288]: 
array([(1, 1, 1, 1, 1., 1., 1., 1., 1., 1., 1.), 
     (1, 1, 1, 1, 1., 1., 1., 1., 1., 1., 1.)], 
     dtype=[('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), ... ('aclose', '<f8')]) 
In [289]: type(arr[0]) 
Out[289]: numpy.void

Converting it to a recarray (I don't use these as much as plain structured arrays):

In [291]: np.rec.array(arr) 
Out[291]: 
rec.array([(1, 1, 1, 1, 1., 1., 1., 1., 1., 1., 1.), 
(1, 1, 1, 1, 1., 1., 1., 1., 1., 1., 1.)], 
      dtype=[('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), .... ('aclose', '<f8')])

The recarray displays its dtype slightly differently:

In [292]: _.dtype 
Out[292]: dtype((numpy.record, [('date', 'O'), ('year', '<i2'), ('month', 'i1'), ....('aclose', '<f8')])) 
In [293]: __.date 
Out[293]: array([1, 1], dtype=object)
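
As a small illustration of the difference, here is a sketch using a hypothetical two-field dtype (a shortened stand-in for the full quotes dtype). Both array kinds accept field indexing like arr['date']; only the recarray adds attribute access like rec.date.

```python
import numpy as np

# Hypothetical shortened dtype; the real quotes dtype has 11 fields.
small_dt = np.dtype([("date", "O"), ("close", "<f8")])

arr = np.ones((2,), dtype=small_dt)   # plain structured array
rec = np.rec.array(arr)               # recarray holding the same records

by_key_plain = arr["date"]   # field indexing on the structured array
by_key_rec = rec["date"]     # field indexing also works on the recarray
by_attr_rec = rec.date       # attribute access is specific to recarray

print(type(arr).__name__, type(rec).__name__)
print(by_key_plain, by_key_rec, by_attr_rec)
```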

In any case the date field is an object array, possibly of datetime objects?

q is one of these arrays; i is an element, and i.date is its date field.

[i for i in q if i.date in dates_all]

So filtered is a list of recarray elements. np.stack does a better job of reassembling them into an array (it works with recarray as well).

np.stack([i for i in arr if i['date'] in alist])

Or you could collect the indices of the matching records and use them to index the quotes array:

In [319]: [i for i,v in enumerate(arr) if v['date'] in alist] 
Out[319]: [0, 1] 
In [320]: arr[_]

Or pull out the date field first:

In [321]: [i for i,v in enumerate(arr['date']) if v in alist] 
Out[321]: [0, 1]

in1d might also work for the search:

In [322]: np.in1d(arr['date'],alist) 
Out[322]: array([ True, True], dtype=bool) 
In [323]: np.where(np.in1d(arr['date'],alist)) 
Out[323]: (array([0, 1], dtype=int32),)
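
Putting the pieces together for the question's list of recarrays, a boolean mask can filter each array in place of the list comprehension. This is only a sketch under assumptions: the two-field dtype and the sample quotes are made up, and it uses np.isin, which NumPy's documentation recommends over in1d for new code.

```python
import datetime as dt
import numpy as np

# Hypothetical shortened dtype and sample data standing in for real quotes.
quote_dtype = np.dtype([("date", "O"), ("close", "<f8")])

us = np.array([(dt.date(2017, 5, 1), 10.0),
               (dt.date(2017, 5, 2), 11.0),
               (dt.date(2017, 5, 3), 12.0)], dtype=quote_dtype)
de = np.array([(dt.date(2017, 5, 2), 20.0),
               (dt.date(2017, 5, 3), 21.0),
               (dt.date(2017, 5, 4), 22.0)], dtype=quote_dtype)
quotes = [np.rec.array(us), np.rec.array(de)]

# Dates present in every array.
common = set(quotes[0].date)
for q in quotes[1:]:
    common &= set(q.date)
dates_all = sorted(common)

# Boolean mask per array; indexing with it keeps the recarray type and dtype.
filtered = [q[np.isin(q.date, dates_all)] for q in quotes]

for f in filtered:
    print(len(f), f.close)
```

After this step every array in filtered has the same length, which is what the clustering example needs. A mask also copes with an array that shares no dates: the result is simply an empty recarray.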

Source

2017-05-02 06:29:08 hpaulj

Thanks for the detailed reply and for pointing out the NumPy tricks: 'i['date']', 'in1d' and others! –
