2014-07-01 105 views

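Iterating over rows in a pandas DataFrame

I have two dataframes: one has only company names and dates, the other has only timestamps, as shown below.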

creationdate 
0 2012-05-01 18:20:27.167000 
1 2012-05-01 19:16:08.070000 
2 2012-05-01 19:20:07.880000 
3 2012-05-01 19:33:02.200000 
4 2012-05-01 19:35:09.173000 
5 2012-05-01 20:18:55.610000 
6 2012-05-01 20:26:27.577000 
7 2012-05-01 20:32:34.343000 
8 2012-05-01 20:39:31.257000 
9 2012-05-01 21:04:50.357000 
10 2012-05-01 21:54:18.983000 
11 2012-05-02 02:23:53.250000 
12 2012-05-02 02:40:27.643000 
13 2012-05-02 08:44:28.260000 

and

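How can I efficiently loop through the second dataframe and extract the timestamps from the first dataframe that correspond to each date in the second dataframe?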

Have you tried anything? This looks pretty straightforward with [datetime](https://docs.python.org/2/library/datetime.html). – CoryKramer

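@Cyber: I set the date column of the second df as the index and tried looping, checking whether the index equals the date extracted from each element of the first dataframe. But that checks every element of the first dataframe on each pass. What I am asking for is an efficient way to do this. – user3527975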

@Cyber: Can you show me the simple way? I am new to dataframes. – user3527975

Answer


Merging (inner join) the two dataframes should work:

In [96]: df1['date'] = pd.DatetimeIndex(df1.creationdate).date

In [97]: df2['date'] = pd.DatetimeIndex(df2.date).date

In [98]: df = df1.merge(df2, on='date', how='inner')

In [99]: df
Out[99]:
                  creationdate        date sitename
0   2012-05-01 18:20:27.167000  2012-05-01   Google
1   2012-05-01 19:16:08.070000  2012-05-01   Google
2   2012-05-01 19:20:07.880000  2012-05-01   Google
3   2012-05-01 19:33:02.200000  2012-05-01   Google
4   2012-05-01 19:35:09.173000  2012-05-01   Google
5   2012-05-01 20:18:55.610000  2012-05-01   Google
6   2012-05-01 20:26:27.577000  2012-05-01   Google
7   2012-05-01 20:32:34.343000  2012-05-01   Google
8   2012-05-01 20:39:31.257000  2012-05-01   Google
9   2012-05-01 21:04:50.357000  2012-05-01   Google
10  2012-05-01 21:54:18.983000  2012-05-01   Google
11  2012-05-02 02:23:53.250000  2012-05-02   Google
12  2012-05-02 02:40:27.643000  2012-05-02   Google
13  2012-05-02 08:44:28.260000  2012-05-02   Google

Then you can compute the time differences on df:

In [100]: df['time_diff'] = df.creationdate.diff() 

In [101]: df.time_diff 
Out[101]: 
0    NaT 
1 00:55:40.903000 
2 00:03:59.810000 
3 00:12:54.320000 
4 00:02:06.973000 
5 00:43:46.437000 
6 00:07:31.967000 
7 00:06:06.766000 
8 00:06:56.914000 
9 00:25:19.100000 
10 00:49:28.626000 
11 04:29:34.267000 
12 00:16:34.393000 
13 06:04:00.617000 
Name: time_diff, dtype: timedelta64[ns] 
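
Note that `diff()` here runs over the whole column, so row 11 measures the gap across the change of date. If the gaps are wanted within each date only, a grouped diff is one option. A minimal sketch, assuming a small made-up frame with the same `creationdate` and `date` columns as the merged `df`:

```python
import pandas as pd

# Made-up sample using the same column names as the merged frame.
df = pd.DataFrame({
    "creationdate": pd.to_datetime([
        "2012-05-01 18:20:27.167",
        "2012-05-01 19:16:08.070",
        "2012-05-02 02:23:53.250",
        "2012-05-02 02:40:27.643",
    ]),
})
df["date"] = df["creationdate"].dt.date

# Difference between consecutive timestamps, restarted for each date,
# so the first row of every date gets NaT.
df["time_diff"] = df.groupby("date")["creationdate"].diff()
print(df)
```

Whether the plain or the grouped diff is the right one depends on whether a gap spanning midnight is meaningful for your data.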

Of course, your creationdate needs to be parsed as datetime64[ns], not stored as strings. Otherwise you need to convert it first with pd.DatetimeIndex(df.creationdate).
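
To make the parsing step concrete, here is a minimal end-to-end sketch. It assumes both frames start out with string columns. The layout of `df2` (a `date` and a `sitename` column) is guessed from the merged output, not taken from the question, and `pd.to_datetime` is used here as an alternative to `pd.DatetimeIndex`.

```python
import pandas as pd

# First frame: timestamps stored as strings (as they might arrive from a CSV).
df1 = pd.DataFrame({
    "creationdate": [
        "2012-05-01 18:20:27.167",
        "2012-05-01 19:16:08.070",
        "2012-05-02 02:23:53.250",
        "2012-05-03 08:00:00.000",  # date with no match in df2
    ],
})

# Second frame: assumed layout, one row per date with a site name.
df2 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-02"],
    "sitename": ["Google", "Google"],
})

# Parse the strings into a datetime64 column before any date arithmetic.
df1["creationdate"] = pd.to_datetime(df1["creationdate"])

# Build a common key holding only the calendar date on both sides.
df1["date"] = df1["creationdate"].dt.date
df2["date"] = pd.to_datetime(df2["date"]).dt.date

# The inner join keeps only timestamps whose date appears in df2.
merged = df1.merge(df2, on="date", how="inner")
print(merged)
```

The merge replaces the explicit loop from the question: pandas matches the keys internally rather than rescanning the first frame for every date.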

This works the way I need. Thanks!! – user3527975