Merging (inner join) these two dataframes should work:
In [96]: df1['date'] = pd.DatetimeIndex(df1.creationdate).date
In [97]: df2['date'] = pd.DatetimeIndex(df2.date).date
In [98]: df = df1.merge(df2, on='date', how='inner')
In [99]: df
Out[99]:
creationdate date sitename
0 2012-05-01 18:20:27.167000 2012-05-01 Google
1 2012-05-01 19:16:08.070000 2012-05-01 Google
2 2012-05-01 19:20:07.880000 2012-05-01 Google
3 2012-05-01 19:33:02.200000 2012-05-01 Google
4 2012-05-01 19:35:09.173000 2012-05-01 Google
5 2012-05-01 20:18:55.610000 2012-05-01 Google
6 2012-05-01 20:26:27.577000 2012-05-01 Google
7 2012-05-01 20:32:34.343000 2012-05-01 Google
8 2012-05-01 20:39:31.257000 2012-05-01 Google
9 2012-05-01 21:04:50.357000 2012-05-01 Google
10 2012-05-01 21:54:18.983000 2012-05-01 Google
11 2012-05-02 02:23:53.250000 2012-05-02 Google
12 2012-05-02 02:40:27.643000 2012-05-02 Google
13 2012-05-02 08:44:28.260000 2012-05-02 Google
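A self-contained version of the session above, as a sketch. The sample rows in df1 and df2 are made up to mirror the question's layout: a timestamp column creationdate in one frame, and a day-level date plus sitename in the other. The extra 2012-05-03 row shows that days with no match in df2 are dropped by the inner join.

```python
import pandas as pd

# Hypothetical sample data shaped like the question: df1 has full timestamps,
# df2 has one row per day with a site name.
df1 = pd.DataFrame({
    "creationdate": pd.to_datetime([
        "2012-05-01 18:20:27.167",
        "2012-05-01 19:16:08.070",
        "2012-05-02 02:23:53.250",
        "2012-05-03 10:00:00.000",  # no matching day in df2
    ])
})
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2012-05-01", "2012-05-02"]),
    "sitename": ["Google", "Google"],
})

# Reduce both sides to plain calendar dates so the join keys are comparable.
df1["date"] = pd.DatetimeIndex(df1["creationdate"]).date
df2["date"] = pd.DatetimeIndex(df2["date"]).date

# Inner join: keep only rows whose date appears in both frames.
df = df1.merge(df2, on="date", how="inner")
print(df)
```

Because df2 has one row per day here, each timestamp in df1 matches at most once; if df2 had several rows for the same day, the merge would repeat the df1 rows for each of them.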
Then you can compute the time differences on `df`, like this:
In [100]: df['time_diff'] = df.creationdate.diff()
In [101]: df.time_diff
Out[101]:
0 NaT
1 00:55:40.903000
2 00:03:59.810000
3 00:12:54.320000
4 00:02:06.973000
5 00:43:46.437000
6 00:07:31.967000
7 00:06:06.766000
8 00:06:56.914000
9 00:25:19.100000
10 00:49:28.626000
11 04:29:34.267000
12 00:16:34.393000
13 06:04:00.617000
Name: time_diff, dtype: timedelta64[ns]
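The same `diff` step on a small made-up frame, so it can be run on its own. The timestamps are copied from the first three rows of the merged output; converting to seconds with `.dt.total_seconds()` is an optional extra, not part of the original answer.

```python
import pandas as pd

# Hypothetical timestamps taken from the first rows of the merged output above.
df = pd.DataFrame({
    "creationdate": pd.to_datetime([
        "2012-05-01 18:20:27.167",
        "2012-05-01 19:16:08.070",
        "2012-05-01 19:20:07.880",
    ])
})

# diff() subtracts the previous row from each row; the first row has no
# predecessor, so its difference is NaT.
df["time_diff"] = df["creationdate"].diff()
print(df["time_diff"])

# Optional: express the gaps as plain seconds for numeric work.
df["seconds"] = df["time_diff"].dt.total_seconds()
```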
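Of course, for this to work your `creationdate` column needs to have dtype `datetime64[ns]`, not string. Otherwise you need to convert it first with `pd.DatetimeIndex(df.creationdate)`.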
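If `creationdate` was read in as text (for example from a CSV file), it has to be converted before `diff()` can produce time differences. This sketch uses `pd.to_datetime`, the standard pandas parser, which yields the same datetime64 values as the `pd.DatetimeIndex(...)` conversion mentioned above; the sample strings are hypothetical.

```python
import pandas as pd

# Hypothetical frame where creationdate holds strings rather than datetimes;
# subtracting strings is not defined, so diff() would not give time gaps.
df = pd.DataFrame({
    "creationdate": ["2012-05-01 18:20:27.167", "2012-05-01 19:16:08.070"]
})
print(df["creationdate"].dtype)

# Parse the strings into a datetime64 column.
df["creationdate"] = pd.to_datetime(df["creationdate"])
print(df["creationdate"].dtype)

# Now the subtraction works and yields timedeltas.
df["time_diff"] = df["creationdate"].diff()
```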
Have you tried anything? This looks pretty straightforward with [datetime](https://docs.python.org/2/library/datetime.html). – CoryKramer
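@Cyber: I set the date column of the second df as its index and tried looping, checking whether the index equals the date extracted from each element of the first dataframe. But that checks every element of the first dataframe each time. I'm asking for an efficient way to do this. – user3527975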
@Cyber: Can you show me the simple way? I'm new to dataframes. – user3527975
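On the efficiency question raised in the comments: the loop described there compares elements of the first frame against the second frame's index one at a time in Python. One vectorized alternative, sketched here with made-up data and not part of the original answer, is to truncate each timestamp to its day with `.dt.normalize()` and test membership with `.isin()`, which builds a boolean mask in a single pass. Unlike the merge, this only filters df1 and does not bring in df2's other columns such as sitename.

```python
import pandas as pd

# Hypothetical data: df1 holds timestamps, df2 holds the days of interest.
df1 = pd.DataFrame({
    "creationdate": pd.to_datetime([
        "2012-05-01 18:20:27.167",
        "2012-05-02 02:23:53.250",
        "2012-05-03 10:00:00.000",
    ])
})
df2 = pd.DataFrame({"date": pd.to_datetime(["2012-05-01", "2012-05-02"])})

# normalize() sets each timestamp's time to midnight, so it can be compared
# directly with the day-level values in df2 without an explicit Python loop.
mask = df1["creationdate"].dt.normalize().isin(df2["date"])
matched = df1[mask]
print(matched)
```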