This gives you all the results in a single new DataFrame:
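// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import spark.implicits._
import org.apache.spark.sql.functions.{from_unixtime, unix_timestamp}

// Parse the dd-MM-yyyy strings into "yyyy-MM-dd HH:mm:ss" strings,
// which sort in chronological order.
val df1 = Seq(
    "02-01-2015",
    "02-02-2015",
    "02-03-2015"
).toDF("date")
    .withColumn("date", from_unixtime(unix_timestamp($"date", "dd-MM-yyyy")))

val df2 = Seq(
    (1, "balance", 100, "01-01-2015"),
    (1, "balance", 100, "05-01-2015"),
    (1, "balance", 100, "30-01-2015"),
    (1, "balance", 100, "01-02-2015"),
    (1, "balance", 100, "01-03-2015")
).toDF("ID", "feature", "value", "date")
    .withColumn("date", from_unixtime(unix_timestamp($"date", "dd-MM-yyyy")))

// For each df1 row, keep every df2 row with a strictly earlier date.
df1.join(
    df2, df2("date") < df1("date"), "left"
).show()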
+-------------------+---+-------+-----+-------------------+
|               date| ID|feature|value|               date|
+-------------------+---+-------+-----+-------------------+
|2015-01-02 00:00:00|  1|balance|  100|2015-01-01 00:00:00|
|2015-02-02 00:00:00|  1|balance|  100|2015-01-01 00:00:00|
|2015-02-02 00:00:00|  1|balance|  100|2015-01-05 00:00:00|
|2015-02-02 00:00:00|  1|balance|  100|2015-01-30 00:00:00|
|2015-02-02 00:00:00|  1|balance|  100|2015-02-01 00:00:00|
|2015-03-02 00:00:00|  1|balance|  100|2015-01-01 00:00:00|
|2015-03-02 00:00:00|  1|balance|  100|2015-01-05 00:00:00|
|2015-03-02 00:00:00|  1|balance|  100|2015-01-30 00:00:00|
|2015-03-02 00:00:00|  1|balance|  100|2015-02-01 00:00:00|
|2015-03-02 00:00:00|  1|balance|  100|2015-03-01 00:00:00|
+-------------------+---+-------+-----+-------------------+
Edit: To get the number of matching records from df2, do the following:
df1.join(
    df2, df2("date") < df1("date"), "left"
)
    .groupBy(df1("date"))
    .count
    .orderBy(df1("date"))
    .show
+-------------------+-----+
|               date|count|
+-------------------+-----+
|2015-01-02 00:00:00|    1|
|2015-02-02 00:00:00|    4|
|2015-03-02 00:00:00|    5|
+-------------------+-----+
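One caveat with the count above: because the join is a left join, a df1 date with no earlier row in df2 still produces one output row (with nulls on the df2 side), so `.count` would report 1 for it rather than 0. The sketch below is one possible way around that, counting the non-null `df2("date")` values instead of the rows. The extra date "01-01-2015" in df1, the object name `MatchCounts`, the column alias `matches`, and the local SparkSession setup are all assumptions added for illustration; they are not part of the original example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, from_unixtime, unix_timestamp}

object MatchCounts {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MatchCounts")
      .getOrCreate()
    import spark.implicits._

    // "01-01-2015" is a hypothetical extra row: no df2 date is strictly earlier.
    val df1 = Seq("01-01-2015", "02-01-2015", "02-02-2015", "02-03-2015")
      .toDF("date")
      .withColumn("date", from_unixtime(unix_timestamp($"date", "dd-MM-yyyy")))

    val df2 = Seq(
      (1, "balance", 100, "01-01-2015"),
      (1, "balance", 100, "05-01-2015"),
      (1, "balance", 100, "30-01-2015"),
      (1, "balance", 100, "01-02-2015"),
      (1, "balance", 100, "01-03-2015")
    ).toDF("ID", "feature", "value", "date")
      .withColumn("date", from_unixtime(unix_timestamp($"date", "dd-MM-yyyy")))

    // count(column) skips nulls, so unmatched df1 rows contribute 0 matches.
    df1.join(df2, df2("date") < df1("date"), "left")
      .groupBy(df1("date"))
      .agg(count(df2("date")).as("matches"))
      .orderBy("date")
      .show()

    spark.stop()
  }
}
```

With the sample data from the answer every df1 date has at least one match, so both approaches agree there; the difference only shows up for dates like the added one, which should come out with 0 matches here.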
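Do you want to take one row at a time from df1, or all rows at once? –
Hi Ramesh. Yes, take one row at a time from df1, compare it against 'date' in df2, and get all rows from df2 whose date is earlier than the date in df1. – Kirupa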