How to select distinct records from a DataFrame using the latest timestamp
I have a DataFrame. I need the latest record for each id, based on updateTableTimestamp. df.show()
+--------------------+-----+-----+--------------------+
|         Description| Name|  id |updateTableTimestamp|
+--------------------+-----+-----+--------------------+
|                    | 042F|64185|       1507306990753|
|                    | 042F|64185|       1507306990759|
|Testing             |042MF|  941|       1507306990753|
|                    | 058F| 8770|       1507306990753|
|Testing 3           |083MF|31663|       1507306990759|
|Testing 2           |083MF|31663|       1507306990753|
+--------------------+-----+-----+--------------------+
Required output:
+--------------------+-----+-----+--------------------+
|         Description| Name|  id |updateTableTimestamp|
+--------------------+-----+-----+--------------------+
|                    | 042F|64185|       1507306990759|
|Testing             |042MF|  941|       1507306990753|
|                    | 058F| 8770|       1507306990753|
|Testing 3           |083MF|31663|       1507306990759|
+--------------------+-----+-----+--------------------+
I have tried:
sqlContext.sql("SELECT * FROM (SELECT *, row_number() OVER (PARTITION BY Id ORDER BY updateTableTimestamp DESC) rank from temptable) tmp where rank = 1")
It fails at PARTITION with this error:
Exception in thread "main" java.lang.RuntimeException: [1.29] failure: ``union'' expected but `(' found
I am currently using Spark 1.6.2.
"It gives an error" - what is the error? – FuzzyTree
Try 'where tmp.rank = 1', or try an alias other than 'rank', since it is a reserved word. – Simon
PARTITION is not supported – lucy
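Following up on the last comment: as far as I know, in Spark 1.x window functions (OVER / PARTITION BY) are only available through a HiveContext, and the plain SQLContext parser stops at the opening parenthesis after OVER, which matches the error above. One way to avoid window functions altogether is to join each row against the maximum timestamp of its id. The sketch below is only a check of that query's logic: it runs the SQL against an in-memory SQLite table as a stand-in for Spark, using the sample rows from the question. The names LATEST_PER_ID, ROWS, latest_per_id, maxId and maxTs are made up for illustration, and whether the Spark 1.6.2 SQLContext parser accepts this exact string is an assumption that should be verified.

```python
import sqlite3

# Rewritten query: join each row against the per-id maximum timestamp
# instead of using row_number() OVER (PARTITION BY ...).
LATEST_PER_ID = """
SELECT t.Description, t.Name, t.id, t.updateTableTimestamp
FROM temptable t
JOIN (SELECT id AS maxId, MAX(updateTableTimestamp) AS maxTs
      FROM temptable
      GROUP BY id) m
  ON t.id = m.maxId AND t.updateTableTimestamp = m.maxTs
ORDER BY t.Name
"""

# Sample rows copied from the question.
ROWS = [
    ("", "042F", 64185, 1507306990753),
    ("", "042F", 64185, 1507306990759),
    ("Testing", "042MF", 941, 1507306990753),
    ("", "058F", 8770, 1507306990753),
    ("Testing 3", "083MF", 31663, 1507306990759),
    ("Testing 2", "083MF", 31663, 1507306990753),
]


def latest_per_id(rows):
    """Load rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")  # SQLite is a stand-in, not Spark
    try:
        conn.execute(
            "CREATE TABLE temptable ("
            "Description TEXT, Name TEXT, id INTEGER, "
            "updateTableTimestamp INTEGER)"
        )
        conn.executemany("INSERT INTO temptable VALUES (?, ?, ?, ?)", rows)
        return conn.execute(LATEST_PER_ID).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_per_id(ROWS):
        print(row)
```

In Spark the same string would be passed to sqlContext.sql(...) after registering the DataFrame as temptable. Unlike row_number(), this join keeps every row that ties for the maximum timestamp within an id, so duplicates are possible if two records share the same latest timestamp.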