I am trying to get the CROSS JOIN of two dataframes. I am using Spark 2.0. How can I cross join two dataframes?
Edit:
val df=df.join(df_t1, df("Col1")===df_t1("col")).join(df2,joinType=="cross join").where(df("col2")===df2("col2"))
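Call join with the other dataframe without specifying a join condition.
Have a look at the following example. Given a first dataframe of people: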
+---+------+-------+------+
| id| name| mail|idArea|
+---+------+-------+------+
| 1| Jack|[email protected]| 1|
| 2|Valery|[email protected]| 1|
| 3| Karl|[email protected]| 2|
| 4| Nick|[email protected]| 2|
| 5| Luke|[email protected]| 3|
| 6| Marek|[email protected]| 3|
+---+------+-------+------+
and a second dataframe of areas:
+------+--------------+
|idArea| areaName|
+------+--------------+
| 1|Amministration|
| 2| Public|
| 3| Store|
+------+--------------+
the cross join is simply given by:
val cross = people.join(area)
+---+------+-------+------+------+--------------+
| id| name| mail|idArea|idArea| areaName|
+---+------+-------+------+------+--------------+
| 1| Jack|[email protected]| 1| 1|Amministration|
| 1| Jack|[email protected]| 1| 3| Store|
| 1| Jack|[email protected]| 1| 2| Public|
| 2|Valery|[email protected]| 1| 1|Amministration|
| 2|Valery|[email protected]| 1| 3| Store|
| 2|Valery|[email protected]| 1| 2| Public|
| 3| Karl|[email protected]| 2| 1|Amministration|
| 3| Karl|[email protected]| 2| 2| Public|
| 3| Karl|[email protected]| 2| 3| Store|
| 4| Nick|[email protected]| 2| 3| Store|
| 4| Nick|[email protected]| 2| 2| Public|
| 4| Nick|[email protected]| 2| 1|Amministration|
| 5| Luke|[email protected]| 3| 2| Public|
| 5| Luke|[email protected]| 3| 3| Store|
| 5| Luke|[email protected]| 3| 1|Amministration|
| 6| Marek|[email protected]| 3| 1|Amministration|
| 6| Marek|[email protected]| 3| 2| Public|
| 6| Marek|[email protected]| 3| 3| Store|
+---+------+-------+------+------+--------------+
Upgrade to the latest version of spark-sql_2.11, version 2.1.0, and use the .crossJoin function of Dataset.
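A minimal sketch of the crossJoin approach, assuming Spark 2.1.0 or later is on the classpath. The object name CrossJoinSketch, the helper build, the local master setting, and the three sample people rows (a subset of the example above, without the mail column) are illustrative assumptions, not part of the original question or answers.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CrossJoinSketch {

  // Builds the cartesian product of two small, hypothetical dataframes.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Sample rows modelled on the people/area tables above.
    val people = Seq((1, "Jack", 1), (2, "Valery", 1), (3, "Karl", 2))
      .toDF("id", "name", "idArea")
    val area = Seq((1, "Administration"), (2, "Public"), (3, "Store"))
      .toDF("idArea", "areaName")

    // Explicit cross join: every people row paired with every area row.
    people.crossJoin(area)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cross-join-sketch")
      .master("local[*]") // assumption: local run for demonstration only
      .getOrCreate()

    val cross = build(spark)
    cross.show()
    println(cross.count()) // 3 people x 3 areas = 9 rows

    spark.stop()
  }
}
```

If a filter is needed afterwards, as in the question, a where(...) clause can be chained onto the result of crossJoin; the string "cross join" is not passed as a join condition.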
Show us what you have tried. ... –
val df = df.join(df_t1, df("Col1") === df_t1("col")).join(df2, joinType == "cross join").where(df("col2") === df2("col2")) – Miruthan