2017-02-10 193 views
-1

I am trying to get the CROSS JOIN of two DataFrames. I am using Spark 2.0. How can I cross join two DataFrames?

Edit:

val df=df.join(df_t1, df("Col1")===df_t1("col")).join(df2,joinType=="cross join").where(df("col2")===df2("col2")) 
+0

Show us what you have tried. ... –

+0

val df = df.join(df_t1, df("Col1") === df_t1("col")).join(df2, joinType == "cross join").where(df("col2") === df2("col2")) – Miruthan

Answers

0

Call join with the other DataFrame without specifying a join condition.

Have a look at the following example. Given the first DataFrame, people:

+---+------+-------+------+ 
| id| name| mail|idArea| 
+---+------+-------+------+ 
| 1| Jack|[email protected]|  1| 
| 2|Valery|[email protected]|  1| 
| 3| Karl|[email protected]|  2| 
| 4| Nick|[email protected]|  2| 
| 5| Luke|[email protected]|  3| 
| 6| Marek|[email protected]|  3| 
+---+------+-------+------+ 

and the second DataFrame, area:

+------+--------------+ 
|idArea|  areaName| 
+------+--------------+ 
|  1|Amministration| 
|  2|  Public| 
|  3|   Store| 
+------+--------------+ 

the cross join is simply given by:

val cross = people.join(area) 
+---+------+-------+------+------+--------------+ 
| id| name| mail|idArea|idArea|  areaName| 
+---+------+-------+------+------+--------------+ 
| 1| Jack|[email protected]|  1|  1|Amministration| 
| 1| Jack|[email protected]|  1|  3|   Store| 
| 1| Jack|[email protected]|  1|  2|  Public| 
| 2|Valery|[email protected]|  1|  1|Amministration| 
| 2|Valery|[email protected]|  1|  3|   Store| 
| 2|Valery|[email protected]|  1|  2|  Public| 
| 3| Karl|[email protected]|  2|  1|Amministration| 
| 3| Karl|[email protected]|  2|  2|  Public| 
| 3| Karl|[email protected]|  2|  3|   Store| 
| 4| Nick|[email protected]|  2|  3|   Store| 
| 4| Nick|[email protected]|  2|  2|  Public| 
| 4| Nick|[email protected]|  2|  1|Amministration| 
| 5| Luke|[email protected]|  3|  2|  Public| 
| 5| Luke|[email protected]|  3|  3|   Store| 
| 5| Luke|[email protected]|  3|  1|Amministration| 
| 6| Marek|[email protected]|  3|  1|Amministration| 
| 6| Marek|[email protected]|  3|  2|  Public| 
| 6| Marek|[email protected]|  3|  3|   Store| 
+---+------+-------+------+------+--------------+ 
2

Upgrade to the latest version of spark-sql_2.11, version 2.1.0, and use the Dataset function .crossJoin.
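
As a rough illustration of that suggestion, here is a minimal sketch of `crossJoin` on Spark 2.1 or later. The object name `CrossJoinExample`, the helper `buildCross`, and the local-mode session are assumptions made for this example. The sample rows mirror the people and area tables from the other answer, with the mail column left out.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, introduced only for this sketch.
object CrossJoinExample {

  // Builds the two small sample DataFrames and returns their Cartesian product.
  def buildCross(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Sample data modelled on the people table (mail column omitted).
    val people = Seq(
      (1, "Jack", 1),
      (2, "Valery", 1),
      (3, "Karl", 2),
      (4, "Nick", 2),
      (5, "Luke", 3),
      (6, "Marek", 3)
    ).toDF("id", "name", "idArea")

    // Sample data modelled on the area table.
    val area = Seq(
      (1, "Amministration"),
      (2, "Public"),
      (3, "Store")
    ).toDF("idArea", "areaName")

    // Explicit cross join: every row of people paired with every row of area.
    people.crossJoin(area)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; a real job would configure this differently.
    val spark = SparkSession.builder()
      .appName("cross-join-example")
      .master("local[*]")
      .getOrCreate()

    val cross = buildCross(spark)
    cross.show(false)
    // 6 people rows times 3 area rows gives 18 result rows.
    println(s"rows: ${cross.count()}")

    spark.stop()
  }
}
```

Because both inputs have an `idArea` column, the result contains two columns with that name. If you filter afterwards, as in the question, refer to them through the original DataFrames (for example `people("idArea") === area("idArea")`) or rename one of them before joining.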