You can try the following.
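import spark.implicits._ // needed for toDF outside spark-shell
import org.apache.spark.sql.functions._ // col, collect_list, concat_ws used below
val df = Seq(("tx-1", "aaa"), ("tx-2", "bbb"), ("tx-1", "ccc"), ("tx-4", "ccc")).toDF("Transaction_ID", "Product_ID")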
df.show
+--------------+----------+
|Transaction_ID|Product_ID|
+--------------+----------+
|          tx-1|       aaa|
|          tx-2|       bbb|
|          tx-1|       ccc|
|          tx-4|       ccc|
+--------------+----------+
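If you only want the Transaction_ID values that occur more than once, you can use:
// Count rows per Transaction_ID and keep the IDs that appear at least twice
val df4 = df.groupBy(col("Transaction_ID")).count().filter(col("count") >= 2)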
df4.show
If you want both Transaction_ID and Product_ID, then:
// Transaction_IDs that appear at least twice
val df1 = df.groupBy(col("Transaction_ID")).count().filter(col("count") >= 2)
// Comma-separated Product_IDs per Transaction_ID
val df2 = df.groupBy(col("Transaction_ID")).agg(collect_list(col("Product_ID")) as "Product_ID").withColumn("Product_ID", concat_ws(",", col("Product_ID")))
// Keep the product lists only for the repeated Transaction_IDs
val df3 = df1.join(df2, df1("Transaction_ID") === df2("Transaction_ID"), "inner").select(df2("Transaction_ID"), df2("Product_ID"))
df3.show
+--------------+----------+
|Transaction_ID|Product_ID|
+--------------+----------+
|          tx-1|   aaa,ccc|
+--------------+----------+
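As an alternative sketch, the count and the product list can be computed in a single groupBy, which avoids the join. This assumes a local SparkSession (in spark-shell, `spark` already exists). The names RepeatedTransactions, repeated, and the intermediate column n are made up for illustration. sort_array is added only so the joined string has a deterministic order, since collect_list does not guarantee one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat_ws, count, sort_array}

object RepeatedTransactions {
  // Hypothetical helper: one aggregation pass instead of two groupBys plus a join.
  def repeated(df: DataFrame): DataFrame =
    df.groupBy(col("Transaction_ID"))
      .agg(
        count("*").as("n"), // rows per Transaction_ID
        // sort_array is an assumption added here to make the order stable
        concat_ws(",", sort_array(collect_list(col("Product_ID")))).as("Product_ID"))
      .filter(col("n") >= 2) // keep IDs that appear at least twice
      .select("Transaction_ID", "Product_ID")

  def main(args: Array[String]): Unit = {
    // Assumption: local mode; in spark-shell, reuse the existing `spark` instead.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("repeated-transactions")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("tx-1", "aaa"), ("tx-2", "bbb"), ("tx-1", "ccc"), ("tx-4", "ccc"))
      .toDF("Transaction_ID", "Product_ID")

    repeated(df).show()
    spark.stop()
  }
}
```

On the sample data this should give the same single row as df3 (tx-1 with aaa,ccc). Whether it is faster than the join depends on your data, so treat it as something to try rather than a guaranteed improvement.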
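What does count() produce? Is it an object with a filter() method? A Long is not. The compiler tells you exactly what is wrong. –
Yes, count produces an integer, but how can I "convert" the method to return an Int? –
You copy-pasted the wrong line: the error is the preamble of your post. Please correct it! – Vale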