Converting PySpark to Scala: reading multiple directories. I have a list of Parquet files that I load and merge into a single DataFrame in PySpark:
from functools import reduce  # required in Python 3, where reduce is no longer a builtin

paths = ['file1', 'file2', 'file3']
df_list = map(lambda x: spark.read.parquet(x), paths)
df = reduce(lambda df1, df2: df1.unionAll(df2), df_list)
I want to do the same thing in Scala. However, when I map over the paths with
val df_list = map(x => spark.read.parquet(x), paths)
I get the following error on the map operation:
:139: error: overloaded method value parquet with alternatives:
  (paths: String*)org.apache.spark.sql.DataFrame
  (path: String)org.apache.spark.sql.DataFrame
cannot be applied to (List[String])
       val df_list = map(x => spark.read.parquet(x), paths)
Any suggestions for fixing this would be appreciated.
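For reference, here is a sketch of how the same map/reduce pattern might look in Scala. Two points differ from the Python version: in Scala, `map` is a method on the collection rather than a free function, so it is called as `paths.map(...)`; and the error message itself shows that `parquet` has a varargs overload `(paths: String*)`, so the list can also be expanded directly with `: _*`. The `SparkSession` setup and the `paths` values are placeholders mirroring the snippet above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadMultipleParquetDirs {
  def main(args: Array[String]): Unit = {
    // Placeholder session; in spark-shell a `spark` value already exists.
    val spark = SparkSession.builder().appName("read-multiple-dirs").getOrCreate()

    // Hypothetical paths, mirroring the PySpark snippet.
    val paths = List("file1", "file2", "file3")

    // Option 1: direct translation of the Python map/reduce pattern.
    // map is invoked on the List itself, not as map(f, paths).
    val dfUnion: DataFrame = paths
      .map(path => spark.read.parquet(path))
      .reduce((df1, df2) => df1.unionAll(df2))

    // Option 2: expand the List into the varargs overload
    // (paths: String*) that the error message lists.
    val dfVarargs: DataFrame = spark.read.parquet(paths: _*)
  }
}
```

Note that `unionAll` is deprecated in newer Spark versions in favor of `union`; both perform a union by column position.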