Spark: add a new column with the same value in Scala

I have a problem with the withColumn function in a Spark-Scala environment. I would like to add a new column to my DataFrame, so that this:

+---+----+---+
|  A|   B|  C|
+---+----+---+
|  4|blah|  2|
|  2|    |  3|
| 56| foo|  3|
|100|null|  5|
+---+----+---+

becomes:

+---+----+---+-----+
|  A|   B|  C|    D|
+---+----+---+-----+
|  4|blah|  2|  750|
|  2|    |  3|  750|
| 56| foo|  3|  750|
|100|null|  5|  750|
+---+----+---+-----+

with column D holding one value repeated in every row of my DataFrame.

The code is this:

var totVehicles : Double = df_totVehicles(0).getDouble(0); //return 750

The variable totVehicles returns the correct value, so this part works!

The second DataFrame has to compute two fields (id_zipcode, n_vehicles) and then get a third column holding the same value (750) in every row:

var df_nVehicles = 
df_carPark.filter(
     substring($"id_time",1,4) < 2013 
    ).groupBy(
     $"id_zipcode" 
    ).agg(
     sum($"n_vehicles") as 'n_vehicles 
    ).select(
     $"id_zipcode" as 'id_zipcode, 
     'n_vehicles 
    ).orderBy(
     'id_zipcode, 
     'n_vehicles 
    );

Finally, I add the new column with the withColumn function:

var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))

But Spark returns this error:

error: value withColumn is not a member of Unit 
     var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))

Can you help me? Thanks a lot!

Source

2016-07-26 Alessandro

The lit function is used to add a literal value as a column:

import org.apache.spark.sql.functions._ 
df.withColumn("D", lit(750))
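Applied to the question's setup, the same idea might look like the sketch below. It assumes Spark 2.x or later (for SparkSession) and a local session; the sample rows, the object name AddConstantColumn, and the helper withConstant are hypothetical stand-ins, not part of the original code. The point it illustrates is that withColumn takes the new column's name as a String and a Column as the value, so the Double has to be wrapped in lit.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object AddConstantColumn {
  // Hypothetical helper: appends a column named `name` holding `value` in every row.
  def withConstant(df: DataFrame, name: String, value: Double): DataFrame =
    df.withColumn(name, lit(value))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumes Spark 2.x or later).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lit-example")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for df_nVehicles from the question; these rows are made up.
    val dfNVehicles = Seq(("00100", 120.0), ("00200", 80.0))
      .toDF("id_zipcode", "n_vehicles")

    // Stand-in for the value read from df_totVehicles in the question.
    val totVehicles: Double = 750.0

    // First argument: the column name as a String. Second: a Column built with lit.
    val dfNVehicles2 = withConstant(dfNVehicles, "totVehicles", totVehicles)

    dfNVehicles2.show()
    spark.stop()
  }
}
```

Note that the call in the question passes totVehicles (a Double) as the first argument, where withColumn expects the column name; passing "totVehicles" as the name and lit(totVehicles) as the value matches the signature.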

Source

2016-07-26 11:12:01
