我使用的是Spark 2,您可以看到以下结果,您的问题与unix_timestamp或Spark版本无关,请检查您的数据。
import org.apache.spark.sql.functions.unix_timestamp
val df2 = sc.parallelize(Seq(
(10, "date", "20170312020200"), (10, "date", "20170312050200"))
).toDF("id ", "somthing ", "datee")
df2.show()
val df3=df2.withColumn("datee", unix_timestamp($"datee", "yyyyMMddHHmmss").cast("timestamp"))
df3.show()
+---+---------+--------------+
|id |somthing | datee|
+---+---------+--------------+
| 10| date|20170312020200|
| 10| date|20170312050200|
+---+---------+--------------+
+---+---------+-------------------+
|id |somthing | datee|
+---+---------+-------------------+
| 10| date|2017-03-12 02:02:00|
| 10| date|2017-03-12 05:02:00|
+---+---------+-------------------+
import org.apache.spark.sql.functions.unix_timestamp
df2: org.apache.spark.sql.DataFrame = [id : int, somthing : string ... 1 more field]
df3: org.apache.spark.sql.DataFrame = [id : int, somthing : string ... 1 more field]
这很有道理,我的本地系统在格林威治标准时间+5.30和服务器在EDT。 – Himanshu