2017-08-15 51 views

I am trying to convert a string column to a timestamp with unix_timestamp from pyspark.sql.functions, using this code:

from pyspark.sql import Row
from pyspark.sql.functions import unix_timestamp

(sc
 .parallelize([Row(dt='2017-01-23T08:12:39.929+01:00')])
 .toDF()
 .withColumn("parsed", unix_timestamp("dt", "yyyy-MM-ddThh:mm:ss")
             .cast("double")
             .cast("timestamp"))
 .show(1, False))

to get a timestamp, but all I get is null:

+-----------------------------+------+
|dt                           |parsed|
+-----------------------------+------+
|2017-01-23T08:12:39.929+01:00|null  |
+-----------------------------+------+

Why?

Answer


You get NULL because the format you use does not match the data. To get a minimal match you have to escape the T with single quotes:

yyyy-MM-dd'T'kk:mm:ss 

and to match the full pattern you need S for the milliseconds and X for the timezone:

yyyy-MM-dd'T'kk:mm:ss.SSSXXX 
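As an illustration (plain Python, not part of the original answer), datetime.strptime shows the same idea: the literal T must be matched as a literal, and the fractional seconds and offset need their own directives. The strptime pattern below is a rough equivalent of the SimpleDateFormat pattern, assuming Python 3.7+ so that %z accepts the colon in the offset:

```python
from datetime import datetime

s = "2017-01-23T08:12:39.929+01:00"

# Rough strptime equivalent of yyyy-MM-dd'T'HH:mm:ss.SSSXXX:
# the literal T sits in the pattern, %f consumes the .929 fraction,
# and %z consumes the +01:00 offset.
parsed = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z")
print(parsed.isoformat())  # 2017-01-23T08:12:39.929000+01:00

# A pattern without the fraction and offset does not match the full
# string, just as the question's unix_timestamp pattern fails to.
try:
    datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")
except ValueError as e:
    print("no match:", e)
```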

However, in current Spark versions a direct cast:

from pyspark.sql.functions import col 

col("dt").cast("timestamp") 

should work just fine:

spark.sql(
    """SELECT CAST("2011-01-23T08:12:39.929+01:00" AS timestamp)""" 
).show(1, False) 
+------------------------------------------------+
|CAST(2011-01-23T08:12:39.929+01:00 AS TIMESTAMP)|
+------------------------------------------------+
|2011-01-23 08:12:39.929                         |
+------------------------------------------------+
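Note that the displayed value depends on the session time zone. As a plain-Python sketch of the same conversion (an illustration, not Spark itself), assuming a +01:00 session zone so the wall-clock value matches the table above:

```python
from datetime import datetime, timedelta, timezone

s = "2011-01-23T08:12:39.929+01:00"
ts = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z")

# Spark renders timestamps in the session time zone; with an assumed
# +01:00 zone the wall-clock value is unchanged from the input string.
session_tz = timezone(timedelta(hours=1))
print(ts.astimezone(session_tz).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3])
# 2011-01-23 08:12:39.929
```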

Reference: SimpleDateFormat