2017-07-31 56 views

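Using Scala, add a column whose value depends on the length of another column

I have a task to compute the length of each column value and add a message to an "ErrorMsg" column. I am able to filter records by length, but I cannot append the message in a new column.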

For example, I want a message in the new "ErrorMsg" column only for the invalid records.

RECORDLENGTH = 4

InputDataFrame

+------+
| value|
+------+
|   Pra|
|Akshay|
|  Raju|
|Shakti|
|   xyz|
+------+

OutputDataFrame

+------+-------------------------+
|value |ErrorMsg                 |
+------+-------------------------+
|Pra   |Less Than total Length   |
|Akshay|Greater than total length|
|Shakti|Greater than total length|
|xyz   |Less Than total Length   |
+------+-------------------------+

Here "Raju" is a valid record (its length is exactly 4), so it gets no message.

Answer


The following produces the desired result.

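// Assumes a spark-shell session, where `spark` is the active SparkSession.
import org.apache.spark.sql.functions.{length, lit, not, when}
import spark.implicits._
val df = Seq("Pra", "Akshay", "Raju", "Shakti", "xyz").toDF("value")
df
  .filter(not(length($"value") === 4)) // drop valid records (length exactly 4)
  // every remaining row is either longer or shorter than 4
  .withColumn("ErrorMsg", when(length($"value") > lit(4), "Greater than total length").otherwise("Less Than total Length"))
  .show(10000, false)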

+------+-------------------------+
|value |ErrorMsg                 |
+------+-------------------------+
|Pra   |Less Than total Length   |
|Akshay|Greater than total length|
|Shakti|Greater than total length|
|xyz   |Less Than total Length   |
+------+-------------------------+
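
If the valid records should stay in the DataFrame with an empty message instead of being filtered out first, one possible variant is sketched below. It is only an illustration under stated assumptions: the object name `LengthCheck`, the `recordLength` value and the local SparkSession setup are hypothetical and not part of the original question or answer. Chaining two `when` calls without an `otherwise` leaves `ErrorMsg` null for rows whose length matches exactly.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, length, when}

// Hypothetical standalone wrapper; in spark-shell only the body of main is needed.
object LengthCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LengthCheck")
      .getOrCreate()
    import spark.implicits._

    // Corresponds to RECORDLENGTH = 4 in the question.
    val recordLength = 4

    val df = Seq("Pra", "Akshay", "Raju", "Shakti", "xyz").toDF("value")
    val len = length(col("value"))

    // No otherwise(): rows whose length equals recordLength get a null ErrorMsg.
    val checked = df.withColumn(
      "ErrorMsg",
      when(len > recordLength, "Greater than total length")
        .when(len < recordLength, "Less Than total Length")
    )

    // All records, valid ones carrying a null message.
    checked.show(false)

    // Only the invalid records, matching the output requested in the question.
    checked.filter(col("ErrorMsg").isNotNull).show(false)

    spark.stop()
  }
}
```

Keeping the length threshold in a single value avoids repeating the literal 4 in both the filter and the message condition, and filtering on the null message afterwards yields the same invalid rows as the answer above.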