2016-06-21

Spark Scala: defining a UDF with an input parameter

I am working with Scala and Spark. I have defined a UDF; here it is:

import org.apache.spark.sql.functions.udf

// The lambda parameter receives the value of the column the UDF is applied to.
def udfcrpentcd = udf((state_name: String) => {
    state_name match {
     case "IL1" if state_name.contains("IL1") => "IL1"
     case "OK1" if state_name.contains("OK1") => "OK1"
     case "TX1" if state_name.contains("TX1") => "TX1"
     case "NM1" if state_name.contains("NM1") => "NM1"
     case "MT1" if state_name.contains("MT1") => "MT1"
     case _ => "Null"
    }})




val local_masterdb =old_dataframe_temp_masterdbDataFrame.withColumn("new_columna_name_CORP_ENT_CD",udfcrpentcd(old_dataframe_temp_masterdbDataFrame("last_column_of_old_dataframe_DB_STATUS")+1)) 
    local_masterdb.show() 

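Now I want to reuse the UDF above.

I want to make it generic: instead of comparing against state_name, I need to pass in a string and have it return the CRP_ENT_CD. That is what I am trying to do.

Is this the right way to do it?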

def udfcrpentcd (input_parameter:String) = udf((CORP_ENT_CD:String)=>{ 
    input_parameter match{ 
     case "IL1" if input_parameter.contains("IL1")=> "IL1" 
     case "OK1" if input_parameter.contains("OK1")=> "OK1" 
     case "TX1" if input_parameter.contains("TX1")=> "TX1" 
     case "NM1" if input_parameter.contains("NM1")=> "NM1" 
     case "MT1" if input_parameter.contains("MT1")=> "MT1" 
     case _ =>"Null" 
    }}) 

If this is the right way, how do I call it? Any help on passing the parameter would be appreciated.

Answer

Here is an example of how to pass a parameter to a UDF.

import org.apache.spark.sql.functions.udf

// Plain Scala function holding the matching logic.
// A recognised code is returned unchanged; anything else maps to the string "Null".
val udfcrpentcd: String => String = (input_parameter: String) => {
input_parameter match {
    case "IL1" => "IL1"
    case "OK1" => "OK1"
    case "TX1" => "TX1"
    case "NM1" => "NM1"
    case "MT1" => "MT1"
    case _ => "Null"
}}

// Wrap the function as a Spark UDF so it can be applied to a column.
val udfcrpentcd_res = udf(udfcrpentcd)

val local_masterdb = old_dataframe_temp_masterdbDataFrame.withColumn("new_columna_name_CORP_ENT_CD",udfcrpentcd_res(old_dataframe_temp_masterdbDataFrame("last_column_of_old_dataframe_DB_STATUS")+1)) 
local_masterdb.show()
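
If the goal is to give the UDF an extra argument besides the column value, two common patterns are a factory function that captures the argument in a closure, and a two-argument UDF that receives the constant through `lit(...)`. The following is only a minimal sketch of both, assuming Spark 2.x or later (for `SparkSession`). The names `UdfWithParameter`, `knownCodes`, `corpEntCd`, `corpEntCdUdf`, `corpEntCdUdf2`, and the `fallback` parameter are hypothetical and not from the question, and the sample DataFrame stands in for the real one.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, udf}

object UdfWithParameter {
  // Hypothetical: the codes listed in the question's match cases.
  val knownCodes: Set[String] = Set("IL1", "OK1", "TX1", "NM1", "MT1")

  // Plain Scala logic, testable without Spark: return the code if it is
  // recognised, otherwise the caller-supplied fallback (also used for null).
  def corpEntCd(fallback: String)(code: String): String =
    if (code != null && knownCodes.contains(code)) code else fallback

  // Pattern 1: a factory. `fallback` is captured in the closure, and the
  // returned UDF takes only the column value.
  def corpEntCdUdf(fallback: String) =
    udf((code: String) => corpEntCd(fallback)(code))

  // Pattern 2: a two-argument UDF. The constant is passed as a literal column.
  val corpEntCdUdf2 =
    udf((code: String, fallback: String) => corpEntCd(fallback)(code))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-with-parameter")
      .getOrCreate()
    import spark.implicits._

    // Stand-in data; the column name is an assumption for this sketch.
    val df = Seq("IL1", "TX1", "ZZ9").toDF("DB_STATUS")

    df.withColumn("CORP_ENT_CD", corpEntCdUdf("Null")(col("DB_STATUS")))
      .withColumn("CORP_ENT_CD_2", corpEntCdUdf2(col("DB_STATUS"), lit("Null")))
      .show()

    spark.stop()
  }
}
```

With the factory, the call site reads `corpEntCdUdf("Null")(someColumn)`: the first argument list configures the UDF and the second applies it to a column. The `lit` variant may be more convenient when the extra value could later come from another column instead of a constant.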