2017-10-19 149 views

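Creating a Spark Dataset from Java objects in Scala (Spark 1.6)

I have POJOs defined in an external jar, and I want to create a Dataset from these objects. If I create the Dataset from a Scala case class, it comes out as expected. If I try the same thing with a Java object, all of the data appears to end up in a single column as one object.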

case class patientDiagnosis(patientId: Long, visitId: Long, diagnosisCode: String, isPrimaryDiagnosis: String, patientDiagnosisId: Long, sourceSystemUniqueIdentifier: String, diagnosisCodeSystem: String) {} 

println("case Dataset from scala object :") 
joinDf.as[patientDiagnosis].show() 

OUTPUT: 
case Dataset from scala object : 
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+ 
|patientId|visitId|diagnosisCode|isPrimaryDiagnosis|patientDiagnosisId|sourceSystemUniqueIdentifier|diagnosisCodeSystem| 
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+ 
|  1388158|1764555|       296.20|                 1|           1247383|                     1247383|               ICD9|
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+ 

When I try to do the same with a Java class, I get the following output:

Java object:

import java.io.Serializable;

public class PatientDiagnosis implements Serializable {

    private static final long serialVersionUID = -7971192387675901350L;

    private long patientId;
    private long visitId;
    private String diagnosisCode;
    private String isPrimaryDiagnosis;
    private long patientDiagnosisId;
    private String sourceSystemUniqueIdentifier;
    private int isDeleted;
    private String diagnosisCodeSystem;

    // Getters, setters and toString() are omitted here; Encoders.bean relies on the JavaBean accessors.
}

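Scala code:

import org.apache.spark.sql.{Encoder, Encoders}
import sqlContext.implicits._
// Build a JavaBean-based encoder for the POJO and pass it explicitly to as[...]
val p: Encoder[com....PatientDiagnosis] = Encoders.bean(classOf[com....PatientDiagnosis])
println("case Java Encoder :")
joinDiagnf.as[com....PatientDiagnosis](p).show(false)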

OUTPUT: 
case Java Encoder : 
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+ 
|diagnosisCode                                                |diagnosisCodeSystem|isDeleted|isPrimaryDiagnosis|patientDiagnosisId|patientId|sourceSystemUniqueIdentifier|visitId| 
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+ 
|PatientDiagnosis [patientId=0, visitId=1764555, diagnosisCode=296.20, isPrimaryDiagnosis=1, patientDiagnosisId=1247383, sourceSystemUniqueIdentifier=1247383, isDeleted=0, diagnosisCodeSystem=ICD9]| 
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+ 

Am I making a syntax mistake, or is creating a Dataset of Java objects from Scala not supported in Spark 1.6?

What is the schema of 'joinDiagnf'? –

The same as each object – Kalpesh

Answer


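Sorry, my mistake: it does give the correct output. I did not realize this earlier because the dataset.show view did not present the data clearly. When I select specific columns, those columns contain the expected values.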
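
For anyone hitting the same confusion, here is a minimal sketch of that check, written against the Spark 1.6 API. DiagnosisBean, InspectBeanDataset and the one-row joinDiagnf built inside it are hypothetical stand-ins (the real POJO and DataFrame are not reproduced here), and the local master setting is only an assumption for trying it out. The idea is to convert the bean-encoded Dataset back to a DataFrame and look at the schema and a few named columns instead of relying on show() on the Dataset itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Encoder, Encoders, SQLContext}
import scala.beans.BeanProperty

// Hypothetical stand-in for the Java POJO. @BeanProperty generates the
// getX/setX accessors that a bean encoder relies on.
class DiagnosisBean extends Serializable {
  @BeanProperty var patientId: Long = 0L
  @BeanProperty var visitId: Long = 0L
  @BeanProperty var diagnosisCode: String = _
}

object InspectBeanDataset {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just for experimenting.
    val conf = new SparkConf().setAppName("inspect-bean-dataset").setMaster("local[1]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical one-row stand-in for joinDiagnf.
    val joinDiagnf = Seq((1388158L, 1764555L, "296.20"))
      .toDF("patientId", "visitId", "diagnosisCode")

    // Same pattern as in the question: explicit bean encoder passed to as[...].
    val beanEncoder: Encoder[DiagnosisBean] = Encoders.bean(classOf[DiagnosisBean])
    val ds = joinDiagnf.as[DiagnosisBean](beanEncoder)

    // Inspect the columns by name rather than through show() on the Dataset.
    val asFrame = ds.toDF()
    asFrame.printSchema()
    asFrame.select("patientId", "visitId", "diagnosisCode").show(false)

    sc.stop()
  }
}
```

If each selected column prints its own value, the encoder is mapping the fields correctly and only the rendering of the object-typed Dataset was misleading.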