2017-07-12 73 views
0

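I can't write a Spark DataFrame to a database with jdbc

I am trying to write a simple DataFrame to an Oracle database, but I get an error message. I build the DataFrame from a case class and a List. From what I found, the jdbc method can be used to write the data to my Oracle database. I tried this code: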

case class MyClass(A: String, B: Int) 
val MyClass_List = List(MyClass("att1", 1), MyClass("att2", 2)) 

val MyClass_df = MyClass_List.toDF() 

MyClass_df.write 
      .mode("append") 
      .jdbc(url, tableTest, prop) 

But I got the following error:

17/07/12 14:57:04 ERROR JobScheduler: Error running job streaming job 1499864218000 ms.0 
java.lang.NullPointerException 
     at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93) 
     at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426) 
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215) 
     at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446) 
     at Test$$anonfun$1.apply(Test.scala:177) 
     at Test$$anonfun$1.apply(Test.scala:117) 
     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627) 
     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) 
     at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) 
     at scala.util.Try$.apply(Try.scala:192) 
     at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) 
     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:254) 
     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254) 
     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254) 
     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) 
     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:253) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:748) 
Exception in thread "main" java.lang.NullPointerException 
     at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93) 
     at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426) 
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215) 
     at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446) 
     at Test$$anonfun$1.apply(Test.scala:177) 
     at Test$$anonfun$1.apply(Test.scala:117) 
     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627) 
     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) 
     at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) 
     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) 
     at scala.util.Try$.apply(Try.scala:192) 
     at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) 
     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:254) 
     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254) 
     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254) 
     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) 
     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:253) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:748) 

I am using Spark version 2.1.0, and my two columns A and B have the database types varchar and number, respectively.

Do you have any idea?

Answers

0

In fact, I was using the MySQL driver instead of the Oracle driver. I should use

prop.setProperty("driver", "oracle.jdbc.driver.OracleDriver") 

instead of

prop.setProperty("driver", "com.mysql.jdbc.Driver") 
1

It should be "oracle.jdbc.OracleDriver", as the oracle.jdbc.driver package is deprecated.

prop.setProperty("driver", "oracle.jdbc.OracleDriver")
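Since both answers trace the error to a driver class that does not match the Oracle URL, one way to catch this earlier is to check the driver against the URL prefix before calling jdbc. The following is only a minimal sketch, not part of Spark or of the original code: the JdbcProps object, its prefix-to-driver map, and the example host, service name, and credentials are all hypothetical and would need to be adapted.

```scala
import java.util.Properties

// Hypothetical helper (not a Spark API): builds JDBC connection properties
// and fails fast when the driver class does not match the URL prefix.
object JdbcProps {
  // Illustrative mapping only; extend it for the databases you actually use.
  private val driverByPrefix = Map(
    "jdbc:oracle:" -> "oracle.jdbc.OracleDriver",
    "jdbc:mysql:"  -> "com.mysql.jdbc.Driver"
  )

  /** Returns the driver class expected for the URL, if its prefix is known. */
  def expectedDriver(url: String): Option[String] =
    driverByPrefix.collectFirst {
      case (prefix, driver) if url.startsWith(prefix) => driver
    }

  /** Builds the properties; throws IllegalArgumentException on a known mismatch. */
  def build(url: String, user: String, password: String, driver: String): Properties = {
    expectedDriver(url).foreach { expected =>
      require(expected == driver, s"URL $url expects driver $expected but got $driver")
    }
    val prop = new Properties()
    prop.setProperty("user", user)
    prop.setProperty("password", password)
    prop.setProperty("driver", driver)
    prop
  }
}
```

With such a helper, prop could be created as JdbcProps.build(url, "scott", "tiger", "oracle.jdbc.OracleDriver") (placeholder credentials) and then passed to MyClass_df.write.mode("append").jdbc(url, tableTest, prop) as in the question. A MySQL driver paired with an Oracle URL would then be rejected with a readable message instead of surfacing later as a NullPointerException.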