I'm trying to read a sample JSON file into a SQLContext with the code below, but it fails with a data source error: Spark cannot find the JSON data source when reading JSON.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val path = "C:\\samplepath\\sample.json"
val jsondata = sqlContext.read.json(path)
It throws:

java.lang.ClassNotFoundException: Failed to find data source: json. Please find packages at http://spark-packages.org
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:244)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: json.DefaultSource
    at scala.tools.nsc.interpreter.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:83)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
    at scala.util.Try.orElse(Try.scala:82)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
    ... 50 more
I tried to look for a Spark package that might be missing, but could not find anything that helped resolve the issue.
I tried similar code with PySpark, but it failed with a similar ClassNotFoundException for the JSON data source.
After further experimenting, I was able to get results successfully by converting an existing RDD to a JSON RDD instead. Is there something I'm missing? I'm using Spark 1.6.1 with Scala 2.10.5. Any help is appreciated. Thanks.
val stringRDD = sc.parallelize(Seq("""
{ "isActive": false,
"balance": "$1,431.73",
"picture": "http://placehold.it/32x32",
"age": 35,
"eyeColor": "blue"
}""",
"""{
"isActive": true,
"balance": "$2,515.60",
"picture": "http://placehold.it/32x32",
"age": 34,
"eyeColor": "blue"
}""",
"""{
"isActive": false,
"balance": "$3,765.29",
"picture": "http://placehold.it/32x32",
"age": 26,
"eyeColor": "blue"
}""")
)
sqlContext.jsonRDD(stringRDD).registerTempTable("testjson")
sqlContext.sql("SELECT age from testjson").collect
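For reference, the same workaround can also be expressed through the DataFrameReader API, which in Spark 1.6 has a `json` overload that accepts an `RDD[String]` of JSON documents and does not go through the data-source name lookup (a sketch, assuming the same `sc`, `sqlContext`, and `stringRDD` as above; `SQLContext.jsonRDD` is deprecated in favor of this form):

```scala
// Build a DataFrame directly from an RDD[String] where each element
// is one JSON document. This bypasses lookupDataSource("json"),
// which is where read.json(path) was failing for me.
val df = sqlContext.read.json(stringRDD)

// Same temp-table query as before.
df.registerTempTable("testjson")
sqlContext.sql("SELECT age FROM testjson").collect()
```

This is only how I understand the 1.6 API; it does not explain why `read.json(path)` cannot resolve the `json` data source in the first place.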