
Transfer data from Oracle to Hive using Spark: how can I use Spark to import data from an Oracle database into a DataFrame or RDD, and then write that data into some Hive tables?

I have the following code:

import java.util.HashMap;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public static void main(String[] args) {

    SparkConf conf = new SparkConf().setAppName("Data transfer test (Oracle -> Hive)").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);

    // JDBC connection options for the Oracle source table
    HashMap<String, String> options = new HashMap<>();
    options.put("url", "jdbc:oracle:thin:@<ip>:<port>:orcl");
    options.put("dbtable", "ACCOUNTS");
    options.put("user", "username");
    options.put("password", "12345");
    options.put("driver", "oracle.jdbc.OracleDriver");
    options.put("numPartitions", "4");

    // Load the Oracle table into a DataFrame via the JDBC data source
    DataFrame oracleDataFrame = sqlContext.read()
        .format("jdbc")
        .options(options)
        .load();

}
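
For context, once a HiveContext works, the write step I am aiming for is roughly the sketch below (the table name accounts_hive is just a placeholder, and SaveMode comes from org.apache.spark.sql.SaveMode):

// Sketch: save the DataFrame read from Oracle as a Hive table.
// This assumes oracleDataFrame was created through a HiveContext
// rather than the plain SQLContext above.
oracleDataFrame.write()
    .mode(SaveMode.Overwrite)          // or SaveMode.Append
    .saveAsTable("accounts_hive");     // placeholder target table name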

But if I create a HiveContext in order to work with Hive:

HiveContext hiveContext = new HiveContext(sc); 

I get the following error:

ERROR conf.Configuration: Failed to set setXIncludeAware(true) for parser oracle.xml.jaxp.JXDocumentBuilderFactory@<hash>:java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory 
java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory 
     at javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(DocumentBuilderFactory.java:614) 
     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2534) 
     at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2503) 
     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2409) 
     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1144) 
     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1116) 
     at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:525) 
     at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:543) 
     at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:437) 
     at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:2750) 
     at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:2713) 
     at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:185) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 
     at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249) 
     at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:329) 
     at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:239) 
     at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:443) 
     at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) 
     at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) 
     at scala.collection.Iterator$class.foreach(Iterator.scala:727) 
     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) 
     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) 
     at scala.collection.AbstractIterable.foreach(Iterable.scala:54) 
     at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:271) 
     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90) 
     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101) 
     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:103) 
     at replicator.ImportFromOracleToHive.init(ImportFromOracleToHive.java:52) 
     at replicator.ImportFromOracleToHive.main(ImportFromOracleToHive.java:76) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730) 
     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) 
     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 

Answer


The problem appears to be an outdated Xerces dependency, as detailed in this question. My guess is that you are pulling it in transitively somehow, but it is impossible to tell without seeing your pom.xml. Note from the stack trace you posted that the error comes from the Hadoop Common Configuration object, not from Spark itself. The fix is to make sure you are using a sufficiently recent version:

<dependency> 
    <groupId>xerces</groupId> 
    <artifactId>xercesImpl</artifactId> 
    <version>2.11.0</version> 
</dependency> 
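
To confirm whether an old Xerces (or the Oracle JAXP implementation) is being pulled in transitively, you can inspect your project's resolved dependency tree, for example:

mvn dependency:tree -Dincludes=xerces:xercesImpl

and check that 2.11.0 (or newer) is the version that actually ends up on the classpath.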