4

I am running a Spark job on EMR and using the DataStax connector to connect to a Cassandra cluster. The problem I am now facing is with the Guava jar: the driver detects Guava issue #1635, which indicates that a version of Guava older than 16.0.1 is in use. Cassandra cluster details:

cqlsh 5.0.1 | Cassandra 3.0.1 | CQL spec 3.3.1 

The Spark job runs on EMR 4.4 with the following Maven dependencies:

<dependency> 
    <groupId>org.apache.spark</groupId> 
    <artifactId>spark-streaming_2.10</artifactId> 
    <version>1.5.0</version> 
</dependency> 

<dependency> 
    <groupId>org.apache.spark</groupId> 
    <artifactId>spark-core_2.10</artifactId> 
    <version>1.5.0</version> 
</dependency> 

<dependency> 
    <groupId>com.datastax.spark</groupId> 
    <artifactId>spark-cassandra-connector_2.10</artifactId> 
    <version>1.5.0</version> 
</dependency> 

<dependency> 
    <groupId>org.apache.spark</groupId> 
    <artifactId>spark-streaming-kinesis-asl_2.10</artifactId> 
    <version>1.5.0</version> 
</dependency> 

When I submit the Spark job, it fails as follows:

java.lang.ExceptionInInitializerError 
     at com.datastax.spark.connector.cql.DefaultConnectionFactory$.clusterBuilder(CassandraConnectionFactory.scala:35) 
     at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:87) 
     at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:153) 
     at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148) 
     at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148) 
     at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31) 
     at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56) 
     at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81) 
     at ampush.event.process.core.CassandraServiceManagerImpl.getAdMetaInfo(CassandraServiceManagerImpl.java:158) 
     at ampush.event.config.metric.processor.ScheduledEventAggregator$4.call(ScheduledEventAggregator.java:308) 
     at ampush.event.config.metric.processor.ScheduledEventAggregator$4.call(ScheduledEventAggregator.java:290) 
     at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222) 
     at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222) 
     at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902) 
     at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902) 
     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850) 
     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850) 
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
     at org.apache.spark.scheduler.Task.run(Task.scala:88) 
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.IllegalStateException: Detected Guava issue #1635 which indicates that a version of Guava less than 16.01 is in use. This introduces codec resolution issues and potentially other incompatibility issues in the driver. Please upgrade to Guava 16.01 or later. 
     at com.datastax.driver.core.SanityChecks.checkGuava(SanityChecks.java:62) 
     at com.datastax.driver.core.SanityChecks.check(SanityChecks.java:36) 
     at com.datastax.driver.core.Cluster.<clinit>(Cluster.java:67) 
     ... 23 more 

Please let me know how to manage the Guava dependency here.

Thanks

+0

Your dependency blocks are incomplete –

Answers

1

Just add something like this to the <dependencies> block of your POM:

<dependency> 
    <groupId>com.google.guava</groupId> 
    <artifactId>guava</artifactId> 
    <version>19.0</version> 
</dependency> 

(or whatever version > 16.0.1 you prefer)

+0

I was going through the link https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/HnTsWJkI5jo, which says Spark 1.5 uses Guava 14 while cassandra-driver-core needs Guava 16, and that the Spark Cassandra connector raises the exception. So how does adding the dependency above solve my problem? Probably a newbie question. Thanks –

+0

Also, per the compatibility table at https://github.com/datastax/spark-cassandra-connector, I am using connector 1.5, which matches Spark 1.5/1.6 and Cassandra 3.0, so I am not sure why I am hitting the issue –

+0

Not sure what you are asking. If you want to know why Maven resolves an old version of Guava, you can use 'mvn dependency:tree'; it shows you how each transitive dependency is resolved (or omitted) –
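
For reference, a minimal sketch of that check; the -Dincludes filter (part of the Maven dependency plugin) narrows the tree to Guava, and -Dverbose also shows the conflicting versions that Maven omitted:

# show only Guava entries in the dependency tree, including omitted conflicts 
mvn dependency:tree -Dincludes=com.google.guava:guava -Dverbose 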

2

I had the same problem and resolved it by using the Maven shade plugin to shade the Guava version that the Cassandra connector brings in.

I needed to exclude the Optional, Present, and Absent classes because I was running into problems with Spark trying to cast from the non-shaded Guava Present type to the shaded Optional type. I am not sure whether this will cause any issues later on, but it seems to be working for me for now.

You can add this to the <plugins> section of your pom.xml:

<plugin> 
    <groupId>org.apache.maven.plugins</groupId> 
    <artifactId>maven-shade-plugin</artifactId> 
    <version>2.4.3</version> 
    <executions> 
     <execution> 
      <phase>package</phase> 
      <goals> 
       <goal> 
        shade 
       </goal> 
      </goals> 
     </execution> 
    </executions> 

    <configuration> 
     <minimizeJar>true</minimizeJar> 
     <shadedArtifactAttached>true</shadedArtifactAttached> 
     <shadedClassifierName>fat</shadedClassifierName> 

     <relocations> 
      <relocation> 
       <pattern>com.google</pattern> 
       <shadedPattern>shaded.guava</shadedPattern> 
       <includes> 
        <include>com.google.**</include> 
       </includes> 

       <excludes> 
        <exclude>com.google.common.base.Optional</exclude> 
        <exclude>com.google.common.base.Absent</exclude> 
        <exclude>com.google.common.base.Present</exclude> 
       </excludes> 
      </relocation> 
     </relocations> 

     <filters> 
      <filter> 
       <artifact>*:*</artifact> 
       <excludes> 
        <exclude>META-INF/*.SF</exclude> 
        <exclude>META-INF/*.DSA</exclude> 
        <exclude>META-INF/*.RSA</exclude> 
       </excludes> 
      </filter> 
     </filters> 

    </configuration> 
</plugin> 
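
With that in place, the build and submit steps would look roughly like this; the main class and jar names are placeholders, and the -fat suffix comes from <shadedClassifierName> above:

# package now also produces the shaded jar, e.g. target/my-job-1.0-fat.jar 
mvn clean package 

# submit the shaded artifact so the relocated Guava is the one on the classpath 
spark-submit --class com.example.MyJob target/my-job-1.0-fat.jar 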
+1

This does not solve the problem here. The cause in our case is the deployment platform, EMR. The way EMR builds Spark's default classpath puts a Guava version older than 16 on the classpath, because it picks up the older Hadoop libraries shipped with EMR 4.2/4.4/4.6. I fixed it by adding a bootstrap action to EMR that patches my default Spark classpath with the updated jar path. –
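
For anyone taking the same route, here is a rough sketch of such a bootstrap action; the bucket, Guava version, and target paths are placeholders, not the poster's actual script:

#!/bin/bash 
# Hypothetical EMR bootstrap action: stage a newer Guava on every node. 
aws s3 cp s3://my-bucket/jars/guava-19.0.jar /home/hadoop/extra/guava-19.0.jar 
# Then submit with the staged jar first on both classpaths: 
#   --conf spark.driver.extraClassPath=/home/hadoop/extra/guava-19.0.jar 
#   --conf spark.executor.extraClassPath=/home/hadoop/extra/guava-19.0.jar 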

+0

I can confirm this did solve the problem on a Spark Standalone v1.5.2 cluster with Spark Cassandra Connector v1.5.1. Thanks. –

5

Another solution: go into the directory

spark/jars

rename guava-14.0.1.jar, and then copy guava-19.0.jar in its place, as sketched below:

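A minimal sketch of that swap, assuming $SPARK_HOME points at your Spark install and you have already downloaded guava-19.0.jar from Maven Central:

cd $SPARK_HOME/jars 
mv guava-14.0.1.jar guava-14.0.1.jar.bak    # keep the old jar as a backup 
cp /path/to/guava-19.0.jar .                # put Guava 19 in its place 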

+2

As a note, Guava 20 will not work for this. Guava 19 does work, though. –

+0

Awesome! –

0

I was able to solve this by adding the Guava 16.0.1 jar externally and then specifying it on the Spark classpath at submit time with the following configuration values:

--conf "spark.driver.extraClassPath=/guava-16.0.1.jar" --conf "spark.executor.extraClassPath=/guava-16.0.1.jar"

Hope this helps someone with a similar error!
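
A fuller sketch of the submit command under those settings; the jar location, main class, and application jar are placeholders:

spark-submit \ 
  --conf "spark.driver.extraClassPath=/guava-16.0.1.jar" \ 
  --conf "spark.executor.extraClassPath=/guava-16.0.1.jar" \ 
  --class com.example.MyJob my-job.jar 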

0

Thanks Adrian for your response.

I am on a different architecture than the others in this thread, but the Guava problem is still the same. I am using Spark 2.2 with Mesos. In our development environment we use sbt-native-packager to produce our Docker images to pass to Mesos.

It turned out that we needed to provide a different Guava for the spark-submit executors than for the code running on the driver. This worked for me.

build.sbt

.... 
libraryDependencies ++= Seq(
  "com.google.guava" % "guava" % "19.0" force(), 
  "org.apache.hadoop" % "hadoop-aws" % "2.7.3" excludeAll (
    ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-common"), // this is for s3a 
    ExclusionRule(organization = "com.google.guava", name = "guava")), 
  "org.apache.spark" %% "spark-core" % "2.1.0" excludeAll (
    ExclusionRule("org.glassfish.jersey.bundles.repackaged", name = "jersey-guava"), 
    ExclusionRule(organization = "com.google.guava", name = "guava")), 
  "com.github.scopt" %% "scopt" % "3.7.0" excludeAll (
    ExclusionRule("org.glassfish.jersey.bundles.repackaged", name = "jersey-guava"), 
    ExclusionRule(organization = "com.google.guava", name = "guava")), 
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.6", 
... 
dockerCommands ++= Seq(
... 
    Cmd("RUN rm /opt/spark/dist/jars/guava-14.0.1.jar"), 
    Cmd("RUN wget -q http://central.maven.org/maven2/com/google/guava/guava/23.0/guava-23.0.jar -O /opt/spark/dist/jars/guava-23.0.jar") 
... 
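
For completeness: with sbt-native-packager's DockerPlugin enabled, the image containing the swapped Guava is then built the usual way, e.g.:

sbt docker:publishLocal 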

When I tried to replace Guava 14 on the executors with 16.0.1 or 19, it still would not work; spark-submit just died. My fat jar effectively forced the Guava actually used in my driver to 19, but for the spark-submit executors I had to replace it with 23. I did try replacing it with 16 and 19 there too, but Spark just died then as well.

Sorry for the digression, but this question came up in every one of my Google searches, so I hope it helps other sbt/Mesos folks.