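Specifying additional jars in an AWS EMR custom JAR application
I am trying to run a Hadoop job on an EMR cluster. It runs as a Java command, for which I use a jar-with-dependencies. The job pulls data from Teradata, and I assume the Teradata-related jars are also packaged in the jar-with-dependencies. However, I still get this exception: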
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:171)
My pom has the following relevant dependencies:
<dependency>
    <groupId>teradata</groupId>
    <artifactId>terajdbc4</artifactId>
    <version>14.10.00.17</version>
</dependency>
<dependency>
    <groupId>teradata</groupId>
    <artifactId>tdgssconfig</artifactId>
    <version>14.10.00.17</version>
</dependency>
I package the complete jar as follows:
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <compilerArgument>-Xlint:-deprecation</compilerArgument>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.2.1</version>
            <configuration>
                <descriptors>
                </descriptors>
                <archive>
                    <manifest>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
The assembly.xml file:
<assembly>
    <id>aws-emr</id>
    <formats>
        <format>jar</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <unpack>false</unpack>
            <includes>
            </includes>
            <scope>runtime</scope>
            <outputDirectory>lib</outputDirectory>
        </dependencySet>
        <dependencySet>
            <unpack>true</unpack>
            <includes>
                <include>${groupId}:${artifactId}</include>
            </includes>
        </dependencySet>
    </dependencySets>
</assembly>
The command I use to run it on EMR:
aws emr create-cluster --release-label emr-5.3.1 \
--instance-groups \
InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
InstanceGroupType=CORE,InstanceCount=5,BidPrice=0.1,InstanceType=m3.xlarge \
--service-role EMR_DefaultRole --log-uri s3://my-bucket/logs \
--applications Name=Hadoop --name TeradataPullerTest \
--ec2-attributes <ec2-attributes> \
--steps Type=CUSTOM_JAR,Name=EventsPuller,Jar=s3://path-to-jar-with-dependencies.jar,\
Args=[com.my.package.EventsPullerMR],ActionOnFailure=TERMINATE_CLUSTER \
--auto-terminate
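Is there a way to specify the Teradata jars when the map-reduce job is executed, so that they are added to the classpath?
EDIT: I have confirmed that the missing class is packaged in the jar-with-dependencies.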
aws-emr$ jar tf target/aws-emr-0.0.1-SNAPSHOT-jar-with-dependencies.jar | grep TeraDriver
com/ncr/teradata/TeraDriver.class
com/teradata/jdbc/TeraDriver.class
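Since the class is present in the jar but still not found at run time, one way to narrow this down is to check what the running JVM can actually load. The following is only a minimal diagnostic sketch, not part of the job above: the class name ClasspathCheck and the method isLoadable are hypothetical names chosen for illustration, and it assumes the check is run with the same classpath as the failing step.

```java
// Hypothetical helper (not from the original job): reports whether a class
// can be loaded by the class loader that loaded this helper.
public class ClasspathCheck {

    // Returns true if the named class is visible to this class's loader.
    // The class is looked up without being initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the driver class named in the exception above.
        String name = args.length > 0 ? args[0] : "com.teradata.jdbc.TeraDriver";
        System.out.println(name + " loadable: " + isLoadable(name));
        // Print the effective classpath so it can be compared with the jar contents.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If this reports that the driver is not loadable, comparing the printed java.class.path with the output of jar tf may show whether the jar that is actually on the classpath is the one that was inspected.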