GeoMesa + SparkSQL integration issue

My setup is a 3-node cluster running in AWS. I have ingested my data (30 rows) and can run queries from a Jupyter notebook without any problems. But now I am trying to run a query using Spark and Java, as shown in the snippet below:
import java.util.HashMap;
import java.util.Map;

import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.geotools.data.DataStoreFinder;

public class SparkSqlTest {

    private static final Logger log = Logger.getLogger(SparkSqlTest.class);

    public static void main(String[] args) {
        Map<String, String> dsParams = new HashMap<>();
        dsParams.put("instanceId", "gis");
        dsParams.put("zookeepers", "server ip");
        dsParams.put("user", "root");
        dsParams.put("password", "secret");
        dsParams.put("tableName", "posiciones");
        try {
            DataStoreFinder.getDataStore(dsParams);
            SparkConf conf = new SparkConf();
            conf.setAppName("testSpark");
            conf.setMaster("yarn");
            SparkContext sc = SparkContext.getOrCreate(conf);
            SparkSession ss = SparkSession.builder().config(conf).getOrCreate();
            Dataset<Row> df = ss.read()
                    .format("geomesa")
                    .options(dsParams)
                    .option("geomesa.feature", "posicion")
                    .load();
            df.createOrReplaceTempView("posiciones");
            long t1 = System.currentTimeMillis();
            Dataset<Row> rows = ss.sql("select count(*) from posiciones where id_equipo = 148 and fecha_hora >= '2015-04-01' and fecha_hora <= '2015-04-30'");
            long t2 = System.currentTimeMillis();
            rows.show();
            log.info("Tiempo de la consulta: " + ((t2 - t1) / 1000) + " segundos.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I uploaded the jar to my master EC2 box (inside the jupyter notebook image) and ran it with the following commands:
docker cp myjar-0.1.0.jar jupyter:myjar-0.1.0.jar
docker exec jupyter sh -c '$SPARK_HOME/bin/spark-submit --master yarn --class mypackage.SparkSqlTest file:///myjar-0.1.0.jar --jars $GEOMESA_SPARK_JARS'
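As an aside on the invocation above (an observation about spark-submit's CLI behavior, not something raised in the thread): spark-submit treats everything placed after the application jar as arguments to the application's main method, so a --jars flag in that position is silently ignored rather than added to the classpath. A hypothetical corrected invocation would move the flag before the jar:

```shell
# Sketch: spark-submit options (--jars included) must precede the application
# jar; anything after the jar is passed to main(String[]) instead.
docker exec jupyter sh -c '$SPARK_HOME/bin/spark-submit --master yarn --class mypackage.SparkSqlTest --jars $GEOMESA_SPARK_JARS file:///myjar-0.1.0.jar'
```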
But I got the following error:
17/09/15 19:45:01 INFO HSQLDB4AD417742A.ENGINE: dataFileCache open start
17/09/15 19:45:02 INFO execution.SparkSqlParser: Parsing command: posiciones
17/09/15 19:45:02 INFO execution.SparkSqlParser: Parsing command: select count(*) from posiciones where id_equipo = 148 and fecha_hora >= '2015-04-01' and fecha_hora <= '2015-04-30'
java.lang.RuntimeException: Could not find a SpatialRDDProvider
at org.locationtech.geomesa.spark.GeoMesaSpark$$anonfun$apply$2.apply(GeoMesaSpark.scala:33)
at org.locationtech.geomesa.spark.GeoMesaSpark$$anonfun$apply$2.apply(GeoMesaSpark.scala:33)
Any idea why this is happening?
What jars does $GEOMESA_SPARK_JARS include? If it does not include geomesa-accumulo-spark-runtime_2.11-${version}.jar, that could explain the problem. – GeoMesaJim
Oh, one other suggestion/question: check the return value of DataStoreFinder.getDataStore(dsParams). If the GeoMesa AccumuloDataStore is not on the classpath, that line will happily return null. – GeoMesaJim
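The comment above points at a real pitfall: DataStoreFinder.getDataStore returns null instead of throwing when no matching factory is on the classpath, so the failure surfaces much later. A minimal, self-contained sketch of the guard one might add (lookupDataStore here is a hypothetical stand-in that simulates the missing-classpath case; in the real code the guarded call would be DataStoreFinder.getDataStore(dsParams)):

```java
import java.util.Objects;

public class DataStoreGuard {

    // Hypothetical stand-in for DataStoreFinder.getDataStore(dsParams),
    // which returns null (not an exception) when no matching DataStore
    // factory is found on the classpath.
    static Object lookupDataStore() {
        return null; // simulates the missing-classpath case
    }

    public static void main(String[] args) {
        Object ds = lookupDataStore();
        try {
            // Fail fast with a clear message instead of continuing with null.
            Objects.requireNonNull(ds, "No DataStore found -- check classpath");
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```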
This is the value of $GEOMESA_SPARK_JARS: file:///opt/geomesa/dist/spark/geomesa-accumulo-spark-runtime_2.11-1.3.2.jar,file:///opt/geomesa/dist/spark/geomesa-spark-converter_2.11-1.3.2.jar,file:///opt/geomesa/dist/spark/geomesa-spark-geotools_2.11-1.3.2.jar – jramirez