Error running a Hive query over JSON data?

I have a file containing data like the following:

{"field1":{"data1": 1},"field2":100,"field3":"more data1","field4":123.001} 
{"field1":{"data2": 1},"field2":200,"field3":"more data2","field4":123.002} 
{"field1":{"data3": 1},"field2":300,"field3":"more data3","field4":123.003} 
{"field1":{"data4": 1},"field2":400,"field3":"more data4","field4":123.004} 

I uploaded it to S3 and converted it into a Hive table using the following from the Hive console:

ADD JAR s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar; 
CREATE EXTERNAL TABLE impressions (json STRING) ROW FORMAT DELIMITED LINES TERMINATED BY '\n' LOCATION 's3://my-bucket/'; 

The query:

SELECT * FROM impressions; 

returns the output fine, but as soon as I try to use the get_json_object UDF and run the query:

SELECT get_json_object(impressions.json, '$.field2') FROM impressions; 

I get the following error:

> SELECT get_json_object(impressions.json, '$.field2') FROM impressions; 
Total MapReduce jobs = 1 
Launching Job 1 out of 1 
Number of reduce tasks is set to 0 since there's no reduce operator 
java.io.IOException: cannot find dir = s3://my-bucket/snapshot.csv in pathToPartitionInfo: [s3://my-bucket/] 
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:291) 
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:258) 
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:108) 
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:423) 
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1036) 
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1028) 
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:172) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) 
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:897) 
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:871) 
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:479) 
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136) 
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133) 
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) 
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332) 
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123) 
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) 
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:261) 
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:218) 
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409) 
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684) 
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:567) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
    at java.lang.reflect.Method.invoke(Method.java:597) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
Job Submission failed with exception 'java.io.IOException(cannot find dir = s3://my-bucket/snapshot.csv in pathToPartitionInfo: [s3://my-bucket/])' 
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask 

Does anyone know what is wrong?

Answer


Is there a reason you declare:

ADD JAR s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar; 

but don't use the SerDe in your table definition? See the snippet below for how to use it. I don't see any reason to use get_json_object here.

-- Map each top-level JSON key to a column via the EMR JSON SerDe
CREATE EXTERNAL TABLE impressions (
    field1 string, field2 string, field3 string, field4 string
)
ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde'
WITH SERDEPROPERTIES ('paths'='field1, field2, field3, field4')
LOCATION 's3://my-bucket/';
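With the SerDe mapping in place, the fields can be queried as ordinary columns, with no JSON parsing in the query itself. A minimal sketch, assuming the table above was created successfully over the sample data (the filter value 200 is just illustrative):

-- field2 was declared as string, so the numeric comparison
-- relies on Hive's implicit cast to double.
SELECT field2, field4
FROM impressions
WHERE field2 > 200;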

Sorry, that was a typo from a previous attempt. Will this work for nested data though, like the 'field1' in my example? – nickponline 2012-07-17 21:31:13
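For the nested 'field1' case, get_json_object does accept dotted JSON paths, so it can descend into the nested object on the original single-string-column table. A sketch assuming the key names from the sample data:

-- '$.field1.data1' descends into the nested object; rows whose
-- nested key is data2..data4 yield NULL for this path.
SELECT get_json_object(impressions.json, '$.field1.data1')
FROM impressions;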
