Extract an array of structs in Hive

I have the following external table in Hive:

CREATE EXTERNAL TABLE FOO ( 
    TS string, 
    customerId string, 
    products array< struct <productCategory:string, productId:string> > 
) 
PARTITIONED BY (ds string) 
ROW FORMAT SERDE 'some.serde' 
WITH SERDEPROPERTIES ('error.ignore'='true') 
LOCATION 'some_locations' 
;

A record in the external table can hold data such as:

1340321132000, 'some_company', [{"productCategory":"footwear","productId":"nik3756"},{"productCategory":"eyewear","productId":"oak2449"}]

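Does anyone know if there is a way to simply extract all the productCategory values from this record and return them as a productCategories array, without using explode? Something like the following: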

["footwear", "eyewear"]

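Or do I need to write my own GenericUDF? If so, I don't know much Java (I'm a Ruby person); can anyone give me some hints? I have read some notes on UDFs from Apache Hive, but I don't know which collection type is best for handling arrays, or which one to use for handling structs.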
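To make the GenericUDF route concrete, here is a minimal plain-Java sketch of the per-row logic such a UDF would perform: walk the array and pull one field out of each struct. It deliberately has no Hive dependencies, so it is not a GenericUDF itself. In a real GenericUDF, Hive exposes the array through a `ListObjectInspector` (its `getList` returns a `java.util.List`) and each element through a `StructObjectInspector`, whose fields you read with `getStructFieldData`. The `Product` class and the `productCategories` method are hypothetical stand-ins for `struct<productCategory:string, productId:string>` and the UDF's evaluate step.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Hive dependencies) of the per-row logic a custom
// GenericUDF would need: project one struct field across an array of structs.
public class ExtractCategories {

    // Hypothetical stand-in for struct<productCategory:string, productId:string>.
    static final class Product {
        final String productCategory;
        final String productId;

        Product(String productCategory, String productId) {
            this.productCategory = productCategory;
            this.productId = productId;
        }
    }

    // Returns the productCategory of every element, in array order.
    // A null array yields null, i.e. NULL in, NULL out.
    static List<String> productCategories(List<Product> products) {
        if (products == null) {
            return null;
        }
        List<String> categories = new ArrayList<>(products.size());
        for (Product p : products) {
            // A null struct element contributes a null category.
            categories.add(p == null ? null : p.productCategory);
        }
        return categories;
    }

    public static void main(String[] args) {
        // The two structs from the sample record in the question.
        List<Product> row = List.of(
                new Product("footwear", "nik3756"),
                new Product("eyewear", "oak2449"));
        System.out.println(productCategories(row)); // prints [footwear, eyewear]
    }
}
```

Running `main` prints `[footwear, eyewear]`, the Java rendering of the array the question asks for.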

===

I have answered this question by writing a GenericUDF, but I ran into 2 other problems. They are in this SO Question.


2013-03-26 pchu

If the size of the array is fixed (e.g. 2), try:

products[0].productCategory,products[1].productCategory

If it is not, a UDF is the right solution. I think you could do it in JRuby. GL!


2013-03-26 07:20:21 www

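Thanks, but the size of the array is not fixed. Using JRuby is a nice idea, but for this I need to write the GenericUDF in Java. Worse, there is not much reference material on writing a GenericUDF. – pchu 2013-03-26 12:23:14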

One way is to use either the inline or the explode function, like this:

SELECT 
    TS, 
    customerId, 
    pCat, 
    pId
FROM FOO 
LATERAL VIEW inline(products) p AS pCat, pId
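For comparison with the array the question asks for, here is a plain-Java model of the row shape that `LATERAL VIEW inline(products) p AS pCat, pId` produces: one output row per struct element, with `TS` and `customerId` repeated. A NULL or empty array yields no output rows, as with a lateral view that is not `OUTER`. All class and method names are hypothetical, and this only models the result; it is not how Hive executes the query.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (no Hive dependencies) of LATERAL VIEW inline(products).
public class InlineSketch {

    // Hypothetical stand-in for struct<productCategory:string, productId:string>.
    static final class Product {
        final String productCategory;
        final String productId;

        Product(String productCategory, String productId) {
            this.productCategory = productCategory;
            this.productId = productId;
        }
    }

    // Hypothetical stand-in for one record of FOO (partition column omitted).
    static final class FooRow {
        final String ts;
        final String customerId;
        final List<Product> products;

        FooRow(String ts, String customerId, List<Product> products) {
            this.ts = ts;
            this.customerId = customerId;
            this.products = products;
        }
    }

    // One output row: the outer columns plus the two struct fields (pCat, pId).
    static final class OutRow {
        final String ts;
        final String customerId;
        final String pCat;
        final String pId;

        OutRow(String ts, String customerId, String pCat, String pId) {
            this.ts = ts;
            this.customerId = customerId;
            this.pCat = pCat;
            this.pId = pId;
        }

        @Override
        public String toString() {
            return ts + "\t" + customerId + "\t" + pCat + "\t" + pId;
        }
    }

    // Emits one OutRow per struct element; assumes the array has no NULL elements.
    static List<OutRow> lateralViewInline(List<FooRow> rows) {
        List<OutRow> out = new ArrayList<>();
        for (FooRow row : rows) {
            if (row.products == null) {
                continue; // a NULL array contributes no rows (non-OUTER lateral view)
            }
            // An empty array also contributes no rows: the loop body never runs.
            for (Product p : row.products) {
                out.add(new OutRow(row.ts, row.customerId, p.productCategory, p.productId));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        FooRow row = new FooRow("1340321132000", "some_company", List.of(
                new Product("footwear", "nik3756"),
                new Product("eyewear", "oak2449")));
        for (OutRow r : lateralViewInline(List.of(row))) {
            System.out.println(r);
        }
    }
}
```

With the sample record from the question, `main` prints two tab-separated rows, one for footwear/nik3756 and one for eyewear/oak2449. Getting back to a single array per record would need an extra aggregation step after the lateral view.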

Otherwise, you can write a UDF. Check out this post and this post, along with the following resources:


2016-02-29 01:39:24 chorbs

You can use a JSON SerDe or the built-in functions get_json_object and json_tuple.

With rcongiu's Hive-JSON SerDe, usage would be:

Define the table:

CREATE TABLE complex_json (
    DocId string, 
    Orders array<struct<ItemId:int, OrderDate:string>> 
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

Load sample JSON into it (it is important that this data is a one-liner, i.e. each JSON record sits on a single line):

{"DocId":"ABC","Orders":[{"ItemId":1111,"OrderDate":"11/11/2012"},{"ItemId":2222,"OrderDate":"12/12/2012"}]}

Then extracting the item IDs is as simple as:

SELECT Orders.ItemId FROM complex_json LIMIT 100;

It will return the list of IDs for you:

itemid
[1111,2222]

I verified that this returns correct results in my environment. Full listing:

add jar hdfs:///tmp/json-serde-1.3.6.jar; 

CREATE TABLE complex_json (
    DocId string, 
    Orders array<struct<ItemId:int, OrderDate:string>> 
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; 

LOAD DATA INPATH '/tmp/test.json' OVERWRITE INTO TABLE complex_json; 

SELECT Orders.ItemId FROM complex_json LIMIT 100;


2016-02-29 16:04:03 Viktor
