2017-03-01 60 views
2

Error when loading XML data into a Hive table

I am trying to load an XML file into a Hive table. Here is my CREATE TABLE statement:

CREATE TABLE MYDATA(NAME STRING, AGE INT, SEX STRING) 
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
    WITH SERDEPROPERTIES(
    "column.xpath.NAME"="/TAG/NAME/text()", 
    "column.xpath.AGE"="/TAG/AGE/int()", 
    "column.xpath.SEX"="/TAG/SEX/text()") 
    STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
    LOCATION '/home/sid/hivexmltab' 
    TBLPROPERTIES("xmlinput.start"="<TAG","xmlinput.end"="</TAG>"); 

My input file has the following format:

<TAG> 
<NAME>ABCD</NAME><AGE>25</AGE><SEX>male</SEX> 
<NAME>EFGH</NAME><AGE>23</AGE><SEX>female</SEX> 
</TAG> 

The output I expect looks like this:

ABCD,25,male 
EFGH,23,female 

But the output I actually get looks like this:

<string>ABCDEFGH</string> NULL <string>malefemale</string> 

I am using the JAR file hivexmlserde-1.0.5.3.jar for the XML SerDe.

Can anyone tell me what I am doing wrong here? Any help is appreciated.

Answers

1

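This is a poorly structured XML file.
Each <NAME>...</NAME><AGE>...</AGE><SEX>...</SEX> group should be wrapped in an additional tag.
Because all records sit directly under a single <TAG>, each XPath matches several nodes, so the columns below are declared as arrays and then split into rows.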


CREATE EXTERNAL TABLE MYDATA 
(
    NAME array<string> 
    ,AGE  array<int> 
    ,SEX  array<string>  
) 
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
    WITH SERDEPROPERTIES 
    (
     "column.xpath.NAME" = "TAG/NAME/text()" 
     ,"column.xpath.AGE" = "TAG/AGE/text()" 
     ,"column.xpath.SEX" = "TAG/SEX/text()" 
    ) 
    STORED AS 
    INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
    LOCATION  '/home/sid/hivexmltab' 
    TBLPROPERTIES 
    (
     "xmlinput.start" = "<TAG" 
     ,"xmlinput.end" = "</TAG>" 
    ) 
; 

select * from MYDATA 
; 

+-----------------+------------+-------------------+
| mydata.name     | mydata.age | mydata.sex        |
+-----------------+------------+-------------------+
| ["ABCD","EFGH"] | [25,23]    | ["male","female"] |
+-----------------+------------+-------------------+

select NAME[pe.n] as name 
     ,AGE [pe.n] as age 
     ,SEX [pe.n] as sex 

from MYDATA m 
     lateral view posexplode (m.NAME) pe as n,x 
; 

+------+-----+--------+
| name | age | sex    |
+------+-----+--------+
| ABCD | 25  | male   |
| EFGH | 23  | female |
+------+-----+--------+
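
If the input file can be regenerated with each record wrapped in its own element, as recommended at the top of this answer, the arrays and posexplode are no longer needed. The following is a minimal, untested sketch under that assumption. The <PERSON> wrapper tag, the table name MYDATA_WRAPPED, and the directory /home/sid/hivexmltab_wrapped are hypothetical names chosen for illustration; the SerDe and input format classes are the ones already used above.

```sql
-- Hypothetical input, one <PERSON> element per record:
-- <TAG>
--   <PERSON><NAME>ABCD</NAME><AGE>25</AGE><SEX>male</SEX></PERSON>
--   <PERSON><NAME>EFGH</NAME><AGE>23</AGE><SEX>female</SEX></PERSON>
-- </TAG>

CREATE EXTERNAL TABLE MYDATA_WRAPPED   -- hypothetical table name
(
    NAME STRING
    ,AGE  INT
    ,SEX  STRING
)
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES
    (
     -- XPaths are evaluated against one <PERSON> fragment at a time
     "column.xpath.NAME" = "/PERSON/NAME/text()"
     ,"column.xpath.AGE" = "/PERSON/AGE/text()"
     ,"column.xpath.SEX" = "/PERSON/SEX/text()"
    )
    STORED AS
    INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION  '/home/sid/hivexmltab_wrapped'   -- hypothetical directory
    TBLPROPERTIES
    (
     -- each <PERSON>...</PERSON> span becomes one table row
     "xmlinput.start" = "<PERSON"
     ,"xmlinput.end" = "</PERSON>"
    )
;
```

With this layout every XPath should match exactly one node per row, so a plain select * from MYDATA_WRAPPED would be expected to return one row per person without any lateral view.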

It works. This really helped us build a table structure suitable for loading XML files. – Sidhartha

1

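Use text() everywhere; int() is not a valid XPath node test, which is why AGE comes back as NULL. Change the AGE property to: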

"column.xpath.AGE"="/TAG/AGE/text()" 

You can change the data type in the Hive table later.

Also remove the LOCATION clause from the CREATE TABLE statement:

LOCATION '/home/sid/hivexmltab' 

and instead use the LOAD command to load the data after creating the table:

load data local inpath '/home/sid/hivexmltab/XMLfile.xml' overwrite into table MYDATA; 
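
One possible reading of "change the data type later" is to declare AGE as STRING in the table and convert it at query time. The sketch below assumes exactly that, and also assumes each row holds a single numeric AGE value, which is not the case for the input file in the question unless the records are separated first.

```sql
-- Assumption: AGE was declared as STRING in MYDATA and holds one number per row.
-- CAST yields NULL for values that cannot be parsed as an integer.
SELECT NAME
     ,CAST(AGE AS INT) AS AGE
     ,SEX
FROM MYDATA
;
```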