2017-01-03

Error when parsing XML in Hive

I am using Hive to parse an XML file, and for that I am using hivexmlserde. When I write my code and run it, I get this error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: The number of XPath expressions does not much the number of columns 

However, the number of columns and the number of XPath expressions in my table are the same.

Here is my code:

add jar /home/cloudera/hivexmlserde-1.0.5.3.jar; 
CREATE EXTERNAL TABLE INFO(
statusCode string, 
title string, 
startTime string, 
endTime string, 
frequencyValue string, 
frequencyUnits string, 
strengthValue string, 
strengthUnits string, 
routecode string, 
routecodeSystem string, 
routedisplayName string, 
routecodesystemName string, 
ugcode string, 
uname string, 
ucodeSystem string, 
codeSystemName string, 
ageForm string, 
tr_code string, 
tr_description string, 
tr_codesystem string, 
tr_codesystemname string 
) 
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
WITH SERDEPROPERTIES (
"column.xpath.statusCode"="Document/xxx/statusCode/text()", 
"column.xpath.title"="Document/xxx/code/code/text()", 
"column.xpath.startTime"="Document/xxx/startTime/text()", 
"column.xpath.endTime"="Document/xxx/endTime/text()", 
"column.xpath.frequencyValue"="Document/xxx/frequencyValue/text()", 
"column.xpath.frequencyUnits"="Document/xxx/frequencyUnits/text()", 
"column.xpath.strengthValue"="Document/xxx/strengthValue/text()", 
"column.xpath.strengthUnits"="Document/xxx/strengthUnits/text()", 
"column.xpath.routecode"="Document/xxx/entryInfo/routeCode/code/text()", 
"column.xpath.routecodeSystem"="Document/xxx/entryInfo/routeCode/codeSystem/text()", 
"column.xpath.routedisplayName"="Document/xxx/entryInfo/routeCode/displayName/text()", 
"column.xpath.routecodesystemName"="Document/xxx/entryInfo/routeCode/codeSystemName/text()", 
"column.xpath.ugcode"="Document/xxx/entryInfo/productCode/code/text()", 
"column.xpath.ugname"="Document/xxx/entryInfo/productCode/displayName/text()", 
"column.xpath.ugcodeSystem"="Document/xxx/entryInfo/productCode/codeSystem/text()", 
"column.xpath.ugcodeSystemName"="Document/xxx/entryInfo/productCode/codeSystemName/text()", 
"column.xpath.dosageForm"="Document/xxx/entryInfo/ageForm/displayName/text()", 
"column.xpath.tr_code"="Document/xxx/entryInfo/productCode/translation/code/text()", 
"column.xpath.tr_description"="Document/xxx/entryInfo/productCode/translation/displayName/text()", 
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystem/text()", 
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystemName/text()" 
) 
STORED AS 
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
TBLPROPERTIES (
"xmlinput.start"="<Document", 
"xmlinput.end"="</Document>"); 

Answer


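I found the problem after a bit of digging through the code. I was hitting this error because I had used the same column name in two XPath properties.

column.xpath.tr_codesystem

appears twice in SERDEPROPERTIES. I changed the second occurrence to use codesystemname (that is, column.xpath.tr_codesystemname, to match the tr_codesystemname column), and after that it started working for me.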
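
To illustrate the fix, here is a minimal sketch that keeps only the two translation columns from the question. The table name info_translation_sketch is hypothetical. The jar path, SerDe classes, XPath expressions, and table properties are copied from the question as-is. The point is that every column.xpath.<column> key appears exactly once, so the number of XPath expressions equals the number of columns. Presumably the repeated key in the original collapses into a single property, leaving 20 expressions for 21 columns. Treat that explanation as an assumption rather than confirmed behavior.

```sql
-- Jar path taken from the question; adjust it for your environment.
ADD JAR /home/cloudera/hivexmlserde-1.0.5.3.jar;

-- Hypothetical two-column table, reduced from the question's INFO table.
CREATE EXTERNAL TABLE info_translation_sketch (
  tr_codesystem     string,
  tr_codesystemname string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  -- One distinct column.xpath.<column> key per column: two columns, two keys.
  "column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystem/text()",
  -- This key was a second "column.xpath.tr_codesystem" in the failing DDL.
  "column.xpath.tr_codesystemname"="Document/xxx/entryInfo/productCode/translation/codeSystemName/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  "xmlinput.start"="<Document",
  "xmlinput.end"="</Document>"
);
```

In the full 21-column table, the same one-line change to the last property was enough in my case.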