Hive - partitioned table

I created a partitioned Hive table with the following query:

create table studpart4(id int, name string) partitioned by (course string, year int) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

The table was created successfully.

I then loaded the data with the following command:

load data local inpath '/scratch/hive_inputs/student_input_1.txt' overwrite into table studpart4 partition(course='cse',year=2);

My input data file looks like this:

101 student1 cse 1
102 student2 cse 2
103 student3 eee 3
104 student4 eee 4
105 student5 cse 1
106 student6 cse 2
107 student7 eee 3
108 student8 eee 4
109 student9 cse 1
110 student10 cse 2

But the output of select * from studpart4 is:

101 student1 cse 2
102 student2 cse 2
103 student3 eee 2
104 student4 eee 2
105 student5 cse 2
106 student6 cse 2
107 student7 eee 2
108 student8 eee 2
109 student9 cse 2
110 student10 cse 2

Why is the last column always 2? Why were the values changed and loaded incorrectly?

Source

2016-08-20 Suresh J

http://stackoverflow.com/a/13224581/2079249 –

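The result you are seeing is exactly what you told Hive to do with your data.

In your first command, you create a partitioned table studpart4 with two columns, id and name, and two partition keys, course and year (which, once created, behave like regular columns). Now, in your second command, what you are doing is this: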

load data local inpath '/scratch/hive_inputs/student_input_1.txt' overwrite into table studpart4 partition(course='cse',year=2)

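This basically means "copy all the data from student_input_1.txt into the table studpart4, and set the value of the course column to 'cse' and the value of the year column to '2' for every row". Hive does not read the partition values from the file; it takes them from the partition(...) clause. Internally, Hive creates a directory structure based on the partition keys. Your data will be stored in a directory like this: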

.../studpart4/course=cse/year=2/
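To confirm which partitions the load actually created, you can ask Hive to list them. This is a minimal check that uses only the table name from the question; for the load command above it should list a single partition for course=cse and year=2.

```sql
-- List every partition that currently exists for the table.
SHOW PARTITIONS studpart4;

-- Partition keys can be filtered like regular columns.
SELECT id, name FROM studpart4 WHERE course = 'cse' AND year = 2;
```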

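I suspect what you really want is for Hive to detect the course and year values in the .txt file and set the correct partition values for you. To do that, you have to use dynamic partitioning: first load your data into an external (staging) table whose columns match the file, then use an INSERT OVERWRITE TABLE command to store the data in your studpart4 table. The link posted by BigDataLearner in the comments describes this strategy.

I hope this helps.

Source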
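The strategy could be sketched roughly as follows. The staging table name studpart4_staging is hypothetical, and for brevity this sketch uses an ordinary managed table for staging rather than an external one; it also assumes the input file really is tab-delimited, as the table definition declares.

```sql
-- Staging table (hypothetical name): all four fields are ordinary columns,
-- so course and year are read from the file itself.
CREATE TABLE studpart4_staging (id INT, name STRING, course STRING, year INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/scratch/hive_inputs/student_input_1.txt'
OVERWRITE INTO TABLE studpart4_staging;

-- Allow Hive to derive every partition value from the query result.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition columns must come last in the SELECT list,
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE studpart4 PARTITION (course, year)
SELECT id, name, course, year FROM studpart4_staging;
```

With this approach each row should land in the directory that matches its own course and year values, instead of everything going into course=cse/year=2.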

2016-08-21 15:59:27

Very nice. Thanks for the detailed explanation. It is clear to me now.

You're welcome :-)
