HIVE: Empty buckets are created after partitioning in HDFS

These are the properties I set:

set hive.enforce.bucketing = true; 
SET hive.exec.dynamic.partition = true; 
SET hive.exec.dynamic.partition.mode = nonstrict;

Below is the code used to create the table:

CREATE TABLE transactions_production 
(id string, 
dept string, 
category string, 
company string, 
brand string, 
date1 string, 
productsize int, 
productmeasure string, 
purchasequantity int, 
purchaseamount double) 
PARTITIONED BY (chain string) clustered by(id) into 5 buckets 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
STORED AS TEXTFILE;

Below is the code used to insert data into the table:

INSERT OVERWRITE TABLE transactions_production PARTITION (chain) 
select id, dept, category, company, brand, date1, productsize, productmeasure, 
purchasequantity, purchaseamount, chain from transactions_staging;

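The problem:

The partitions and buckets are being created in HDFS, but the data is present only in the first bucket of every partition; all the remaining buckets are empty.

Please let me know what I am doing wrong and how to resolve this.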

Source

2015-10-15 user182944

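When bucketing is used, Hive hashes the value of the CLUSTERED BY column (here, id) and splits the table into a number of flat files inside each partition.

Because the table is split by the hash of id, the size of each split depends on the values in the table.

If no values map to any bucket other than the first one, all of the other flat files will be empty.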
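
One way to check whether this is what is happening is to count how many staging rows would land in each bucket. The query below is only a rough sketch: it assumes that bucket assignment follows the built-in hash() function, masked to a non-negative value and taken modulo the bucket count (5, from the CREATE TABLE statement in the question). That formula is an assumption here and may not hold on every Hive version.

```sql
-- Sketch: estimate how staging rows spread over the 5 buckets of each partition.
-- Assumption: bucket = (hash(id) & 2147483647) % 5; this may differ by Hive version.
SELECT chain,
       (hash(id) & 2147483647) % 5 AS bucket,
       count(*) AS row_count
FROM transactions_staging
GROUP BY chain, (hash(id) & 2147483647) % 5
ORDER BY chain, bucket;
```

If every row reports the same bucket, the id values really do hash into a single bucket and the empty files are expected. If the rows spread across several buckets, the skew in the data does not explain the empty files and the cause is more likely in how the insert was executed.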

Source

2015-10-15 11:26:40 madhu
