2016-07-25 62 views

Answers

3

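Yes.
Partitioning is how you split your data into directories on HDFS; each directory is one partition. For example, if your table definition looks like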

CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING) 
COMMENT 'A bucketed copy of user_info' 
PARTITIONED BY(ds STRING) 
CLUSTERED BY(user_id) INTO 256 BUCKETS; 

then you will have HDFS directories such as

/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/ 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/ 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-13/ 

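Bucketing is about how your data is distributed inside a partition, so within each partition directory you will have one file per bucket, like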

/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000000_0 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000001_0 
... 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000255_0 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000000_0 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000001_0 
... 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000255_0 
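
A minimal sketch of how one of these partitions could be populated and then read back bucket by bucket. It assumes a hypothetical unbucketed source table named user_info that has the same three columns plus a ds column; that table is not part of the definition above. The SET line is only relevant on older Hive releases, where bucketing is not enforced on insert unless you ask for it.

```sql
-- Older Hive releases only: enforce the declared bucket count on insert.
SET hive.enforce.bucketing = true;

-- user_info is a hypothetical source table used for illustration.
-- A static partition spec writes into the ds=2011-01-11 directory,
-- and rows are spread over the 256 buckets by user_id.
INSERT OVERWRITE TABLE user_info_bucketed
PARTITION (ds = '2011-01-11')
SELECT user_id, firstname, lastname
FROM user_info
WHERE ds = '2011-01-11';

-- Sample a single bucket of that partition.
SELECT user_id, firstname, lastname
FROM user_info_bucketed
TABLESAMPLE (BUCKET 1 OUT OF 256 ON user_id)
WHERE ds = '2011-01-11';
```

The WHERE clause on ds restricts the query to one partition directory, and sampling on the clustering column lets Hive read a single bucket rather than the whole partition.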

References:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
http://www.hadooptpoint.com/hive-buckets-optimization-techniques/

0

You can! In that case, you will be using buckets inside the partitioned data.

1

Yes. It is straightforward.
Try something like the following:

CREATE TABLE IF NOT EXISTS employee_partition_bucket 
( 
employeeID Int, 
firstName String, 
designation String, 
salary Int 
) 
PARTITIONED BY (department string) 
CLUSTERED BY (designation) INTO 2 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n'; 

In this example, the table is partitioned by department and bucketed by designation.
Hope this helps.
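
As a rough sketch of how data could be loaded into this table, assuming a hypothetical staging table named employee_staging that holds the same four columns plus a department column (the staging table and its name are illustrative and not part of the answer above):

```sql
-- Allow Hive to create partitions from the values found in the data.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
-- Older Hive releases only: enforce the declared bucket count on insert.
SET hive.enforce.bucketing = true;

-- employee_staging is a hypothetical source table.
-- The partition column (department) must come last in the SELECT list.
INSERT OVERWRITE TABLE employee_partition_bucket
PARTITION (department)
SELECT employeeID, firstName, designation, salary, department
FROM employee_staging;
```

With a load like this, the table's directory should end up with one department=<value> subdirectory per distinct department, and each of those holds up to two bucket files, with rows assigned to a bucket by a hash of designation.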

+0

How will the data be laid out in the file system directories? Can you elaborate? – Farooque