Hive: Need to specify partition columns because the destination table is partitioned

I'd like to know whether it is possible in Hive to insert from a non-partitioned table into a partitioned one. The first table looks like this:

hive> describe extended user_ratings; 
OK 
userid     int           
movieid     int           
rating     int           
unixtime    int           

Detailed Table Information Table(tableName:user_ratings, dbName:ml, owner:cloudera, createTime:1500142667, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null), FieldSchema(name:movieid, type:int, comment:null), FieldSchema(name:rating, type:int, comment:null), FieldSchema(name:unixtime, type:int, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/ml.db/user_ratings, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim= 
Time taken: 0.418 seconds, Fetched: 6 row(s)

The new table looks like this:

hive> describe extended rating_buckets; 
OK 
userid     int           
movieid     int           
rating     int           
unixtime    int           
genre     string          

# Partition Information  
# col_name    data_type    comment    

genre     string          

Detailed Table Information Table(tableName:rating_buckets, dbName:default, owner:cloudera, createTime:1500506879, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null), FieldSchema(name:movieid, type:int, comment:null), FieldSchema(name:rating, type:int, comment:null), FieldSchema(name:unixtime, type:int, comment:null), FieldSchema(name:genre, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/rating_buckets, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:8, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim= 
Time taken: 0.46 seconds, Fetched: 12 row(s)

It seems to be counting the partition column ("genre") the same as the other columns... could I have created the table incorrectly?

Anyway, here is what happens when I try to do an INSERT OVERWRITE into the new table:

hive> FROM ml.user_ratings 
    > INSERT OVERWRITE TABLE rating_buckets 
    > select userid, movieid, rating, unixtime; 
FAILED: SemanticException 2:23 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'rating_buckets'

Should I just recreate the first table with partitions? Is there a way to copy the first table over and keep the partitioning intact?

Source

2017-07-19 lengthy_preamble

You aren't even including genre in your select list. I believe it needs to be the last column in your select. You can't partition on nothing.

You also need to specify the partition along with the table, like this:

insert overwrite table rating_buckets partition(genre) 
select 
userid, 
movieid, 
rating, 
unixtime, 
<SOMETHING> as genre 
from 
...
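
As a fuller illustration of the statement above, here is a minimal sketch of a dynamic-partition insert. It assumes the genre comes from a hypothetical lookup table `ml.movie_genres(movieid, genre)`, which is not part of the original post, and that dynamic partitioning may need to be enabled for the session, depending on your Hive version and configuration.

```sql
-- Let Hive create partitions from the values found in the data.
SET hive.exec.dynamic.partition=true;
-- nonstrict mode allows every partition column to be dynamic;
-- in strict mode at least one partition value must be given statically.
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE rating_buckets PARTITION (genre)
SELECT
  r.userid,
  r.movieid,
  r.rating,
  r.unixtime,
  g.genre                 -- the partition column goes last in the select list
FROM ml.user_ratings r
JOIN ml.movie_genres g    -- hypothetical lookup table (assumption)
  ON r.movieid = g.movieid;
```

Hive matches dynamic partition columns by position rather than by name, which is why the partition value has to be the final expression in the select list.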

Source

2017-07-20 03:54:47 Andrew

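I appreciate your input, but unfortunately it returns the following:

hive> insert overwrite table rating_buckets partition(genre) 
    > select 
    > userid, 
    > movieid, 
    > rating, 
    > unixtime, 
    > (action) as genre 
    > from ml.user_ratings; 
FAILED: SemanticException [Error 10004]: Line 7:1 Invalid table alias or column reference 'action': (possible column names are: userid, movieid, rating, unixtime)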

Are you trying to insert the word action as the genre? If so, you need to wrap it in single quotes, not parens: 'action' as genre. – Andrew

That did it, thanks!
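
When every row should land in a single, known partition, as with the literal 'action' discussed in the comments, a static partition spec is an alternative worth considering. The following is a sketch rather than the exact statement used in this thread; it reuses the table names from the question and adds a `SHOW PARTITIONS` check.

```sql
-- Static partition: the value is fixed in the PARTITION clause,
-- so the select list contains only the non-partition columns.
INSERT OVERWRITE TABLE rating_buckets PARTITION (genre = 'action')
SELECT userid, movieid, rating, unixtime
FROM ml.user_ratings;

-- List the partitions that now exist on the target table.
SHOW PARTITIONS rating_buckets;
```

Because the partition value is given statically, this form does not depend on the dynamic-partition settings.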
