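Hive: Need to specify partition columns because the destination table is partitioned

I would like to know whether it is possible in Hive to insert the rows of a non-partitioned table into a partitioned table. The first table looks like this: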
hive> describe extended user_ratings;
OK
userid int
movieid int
rating int
unixtime int
Detailed Table Information Table(tableName:user_ratings, dbName:ml, owner:cloudera, createTime:1500142667, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null), FieldSchema(name:movieid, type:int, comment:null), FieldSchema(name:rating, type:int, comment:null), FieldSchema(name:unixtime, type:int, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/ml.db/user_ratings, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim=
Time taken: 0.418 seconds, Fetched: 6 row(s)
The new table looks like this:
hive> describe extended rating_buckets;
OK
userid int
movieid int
rating int
unixtime int
genre string
# Partition Information
# col_name data_type comment
genre string
Detailed Table Information Table(tableName:rating_buckets, dbName:default, owner:cloudera, createTime:1500506879, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null), FieldSchema(name:movieid, type:int, comment:null), FieldSchema(name:rating, type:int, comment:null), FieldSchema(name:unixtime, type:int, comment:null), FieldSchema(name:genre, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/rating_buckets, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:8, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim=
Time taken: 0.46 seconds, Fetched: 12 row(s)
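It seems to be counting the partition column ("genre") the same as the other columns... Could I have created the table incorrectly?
Anyway, here is what happens when I try to do an INSERT OVERWRITE into the new table: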
hive> FROM ml.user_ratings
> INSERT OVERWRITE TABLE rating_buckets
> select userid, movieid, rating, unixtime;
FAILED: SemanticException 2:23 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'rating_buckets'
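Should I just re-create the first table as a partitioned table? Is there a way to copy the first table over and keep the partitioning in place?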
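I appreciate your input, but unfortunately it returns the following: hive> insert overwrite table rating_buckets partition(genre) > select > userid, > movieid, > rating, > unixtime, > (action) as genre > from ml.user_ratings; FAILED: SemanticException [Error 10004]: Line 7:1 Invalid table alias or column reference 'action': (possible column names are: userid, movieid, rating, unixtime) –
Are you trying to insert the word action as the genre? If so, you need to wrap it in single quotes rather than parens: 'action' as genre. – Andrew
That did it, thanks! –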
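
For reference, the statement the comments converge on can be sketched as below. This is a minimal sketch, not a verified transcript: the two SET lines are an assumption about the session (Hive's strict mode rejects an insert in which every partition column is dynamic), and the constant 'action' is used only because ml.user_ratings has no genre column of its own, so every row lands in the same partition. The second statement is an equivalent static-partition form that avoids the dynamic-partition settings.

```sql
-- Assumed session settings: allow dynamic partitions, and allow an insert
-- in which no partition column is given a static value.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic partition insert: Hive takes the value of genre from the
-- last column of the SELECT list.
INSERT OVERWRITE TABLE rating_buckets PARTITION (genre)
SELECT userid,
       movieid,
       rating,
       unixtime,
       'action' AS genre   -- string literal in single quotes, not (action)
FROM ml.user_ratings;

-- Static alternative: name the partition value in the PARTITION clause
-- and select only the four non-partition columns.
INSERT OVERWRITE TABLE rating_buckets PARTITION (genre = 'action')
SELECT userid, movieid, rating, unixtime
FROM ml.user_ratings;
```

Either form addresses the original SemanticException, which is raised because the INSERT names a partitioned table without a PARTITION clause. The DESCRIBE output listing genre next to the regular columns is expected: Hive shows partition columns in the column list and again under "# Partition Information".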