2013-05-12

I'm running into a strange problem, and I assure you I've searched a lot before asking: slow Hive query performance under AWS Elastic MapReduce.

I'm running an AWS Elastic MapReduce cluster with a Hive table that has about 16 partitions. They were created by emr-s3distcp (because the original S3 bucket holds roughly 216K files), using --groupBy with the target size set to 64 MiB (the DFS block size in this case). The files are plain text, one JSON object per line, read through a JSON SerDe.
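For reference, the s3distcp step described above would look roughly like the sketch below. The bucket name, destination path, groupBy regex, and jar location are placeholders I've filled in for illustration, not values from the original post:

```shell
# Combine ~216K small S3 files into ~64 MiB files on HDFS.
# Paths and the regex are hypothetical; --targetSize is in MiB.
hadoop jar /home/hadoop/lib/emr-s3distcp-1.0.jar \
  --src  s3://my-bucket/logs/ \
  --dest hdfs:///data/logs/ \
  --groupBy '.*/(\w+)/.*\.json' \
  --targetSize 64
```

This is a cluster-side command and can't be run outside EMR; it only illustrates the --groupBy/--targetSize combination the question refers to.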

When I run the query, it takes a very long time and then gives up because of IPC connection failures.

Initially, the pressure from s3distcp onto HDFS was very high, so I took some measures (read: resized to larger machines, and set dfs.replication to 3, since it's a small cluster, with the block size set to 64 MiB). That worked, and the number of under-replicated blocks dropped to zero (the EMR default for small clusters is 2, but I changed it to 3).
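The replication and block-size changes mentioned above would typically go into hdfs-site.xml (or the EMR bootstrap-action equivalent); a minimal sketch, assuming the Hadoop 0.20/1.x property names of that era:

```xml
<!-- hdfs-site.xml fragment: raise replication to 3, 64 MiB blocks -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.block.size</name>
  <value>67108864</value> <!-- 64 MiB, in bytes -->
</property>
```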

Looking at /mnt/var/log/apps/hive_081.log yields several lines like this:

2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:<init>(222)) - The ping interval is60000ms. 
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:<init>(265)) - Use SIMPLE authentication for protocol ClientProtocol 
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:setupIOstreams(551)) - Connecting to /10.17.17.243:9000 
2013-05-12 09:56:12,121 DEBUG org.apache.hadoop.ipc.Client (Client.java:sendParam(769)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop sending #14 
2013-05-12 09:56:12,121 DEBUG org.apache.hadoop.ipc.Client (Client.java:run(742)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: starting, having connections 2 
2013-05-12 09:56:12,125 DEBUG org.apache.hadoop.ipc.Client (Client.java:receiveResponse(804)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop got value #14 
2013-05-12 09:56:12,126 DEBUG org.apache.hadoop.ipc.RPC (RPC.java:invoke(228)) - Call: getFileInfo 6 
2013-05-12 09:56:21,523 INFO org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 6 time(s). 
2013-05-12 09:56:22,122 DEBUG org.apache.hadoop.ipc.Client (Client.java:close(876)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: closed 
2013-05-12 09:56:22,122 DEBUG org.apache.hadoop.ipc.Client (Client.java:run(752)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: stopped, remaining connections 1 
2013-05-12 09:56:42,544 INFO org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 7 time(s). 

And so on, until the client hits one of its retry limits.

What does it take to fix this under Elastic MapReduce?

Thanks.

Answer


After a while I figured it out: the offending IP address wasn't even in my cluster, so it was a stale Hive metastore entry. I solved it with:

CREATE TABLE whatever_2 LIKE whatever LOCATION <hdfs_location>; 

ALTER TABLE whatever_2 RECOVER PARTITIONS; 
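RECOVER PARTITIONS is an Amazon EMR extension to Hive. On stock Hive, the equivalent (assuming the table's partition directories follow the standard key=value layout) would be:

```sql
-- Standard Hive counterpart of EMR's RECOVER PARTITIONS:
-- scans the table's location and registers any partition
-- directories missing from the metastore.
MSCK REPAIR TABLE whatever_2;
```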

Hope it helps.