2013-05-12

I'm running into a strange problem, and I assure you I've searched a lot before asking: slow Hive query performance under AWS Elastic MapReduce.

I'm running an AWS Elastic MapReduce cluster with a Hive table that has about 16 partitions. They were created by emr-s3distcp (because the original S3 bucket holds roughly 216K files), using --groupBy with the target size set to 64 MiB (the DFS block size in this case). The files are plain text, one JSON object per line, read through a JSON SerDe.
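For reference, the s3distcp step described above would look roughly like the sketch below. The bucket name, destination path, groupBy regex, and jar location are placeholders I've filled in for illustration, not values from the original post:

```shell
# Combine ~216K small S3 files into ~64 MiB files on HDFS.
# Paths and the regex are hypothetical; --targetSize is in MiB.
hadoop jar /home/hadoop/lib/emr-s3distcp-1.0.jar \
  --src  s3://my-bucket/logs/ \
  --dest hdfs:///data/logs/ \
  --groupBy '.*/(\w+)/.*\.json' \
  --targetSize 64
```

This is a cluster-side command and can't be run outside EMR; it only illustrates the --groupBy/--targetSize combination the question refers to.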

When I run the query, it takes a very long time and then gives up because of IPC connection failures.

Initially, the pressure from s3distcp onto HDFS was very high, so I took some measures (read: resized to larger machines, and set dfs.replication to 3, since it's a small cluster, with the block size set to 64 MiB). That worked, and the number of under-replicated blocks dropped to zero (the EMR default for small clusters is 2, but I changed it to 3).
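The replication and block-size changes mentioned above would typically go into hdfs-site.xml (or the EMR bootstrap-action equivalent); a minimal sketch, assuming the Hadoop 0.20/1.x property names of that era:

```xml
<!-- hdfs-site.xml fragment: raise replication to 3, 64 MiB blocks -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.block.size</name>
  <value>67108864</value> <!-- 64 MiB, in bytes -->
</property>
```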

Looking at /mnt/var/log/apps/hive_081.log yields several lines like this:

2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:<init>(222)) - The ping interval is60000ms. 
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:<init>(265)) - Use SIMPLE authentication for protocol ClientProtocol 
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:setupIOstreams(551)) - Connecting to /10.17.17.243:9000 
2013-05-12 09:56:12,121 DEBUG org.apache.hadoop.ipc.Client (Client.java:sendParam(769)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop sending #14 
2013-05-12 09:56:12,121 DEBUG org.apache.hadoop.ipc.Client (Client.java:run(742)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: starting, having connections 2 
2013-05-12 09:56:12,125 DEBUG org.apache.hadoop.ipc.Client (Client.java:receiveResponse(804)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop got value #14 
2013-05-12 09:56:12,126 DEBUG org.apache.hadoop.ipc.RPC (RPC.java:invoke(228)) - Call: getFileInfo 6 
2013-05-12 09:56:21,523 INFO org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 6 time(s). 
2013-05-12 09:56:22,122 DEBUG org.apache.hadoop.ipc.Client (Client.java:close(876)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: closed 
2013-05-12 09:56:22,122 DEBUG org.apache.hadoop.ipc.Client (Client.java:run(752)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: stopped, remaining connections 1 
2013-05-12 09:56:42,544 INFO org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 7 time(s). 

And so on, until the client hits one of its retry limits.

What does it take to fix this under Elastic MapReduce?

Thanks.

Answer


After a while I figured it out: the offending IP address wasn't even in my cluster, so it was a stale Hive metastore entry. I solved it with:

CREATE TABLE whatever_2 LIKE whatever LOCATION <hdfs_location>; 

ALTER TABLE whatever_2 RECOVER PARTITIONS; 
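RECOVER PARTITIONS is an Amazon EMR extension to Hive. On stock Hive, the equivalent (assuming the table's partition directories follow the standard key=value layout) would be:

```sql
-- Standard Hive counterpart of EMR's RECOVER PARTITIONS:
-- scans the table's location and registers any partition
-- directories missing from the metastore.
MSCK REPAIR TABLE whatever_2;
```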

Hope it helps.