
What do you think of the answers given on this site to the questions below, about combiners, reducers, and the ecosystem in Hadoop?

That is, are the given answers right or wrong?

Question 4

In the standard word count MapReduce algorithm, why might using a combiner reduce the overall Job running time?

A. Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.
B. Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.
C. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.

Answer:A 

Question 3

What happens in a MapReduce job when you set the number of reducers to one? 

A. A single reducer gathers and processes all the output from all the mappers. The output is written in as many separate files as there are mappers.
B. A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
C. Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
D. Setting the number of reducers to one is invalid, and an exception is thrown. 
Answer:A 

From my understanding, the answers to the above questions should be:

Question 4: D 
Question 3: B 

UPDATE

You have user profile records in your OLTP database that you want to join with weblogs you have already ingested into HDFS. How will you obtain these user records?
Options 
A. HDFS commands 
B. Pig load 
C. Sqoop import 
D. Hive 
Answer:B 

As for the updated question, my answer is C, but I am doubtful about it.

EDIT

Correct answer: Sqoop


+1 for pointing this out to anyone thinking of investing in that site... – vefthym 2014-09-29 11:53:36


Please see the update. – 2014-09-29 11:58:55

Answers


As far as I understand, both of the given answers are wrong.

I haven't worked much with the Combiner, but everywhere I have read that it operates on the Mapper's output. So the answer to Question 4 should be D.
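To make that concrete, here is a plain-Java sketch (no Hadoop dependency; the class and method names are made up for illustration) of what the combiner's local aggregation does to the shuffle volume in a word count: it collapses repeated `(word, 1)` pairs into per-word sums before anything crosses the network.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative simulation: local aggregation on one mapper's output
// reduces the number of (word, count) pairs shuffled to the reducers.
public class CombinerDemo {

    // Map phase: emit (word, 1) for every word in the input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : split.split("\\s+")) {
            out.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return out;
    }

    // Combiner: sum counts locally, per mapper, before the shuffle.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = map("to be or not to be");
        List<Map.Entry<String, Integer>> combined = combine(mapped);
        // 6 pairs without a combiner, 4 after local aggregation
        System.out.println("pairs shuffled without combiner: " + mapped.size());
        System.out.println("pairs shuffled with combiner:    " + combined.size());
    }
}
```

The mappers themselves do no less work here; only the data volume crossing the network shrinks, which is exactly why option D (fewer key-value pairs shuffled) is the right explanation rather than A.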

Again, from practical experience, I have found that the number of output files is always equal to the number of Reducers. So the answer to Question 3 should be B. This may not hold when using MultipleOutputs, but that is not common.

Finally, I don't believe Apache would lie about MapReduce (exceptions do happen :)). The answers to both questions can be found on their wiki page. Take a look.

By the way, I loved the "100% pass guarantee or your money back!!!" line on the link you provided ;-)

EDIT
I'm not sure about the question in the UPDATE section, since I have very little knowledge of Pig & Sqoop. But the same thing can certainly be achieved with Hive, by creating an external table over the data in HDFS and then doing the join.
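A rough HiveQL sketch of that approach (the table names, columns, and paths here are invented for illustration, not taken from the question):

```sql
-- Hypothetical: expose user profiles already copied into HDFS
-- as an external table, then join with the ingested weblogs.
CREATE EXTERNAL TABLE user_profiles (
  user_id STRING,
  name    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/user_profiles';

SELECT w.*, p.name
FROM weblogs w
JOIN user_profiles p ON (w.user_id = p.user_id);
```

Note this only works once the profile records are already in HDFS; it does not by itself pull them out of the OLTP database.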

UPDATE
After the comments from user milk3422 & the OP, I did some searching and found that my assumption of Hive as the answer to the last question is wrong, because another OLTP database is involved. The correct answer should be C, since Sqoop is designed to transfer data between HDFS and relational databases.
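For reference, a typical Sqoop import looks like the command below (the JDBC connection string, credentials, table name, and target directory are illustrative placeholders):

```
sqoop import \
  --connect jdbc:mysql://oltp-host/usersdb \
  --username dbuser \
  --table user_profiles \
  --target-dir /data/user_profiles
```

This pulls the table's rows out of the relational database into files under the given HDFS directory, after which they can be joined with the weblogs.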


+1 for both answers. Many people should be asking for their money back, I guess... – vefthym 2014-09-29 11:52:14


Yes, I picked the same answers too. – 2014-09-29 11:55:35


For Question 4: in MapReduce, combiners run after the Mapper and before the data is sent to the Reducer. The combiner is used to perform aggregation, to minimize the amount of information sent to the Reducer. D is the correct answer. For Question 3: you will have as many output files as reducers, so answer A is incorrect; answer B is the correct one. – milk3422 2014-09-30 18:30:30


The given answers to Questions 4 and 3 seem correct to me. For Question 4, when a combiner is used, the map output is first collected in an in-memory buffer, and the buffer is flushed when it is full. To support this, I will add this link: http://wiki.apache.org/hadoop/HadoopMapReduce

It clearly explains why the combiner adds speed to the process.

I also think the given answer to Q.3 is correct, because in general that is the basic configuration by default. To support this, I will add another informative link: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-7/mapreduce-types


@hunter30: I don't think those are the correct answers for 4 and 3, because we set the number of reducers to just 1, so all of the final output goes to that one reducer and a single file will be the output; i.e., the number of output files equals the number of reducers. And for Question 4, the combiner runs between map and reduce; the combiner plays no role in speeding up the mappers. – 2014-10-01 03:45:20


Actually, a single reducer can also write output to multiple output files. Have a look at the Hadoop API: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html – hunters30 2014-10-02 03:50:31


Also, the combiner reduces the read and write operations against HDFS between the map and reduce phases. Some of the built-in Java examples, like wordcount, directly use the same class as both combiner and reducer. – hunters30 2014-10-02 06:45:20