How to write MapReduce in R?

I am new to R. I know how to write MapReduce in Java, and I want to try the same in R. Can anyone help by giving some sample code? Is there any fixed format for writing MapReduce in R?

Please share any other links besides this one: https://github.com/RevolutionAnalytics/RHadoop/wiki/Tutorial

Any sample code would be very helpful.

2012-07-26 Manoj

Googling '[R] mapreduce' gives some useful links, such as this package: http://cran.r-project.org/web/packages/mapReduce/index.html and this blog post: http://www.r-bloggers.com/making-sense-of-mapreduce/ – Andrie 2012-07-26 06:27:45

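To the silent downvoter: this is the Summer of Love (http://blog.stackoverflow.com/2012/07/kicking-off-the-summer-of-love/), so I suggest you do one of the following: 1) explain the downvote, 2) explain to the OP how to improve the question, or 3) edit the question so that it becomes a good one. – Andrie 2012-07-26 06:32:16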

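Not a downvoter, but here goes. Manoj, I think you should revise your question and add information about what you have tried, for example: "I have been writing MR in Java, but now I want to try it in R. I have read this tutorial and done this and that search, but I am interested in more tutorials that may have escaped me." You could also compile a list of all references on R and MR (if one does not exist yet) and make this question a wiki. – 2012-07-26 06:43:20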

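When you want to implement MapReduce (with Hadoop) in a language other than Java, you use a feature called Streaming. The data is fed to the mapper via STDIN (readLines()), the mapper hands its output back to Hadoop via STDOUT (cat()), Hadoop sorts it by key and passes it to the reducer via STDIN (readLines()) again, and the reducer finally writes its result via STDOUT (cat()).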
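To make that data flow concrete, here is a minimal sketch that simulates the three stages inside a single R session, using a word count rather than the 2-gram job below. The names map_lines() and reduce_lines() are hypothetical, and sort() stands in for the sort Hadoop performs between the map and reduce phases. In a real Streaming job, each function would be a separate Rscript that reads file("stdin") and writes with cat().

```r
# Hypothetical in-process simulation of a Hadoop Streaming word count.

# Map step: each input line yields zero or more "word\t1" lines.
map_lines <- function(lines) {
  out <- character(0)
  for (line in lines) {
    words <- strsplit(tolower(line), "[^a-z]+")[[1]]
    words <- words[nchar(words) > 0]          # drop empty tokens
    if (length(words) > 0) out <- c(out, paste0(words, "\t1"))
  }
  out
}

# Reduce step: the input is sorted by key, so equal keys are adjacent;
# sum the values of each run of equal keys.
reduce_lines <- function(sorted_lines) {
  out <- character(0)
  prev_key <- NULL
  total <- 0L
  for (line in sorted_lines) {
    parts <- strsplit(line, "\t", fixed = TRUE)[[1]]
    key <- parts[1]
    value <- as.integer(parts[2])
    if (!is.null(prev_key) && key == prev_key) {
      total <- total + value
    } else {
      # a new key starts: emit the finished one first
      if (!is.null(prev_key)) out <- c(out, paste0(prev_key, "\t", total))
      prev_key <- key
      total <- value
    }
  }
  # emit the last key, if any input was seen
  if (!is.null(prev_key)) out <- c(out, paste0(prev_key, "\t", total))
  out
}

input <- c("the cat sat", "The dog sat")
mapped <- map_lines(input)
# stand-in for Hadoop's shuffle/sort; "radix" sorts in the C locale
shuffled <- sort(mapped, method = "radix")
result <- reduce_lines(shuffled)
cat(result, sep = "\n")
```

The reducer only compares each line with the previous one, so it needs constant memory no matter how many lines share a key. That is why the sort between the two phases is essential.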

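The following code is taken from an article I wrote about a MapReduce job for Hadoop written in R. The code counts 2-grams, but I would say it is simple enough to show how MapReduce works.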

# map.R

library(stringdist, quietly=TRUE)

input <- file("stdin", "r")

while(length(line <- readLines(input, n=1, warn=FALSE)) > 0) {
    # skip empty lines (break would silently drop the rest of the input);
    # more sophisticated defensive code makes sense here
    if(nchar(line) == 0) next

    fields <- unlist(strsplit(line, "\t"))

    # extract 2-grams
    d <- qgrams(tolower(fields[4]), q=2)

    # seq_len() avoids looping over c(1, 0) when there are no 2-grams
    for(i in seq_len(ncol(d))) {
        # language/2-gram/count; sep="" keeps stray spaces out of the keys
        cat(fields[2], "\t", colnames(d)[i], "\t", d[1,i], "\n", sep="")
    }
}

close(input)
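map.R depends on stringdist::qgrams(). If you only want to see what the mapper emits, the sketch below counts the same overlapping 2-grams with base R's substring() and table(). The helpers count_2grams() and map_record() are hypothetical, and the record layout (language in field 2, text in field 4) is assumed from map.R. The 2-grams may come out in a different order than qgrams() produces, but the counts should be the same.

```r
# Hypothetical base-R stand-in for the mapper's 2-gram step,
# so the logic can be tried without installing stringdist.
count_2grams <- function(text) {
  text <- tolower(text)
  n <- nchar(text)
  if (is.na(n) || n < 2) return(integer(0))   # no 2-grams possible
  starts <- seq_len(n - 1)
  grams <- substring(text, starts, starts + 1) # overlapping 2-grams
  counts <- table(grams)
  setNames(as.integer(counts), names(counts))
}

# Build the "language\t2-gram\tcount" lines that map.R writes to STDOUT,
# assuming the same layout: language in field 2, text in field 4.
map_record <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  if (length(fields) < 4) return(character(0)) # skip malformed records
  d <- count_2grams(fields[4])
  if (length(d) == 0) return(character(0))
  paste(fields[2], names(d), d, sep = "\t")
}

out <- map_record("1\ten\tx\tBanana")
cat(out, sep = "\n")
```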

# reduce.R

input <- file("stdin", "r")

# initialize variables that keep
# track of the state

is_first_line <- TRUE

# assumes the input is sorted so that all lines of one
# language/2-gram pair arrive consecutively
while(length(line <- readLines(input, n=1, warn=FALSE)) > 0) {
    # skip empty lines
    if(nchar(line) == 0) next

    line <- unlist(strsplit(line, "\t"))
    if(!is_first_line &&
       prev_lang == line[1] &&
       prev_2gram == line[2]) {
        # current line belongs to the previous
        # line's key pair
        count <- count + as.integer(line[3])
    } else {
        # current line belongs either to a
        # new key pair or is the first line

        # new key pair - so output the last
        # key pair's result
        if(!is_first_line) {
            # language/2-gram/count
            cat(prev_lang, "\t", prev_2gram, "\t", count, "\n", sep="")
        }
        # initialize state trackers
        prev_lang <- line[1]
        prev_2gram <- line[2]
        count <- as.integer(line[3])
        is_first_line <- FALSE
    }
}

# the final record (only if any input was read)
if(!is_first_line) {
    cat(prev_lang, "\t", prev_2gram, "\t", count, "\n", sep="")
}

close(input)
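reduce.R only works if every (language, 2-gram) pair arrives on consecutive lines. By default, Hadoop Streaming treats only the text before the first tab as the key, so the job presumably has to declare the first two fields as the key, for example with -D stream.num.map.output.key.fields=2. This is an assumption, since the article's job settings are not shown here. To check the reducer's output without Hadoop, the sketch below computes the same per-pair sums with base R's aggregate(). The function name reduce_reference() is hypothetical.

```r
# Hypothetical cross-check for reduce.R: parse "language\t2-gram\tcount"
# lines and sum the counts per (language, 2-gram) pair with base R.
reduce_reference <- function(lines) {
  parts <- do.call(rbind, strsplit(lines, "\t", fixed = TRUE))
  df <- data.frame(lang = parts[, 1], gram = parts[, 2],
                   count = as.integer(parts[, 3]),
                   stringsAsFactors = FALSE)
  agg <- aggregate(count ~ lang + gram, data = df, FUN = sum)
  # order like a C-locale sort by language, then 2-gram
  agg <- agg[order(agg$lang, agg$gram, method = "radix"), ]
  paste(agg$lang, agg$gram, agg$count, sep = "\t")
}

mapper_output <- c("en\tan\t2", "de\tan\t1", "en\tan\t1", "en\tba\t1")
cat(reduce_reference(mapper_output), sep = "\n")
```

Locally, the two scripts can also be chained as cat input.tsv | Rscript map.R | sort | Rscript reduce.R. Here input.tsv is a placeholder for a tab-separated sample file, and the shell's sort plays the role of Hadoop's sort between the phases. The result can then be compared against reduce_reference() applied to the mapper's output.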

http://www.joyofdata.de/blog/mapreduce-r-hadoop-amazon-emr/


2014-05-03 16:13:53 Raffael
