2016-08-03 79 views

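Converting row entries into columns in R

I'm new here and have only just started learning R.

I have the following question:

Suppose I have a data frame: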

name = c("John", "John","John","John","Mark","Mark","Mark","Mark","Dave", "Dave","Dave","Dave") 
color = c("red", "blue", "green", "yellow","red", "blue", "green", "yellow","red", "blue", "green", "yellow") 
value = c(1,2,1,3,5,5,3,2,4,6,7,8) 
df = data.frame(name, color, value) 
#View(df) 
df 
#    name  color value
# 1  John    red     1
# 2  John   blue     2
# 3  John  green     1
# 4  John yellow     3
# 5  Mark    red     5
# 6  Mark   blue     5
# 7  Mark  green     3
# 8  Mark yellow     2
# 9  Dave    red     4
# 10 Dave   blue     6
# 11 Dave  green     7
# 12 Dave yellow     8

and I would like it to look like this:

#   names red blue green yellow
# 1  John   1    2     1      3
# 2  Mark   5    5     3      2
# 3  Dave   4    6     7      8

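That is, the entries in the first column (name) become unique, the levels of the second column (color) become new columns, and the entries in those new columns come from the corresponding rows of the third column (value) in the original data frame.

I can achieve this with the following: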

library(dplyr)
df = df %>%
    group_by(name) %>%
    mutate(red = ifelse(color == "red", value, 0.0),
           blue = ifelse(color == "blue", value, 0.0),
           green = ifelse(color == "green", value, 0.0),
           yellow = ifelse(color == "yellow", value, 0.0)) %>%
    group_by(name) %>%
    summarise_each(funs(sum), red, blue, green, yellow)
df
#   name red blue green yellow
# 1 Dave   4    6     7      8
# 2 John   1    2     1      3
# 3 Mark   5    5     3      2

However, if the color column has many levels, this is far from ideal. How should I go about this instead?

Thanks!

Answers

3

Since the OP is already using the dplyr family of packages, a good option is spread from tidyr:

library(tidyr) 
spread(df, color, value) 
# name blue green red yellow 
#1 Dave 6  7 4  8 
#2 John 2  1 1  3 
#3 Mark 5  3 5  2 

If we need to use %>%:

library(dplyr) 
df %>% 
    spread(color, value) 

To keep the original column order, we can convert 'color' to a factor whose levels are the unique values of 'color' (in order of appearance), and then do the spread:

df %>% 
    mutate(color = factor(color, levels = unique(color))) %>% 
    spread(color, value) 
# name red blue green yellow 
#1 Dave 4 6  7  8 
#2 John 1 2  1  3 
#3 Mark 5 5  3  2 
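
In tidyr 1.0.0 and later, spread is superseded by pivot_wider, so the same reshaping can be sketched with the newer function. This is a minimal example that assumes a recent tidyr is installed; it rebuilds the data frame from the question so that it runs on its own. As far as I know, pivot_wider creates the new columns in order of first appearance by default, so the factor conversion above should not be needed here.

```r
library(tidyr)

# Rebuild the long data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8)
)

# One row per name, one column per color, cells filled from value
wide <- pivot_wider(df, names_from = color, values_from = value)
wide
```
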

Or we can use data.table for a faster dcast. Converting to a data.table and using the dcast from data.table has an advantage: it is much faster than the dcast from reshape2.

library(data.table) 
dcast(setDT(df), name~color, value.var="value") 
# name blue green red yellow 
#1: Dave 6  7 4  8 
#2: John 2  1 1  3 
#3: Mark 5  3 5  2 

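Note: in both solutions we get the column names as in the expected output, without any ugly prefix or suffix attached (which, by the way, can be changed, but it takes another line of code).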


If we need a base R option, one possibility is tapply:

with(df, tapply(value, list(name, color), FUN = I)) 
#  blue green red yellow 
#Dave 6  7 4  8 
#John 2  1 1  3 
#Mark 5  3 5  2 
+1

That was fast. Thanks! – chowching

3

So, you want a cross-tabulation?

> xtabs(value~name+color, df) 
     color 
name blue green red yellow 
    Dave 6  7 4  8 
    John 2  1 1  3 
    Mark 5  3 5  2 
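
Note that xtabs returns a table object rather than a data frame, and it sums value over any repeated name/color pairs. If a plain data frame is needed, one possible sketch in base R is to pass the result through as.data.frame.matrix; the data frame is rebuilt here only so the example runs on its own, and the names end up as row names rather than as a column.

```r
# Rebuild the long data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8)
)

# Cross-tabulate, then convert the table to an ordinary data frame
tab  <- xtabs(value ~ name + color, df)
wide <- as.data.frame.matrix(tab)
wide
```
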
3

You can use dcast from reshape2:

library(reshape2) 
dcast(df, name~color) 


# name blue green red yellow 
#1 Dave 6  7 4  8 
#2 John 2  1 1  3 
#3 Mark 5  3 5  2 

Alternatively, you can use reshape from base R:

reshape(df, idvar="name", timevar="color", direction="wide") 


# name value.red value.blue value.green value.yellow 
#1 John   1   2   1   3 
#5 Mark   5   5   3   2 
#9 Dave   4   6   7   8
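
The "value." prefix and the leftover row names (1, 5, 9) can be cleaned up with two more lines. The following is a small base R sketch, not part of the original answer; it rebuilds the data frame from the question so that it is self-contained.

```r
# Rebuild the long data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8)
)

wide <- reshape(df, idvar = "name", timevar = "color", direction = "wide")

# Drop the "value." prefix that reshape adds to the new columns
names(wide) <- sub("^value\\.", "", names(wide))

# Reset the row names inherited from the original rows
rownames(wide) <- NULL
wide
```
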