2016-07-25 61 views

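NULL when printed as an individual field or column, but populated when the whole data frame is printed

When I use dplyr to create a column of counts, it appears to be populated correctly until I try to use the counts column on its own. For example, I create this data frame: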

V1 <- c("TEST", "test", "tEsT", "tesT", "TesTing", "testing","ME-TESTED", "re tested", "RE testing") 
V2 <- c("othertest", "anothertest", "testing", "123", "random stuff", "irrelevant", "tested", "re-test", "tests") 
V3 <- c("type1", "type2", "type1", "type2", "type3", "type2", "type2", "type2", "type1") 
df <- data.frame(V1, V2, V3) 

Then I use dplyr to create a column of counts:

library(dplyr)
df$counts <- df %>% group_by(V3) %>% mutate(count = n())

This gives the expected result:

> df
          V1           V2    V3  counts.V1    counts.V2 counts.V3 counts.count
1       TEST    othertest type1       TEST    othertest     type1            3
2       test  anothertest type2       test  anothertest     type2            5
3       tEsT      testing type1       tEsT      testing     type1            3
4       tesT          123 type2       tesT          123     type2            5
5    TesTing random stuff type3    TesTing random stuff     type3            1
6    testing   irrelevant type2    testing   irrelevant     type2            5
7  ME-TESTED       tested type2  ME-TESTED       tested     type2            5
8  re tested      re-test type2  re tested      re-test     type2            5
9 RE testing        tests type1 RE testing        tests     type1            3

However, when I try to use the counts.count column in any way, the result is NULL:

> df$counts.count 
NULL 

The result is the same for the other columns created by dplyr. The rest of the data frame seems fine, though:

> df$V1 
[1] TEST  test  tEsT  tesT  TesTing testing ME-TESTED re tested RE testing 
Levels: ME-TESTED re tested RE testing test tesT tEsT TEST testing TesTing 

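I am completely baffled as to why printing the whole df gives me different output than printing just the column of interest. What am I missing here?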


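Why 'df$counts <-' rather than 'df <-'? Assigning this way puts a whole 'data.frame' inside a single column. If you want to select that column, you can use 'df$counts$count', because you have one 'data.frame' nested inside another –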


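I misunderstood the syntax and thought I had to do it that way to create a new column. If it created a data frame inside a data frame, that would explain it, but I still don't understand why it looks like an ordinary column when I print df, yet comes back as NULL when I print df$counts.count. – Thoughtcraft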


That is simply how R prints a column that contains a data.frame –

Answer


If you rewind and recreate the data frame, and then just print the result without assigning it, you see this:

df %>% group_by(V3) %>% mutate(count = n()) 

Source: local data frame [9 x 4] 
Groups: V3 [3] 

          V1           V2     V3 count
      <fctr>       <fctr> <fctr> <int>
1       TEST    othertest  type1     3
2       test  anothertest  type2     5
3       tEsT      testing  type1     3
4       tesT          123  type2     5
5    TesTing random stuff  type3     1
6    testing   irrelevant  type2     5
7  ME-TESTED       tested  type2     5
8  re tested      re-test  type2     5
9 RE testing        tests  type1     3

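If you now do the assignment, the resulting structure is rather messy. I think you might have gotten a more informative error if there had been fewer unique values in V1 and V2: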

df$counts <- df %>% group_by(V3) %>% mutate(count = n()) 
# snipped what you already showed 
str(df) 
#----- 
'data.frame': 9 obs. of 4 variables: 
$ V1 : Factor w/ 9 levels "ME-TESTED","re tested",..: 7 4 6 5 9 8 1 2 3 
$ V2 : Factor w/ 9 levels "123","anothertest",..: 4 2 8 1 5 3 7 6 9 
$ V3 : Factor w/ 3 levels "type1","type2",..: 1 2 1 2 3 2 2 2 1 
$ counts:Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 9 obs. of 4 variables: 
    ..$ V1 : Factor w/ 9 levels "ME-TESTED","re tested",..: 7 4 6 5 9 8 1 2 3 
    ..$ V2 : Factor w/ 9 levels "123","anothertest",..: 4 2 8 1 5 3 7 6 9 
    ..$ V3 : Factor w/ 3 levels "type1","type2",..: 1 2 1 2 3 2 2 2 1 
    ..$ count: int 3 5 3 5 1 5 5 5 3 
    ..- attr(*, "vars")=List of 1 
    .. ..$ : symbol V3 
    ..- attr(*, "labels")='data.frame': 3 obs. of 1 variable: 
    .. ..$ V3: Factor w/ 3 levels "type1","type2",..: 1 2 3 
    .. ..- attr(*, "vars")=List of 1 
    .. .. ..$ : symbol V3 
    .. ..- attr(*, "drop")= logi TRUE 
    ..- attr(*, "indices")=List of 3 
    .. ..$ : int 0 2 8 
    .. ..$ : int 1 3 5 6 7 
    .. ..$ : int 4 
    ..- attr(*, "drop")= logi TRUE 
    ..- attr(*, "group_sizes")= int 3 5 1 
    ..- attr(*, "biggest_group_size")= int 5 
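
The str() output above lists only four top-level columns, with counts holding an entire data frame. The same behaviour can be reproduced in base R without dplyr. The sketch below uses small made-up data, and the names inner_df and nested_df are illustrative only, not taken from the question:

```r
# Hypothetical miniature data, chosen only to illustrate the nesting.
inner_df  <- data.frame(V3 = c("type1", "type2", "type1"),
                        count = c(2L, 1L, 2L))
nested_df <- data.frame(V1 = c("a", "b", "c"))

# Assigning a data frame with `$<-` stores it as ONE column named "counts".
nested_df$counts <- inner_df

names(nested_df)          # "V1" "counts"
nested_df$counts.count    # NULL: no top-level column has this name
nested_df$counts$count    # 2 1 2: the column lives inside nested_df$counts
print(nested_df)          # header shows counts.V3 and counts.count
```

The prefixed names in the printed header are produced only when the data frame is formatted for display, which is why they cannot be used with `$`.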

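What you are seeing is the format R uses to display a matrix or data frame embedded in a column of a data frame: the columns of the embedded object are printed with the outer column name as a prefix (counts.V1, counts.V2, counts.V3, counts.count). No top-level column with such a name actually exists, which is why df$counts.count is NULL. As the str() output shows, the embedded object here has the classes grouped_df, tbl_df, tbl and data.frame, so it is a data frame rather than a matrix.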
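
Following the suggestion in the comments to assign to df rather than to df$counts, a minimal sketch of the fix might look like this. It assumes dplyr is installed, keeps only the V3 column from the question for brevity, and adds an ungroup() step that is not part of the original post:

```r
library(dplyr)

# Only the grouping column from the question is needed to show the idea.
V3 <- c("type1", "type2", "type1", "type2", "type3",
        "type2", "type2", "type2", "type1")
df <- data.frame(V3)

# Assign the whole result back to df instead of to df$counts.
df <- df %>%
  group_by(V3) %>%
  mutate(count = n()) %>%
  ungroup()   # assumption: drop the grouping once the counts exist

df$count      # 3 5 3 5 1 5 5 5 3
```

With this form, count is an ordinary top-level column, so df$count works directly and no nested data frame is created.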