Table of categorical variables by a grouping variable in R

I have a dataset that contains several categorical variables plus a "cluster" variable. For example:

time <- c("Morning", "Evening" ,"Morning", "Morning", "Afternoon", "Evening", "Afternoon") 
dollar <- c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10") 
with_kids <- c("no", "yes", "yes", "no", "no", "yes", "yes") 
cluster <- c(1,1,2,3,2,2,3) 

data <- cbind(time, dollar, with_kids, cluster)

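How can I create a frequency table of all the categorical variables by "cluster"?

The desired output is the table on the right (column % of each categorical variable within each cluster).

I know this code works for one variable. If I have more categorical variables, what is the most efficient way to do it?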

table(data$time, data$cluster)

Source

2017-07-27 Ketty

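'time$cluster' or 'data$cluster'? And shouldn't it be 'data.frame(time, ...)' rather than 'cbind(time, ...)'? – AntoineBic

'data$cluster'. I have made the edit. Thanks! – Ketty

Don't use 'cbind'; use 'data.frame' as suggested in the first comment. 'cbind' creates a character matrix. Also, please include the expected output for this example. – lmo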
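To illustrate the point made in the comments above, here is a minimal sketch using a shortened version of the question's vectors (the names `m` and `d` are only for illustration). It shows why `cbind` is the wrong constructor here: mixing character and numeric vectors coerces everything to character, and `$` does not work on the resulting matrix.

```r
time    <- c("Morning", "Evening", "Morning")
cluster <- c(1, 1, 2)

m <- cbind(time, cluster)        # matrix: the numbers are coerced to strings
d <- data.frame(time, cluster)   # data frame: each column keeps its own type

is.matrix(m)          # TRUE
typeof(m)             # "character"
is.numeric(d$cluster) # TRUE

# `$` is not defined for atomic vectors, so m$time raises an error,
# while d$time works as expected.
res <- try(m$time, silent = TRUE)
inherits(res, "try-error")  # TRUE
```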

I'm not entirely sure what your desired output is, but here are two possibilities.

A list of tables:

myList <- lapply(dat[head(names(dat), -1)], table, dat$cluster) 
myList 
$time

            1 2 3
  Afternoon 0 1 1
  Evening   1 1 0
  Morning   1 1 1

$dollar

        1 2 3
  1-5   1 1 1
  11-15 0 1 0
  6-10  1 1 1

$with_kids

      1 2 3
  no  1 1 1
  yes 1 2 1

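To get a list of proportion tables, you can lapply over the list of tables with prop.table as the function, passing it margin=2 so that each column (cluster) sums to 1: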

lapply(myList, prop.table, margin=2) 
$time

                    1         2         3
  Afternoon 0.0000000 0.3333333 0.5000000
  Evening   0.5000000 0.3333333 0.0000000
  Morning   0.5000000 0.3333333 0.5000000

$dollar

                1         2         3
  1-5   0.5000000 0.3333333 0.5000000
  11-15 0.0000000 0.3333333 0.0000000
  6-10  0.5000000 0.3333333 0.5000000

$with_kids

              1         2         3
  no  0.5000000 0.3333333 0.5000000
  yes 0.5000000 0.6666667 0.5000000

To rbind them together into a single table:

do.call(rbind, lapply(dat[head(names(dat), -1)], table, dat$cluster)) 
          1 2 3
Afternoon 0 1 1
Evening   1 1 0
Morning   1 1 1
1-5       1 1 1
11-15     0 1 0
6-10      1 1 1
no        1 1 1
yes       1 2 1
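The two ideas above can be combined to get something close to the column percentages asked for in the question. The following is only a sketch under that reading of the desired output: it rebuilds the example data so that it runs on its own, and the names `vars` and `pct` are made up for illustration.

```r
dat <- data.frame(
  time      = c("Morning", "Evening", "Morning", "Morning",
                "Afternoon", "Evening", "Afternoon"),
  dollar    = c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10"),
  with_kids = c("no", "yes", "yes", "no", "no", "yes", "yes"),
  cluster   = c(1, 1, 2, 3, 2, 2, 3)
)

# every column except the grouping variable
vars <- setdiff(names(dat), "cluster")

# one column-percentage table per variable, stacked with rbind
pct <- do.call(rbind, lapply(dat[vars], function(x) {
  round(100 * prop.table(table(x, dat$cluster), margin = 2), 1)
}))

pct
```

Within each variable's block of rows, every cluster column adds up to roughly 100; rounding to one decimal place can leave a sum such as 99.9.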

data

dat <- 
structure(list(time = structure(c(3L, 2L, 3L, 3L, 1L, 2L, 1L), .Label = c("Afternoon", 
"Evening", "Morning"), class = "factor"), dollar = structure(c(1L, 
3L, 2L, 1L, 1L, 3L, 3L), .Label = c("1-5", "11-15", "6-10"), class = "factor"), 
    with_kids = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("no", 
    "yes"), class = "factor"), cluster = c(1, 1, 2, 3, 2, 2, 
    3)), .Names = c("time", "dollar", "with_kids", "cluster"), row.names = c(NA, 
-7L), class = "data.frame")

Source

2017-07-27 15:41:42 lmo

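lmo, one more question: how can we display the variable name to the left of the variable values? For example, I would like "time" to appear to the left of "Afternoon, Evening, Morning", "dollar" to the left of "1-5, 11-15, 6-10", and so on. Thanks again! – Ketty

That will depend on what you want to do with the output. R does not do this by default. If you want to construct an output table for a report or paper, you will have to use a package like 'xtable', or the 'kable' function in 'knitr' (I'm not sure how flexible it is). If you want a table for a report, I'd suggest playing around with 'xtable'; if you can't find a suitable structure, ask a new question that links to this one and describes how you want to use the output. – lmo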
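If a plain data frame is enough (rather than a formatted report table), one possible way to keep the variable name next to its levels is to add it as an ordinary column before stacking. This is a base-R sketch, not the 'xtable' or 'kable' route mentioned in the comment; the column names `variable` and `level` and the object `labelled` are assumptions chosen for illustration.

```r
dat <- data.frame(
  time      = c("Morning", "Evening", "Morning", "Morning",
                "Afternoon", "Evening", "Afternoon"),
  dollar    = c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10"),
  with_kids = c("no", "yes", "yes", "no", "no", "yes", "yes"),
  cluster   = c(1, 1, 2, 3, 2, 2, 3)
)

vars <- setdiff(names(dat), "cluster")

labelled <- do.call(rbind, lapply(vars, function(v) {
  # counts of each level of variable v within each cluster
  counts <- as.data.frame.matrix(table(dat[[v]], dat$cluster))
  # prepend the variable name and its levels as regular columns
  data.frame(variable = v, level = rownames(counts), counts,
             check.names = FALSE, row.names = NULL)
}))

labelled
```

The result has one row per level, with the columns `variable`, `level`, and one count column per cluster, so it can be passed on to a table-formatting package if needed.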

time <- c("Morning", "Evening" ,"Morning", "Morning", "Afternoon", "Evening", "Afternoon") 
dollar <- c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10") 
with_kids <- c("no", "yes", "yes", "no", "no", "yes", "yes") 
cluster <- c(1,1,2,3,2,2,3) 
data <- data.frame(time, dollar, with_kids, cluster)

You can use the dplyr package and select as many variables as you like:

library(dplyr) 
data %>% 
    group_by(interaction(time, cluster, dollar)) %>% 
    summarise(count = n()) 

# A tibble: 7 x 2
  `interaction(time, cluster, dollar)` count
                                <fctr> <int>
1                        Morning.1.1-5     1
2                      Afternoon.2.1-5     1
3                        Morning.3.1-5     1
4                      Morning.2.11-15     1
5                       Evening.1.6-10     1
6                       Evening.2.6-10     1
7                     Afternoon.3.6-10     1
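The interaction above counts combinations of all the variables at once. If the goal is instead one count per variable level within each cluster, as in the question, a possible variant is to reshape to long format first. This sketch assumes the tidyr package is installed in addition to dplyr, and the names `variable`, `level`, `pct`, and `long_counts` are made up for illustration.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  time      = c("Morning", "Evening", "Morning", "Morning",
                "Afternoon", "Evening", "Afternoon"),
  dollar    = c("1-5", "6-10", "11-15", "1-5", "1-5", "6-10", "6-10"),
  with_kids = c("no", "yes", "yes", "no", "no", "yes", "yes"),
  cluster   = c(1, 1, 2, 3, 2, 2, 3),
  stringsAsFactors = FALSE
)

long_counts <- data %>%
  # one row per (observation, variable) pair
  pivot_longer(-cluster, names_to = "variable", values_to = "level") %>%
  # frequency of each level of each variable within each cluster
  count(variable, level, cluster) %>%
  # column percentage: share of the level within its cluster
  group_by(variable, cluster) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

long_counts
```

Note that `count()` only returns combinations that actually occur, so level and cluster pairs with zero observations are absent from the result.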

Source

2017-07-27 15:08:43 AntoineBic
