Analysing multiple-choice questions by category

I have a data frame that looks like this:

library(dplyr)  # provides %>%
library(tidyr)  # provides spread()

Country <- rep(c("Austria", "Austria", "Belgium", "Belgium", "Spain", "Slovenia", "France"), times = 3)
Institute <- rep(c("Inst 1", "Inst 2", "Inst 3", "Inst 4", "Inst 5", "Inst 6", "Inst 7"), times = 3)
Ans <- rep(c(1, 2, 3, 1, NA, 2, 2), times = 3)
Category.1 <- rep(c("Cat 1", "Cat 2", "Cat 2", "Cat 2", "Cat 2", "Cat 1", "Cat 1"), times = 3)
Category.2 <- rep(c("P", "L", "M", "P", "P", "L", "M"), times = 3)
qs <- c(rep("Q1.a-Some Text", times = 7), rep("Q1.b-Some Text", times = 7), rep("Q1.c-Some Text", times = 7))
df <- data.frame(Country = Country, Institute = Institute, Category.1 = Category.1, Category.2 = Category.2, qs = qs, Ans = Ans)
df <- df %>% spread(qs, Ans)  # long -> wide: one column per sub-question
head(df)

   Country Institute Category.1 Category.2 Q1.a-Some Text Q1.b-Some Text Q1.c-Some Text
1  Austria    Inst 1      Cat 1          P              1              1              1
2  Austria    Inst 2      Cat 2          L              2              2              2
3  Belgium    Inst 3      Cat 2          M              3              3              3
4  Belgium    Inst 4      Cat 2          P              1              1              1
5   France    Inst 7      Cat 1          M              2              2              2
6 Slovenia    Inst 6      Cat 1          L              2              2              2

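A short description of the data frame: it holds answers to multiple-choice questions. There is a question, say Q1, which has several "sub-questions", say a, b and c. For each of these sub-questions/options, respondents were asked to answer on some scale, in this example 1 to 3. My goal is to compute, for each sub-question, the relative frequency of each answer. To do that, I use this function: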

multichoice <- function(data, question.prefix) {
    index <- grep(question.prefix, names(data))  # columns whose names match the question prefix
    cases <- length(index)                       # number of sub-questions (columns)

    # Range of possible answers: the smallest and the largest value
    # found across all matched columns
    mn <- min(data[, index[1:cases]], na.rm = TRUE)
    mx <- max(data[, index[1:cases]], na.rm = TRUE)
    # Number of non-missing, non-zero answers per column (the denominator)
    d <- colSums(data[, index, drop = FALSE] != 0, na.rm = TRUE)

    vec <- matrix(NA_real_, nrow = length(mn:mx), ncol = cases)

    for (j in 1:cases) {
        for (i in mn:mx) {
            # Relative frequency of answer i for sub-question j.
            # The row offset keeps the indexing valid when mn is not 1.
            vec[i - mn + 1, j] <- sum(data[, index[j]] == i, na.rm = TRUE) / d[j]
        }
    }

    vec1 <- as.data.frame(vec)
    names(vec1) <- names(data[index])
    vec1 <- t(vec1)
    return(vec1)
}

Calling the function gives me the data frame I want.

q1 <- as.data.frame(multichoice(df, "^Q1"))
head(q1)

                      V1  V2        V3
Q1.a-Some Text 0.3333333 0.5 0.1666667
Q1.b-Some Text 0.3333333 0.5 0.1666667
Q1.c-Some Text 0.3333333 0.5 0.1666667

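This shows that, for option a, 33% of respondents answered 1, 50% answered 2, and so on.

My question

I would like to compute the same thing, but conditional on the categories. In other words, I want to see how the relative frequencies vary by Category.1 and Category.2. Can someone show me how to do this?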

2016-05-26 msh855

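I think you can make your code more flexible by keeping your data in long format (i.e. by not running df <- df %>% spread(qs, Ans)) and using dplyr, as shown here.

This part essentially reproduces what your multichoice function does: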

library(reshape2)  # provides dcast(); df is the long-format data frame (before spread())

df %>%
    group_by(qs, Ans) %>%
    summarize(total = n()) %>%
    filter(!is.na(Ans)) %>%
    mutate(frac = total / sum(total)) %>%
    dcast(qs ~ Ans, value.var = 'frac')
#               qs         1   2         3
# 1 Q1.a-Some Text 0.3333333 0.5 0.1666667
# 2 Q1.b-Some Text 0.3333333 0.5 0.1666667
# 3 Q1.c-Some Text 0.3333333 0.5 0.1666667

And this one shows how to modify it to take a category into account.

df %>%
    group_by(qs, Category.1, Ans) %>%
    summarize(total = n()) %>%
    filter(!is.na(Ans)) %>%
    mutate(frac = total / sum(total)) %>%
    dcast(qs ~ Ans + Category.1, value.var = 'frac')
#               qs   1_Cat 1   1_Cat 2   2_Cat 1   2_Cat 2   3_Cat 2
# 1 Q1.a-Some Text 0.3333333 0.3333333 0.6666667 0.3333333 0.3333333
# 2 Q1.b-Some Text 0.3333333 0.3333333 0.6666667 0.3333333 0.3333333
# 3 Q1.c-Some Text 0.3333333 0.3333333 0.6666667 0.3333333 0.3333333
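
The same pattern should extend to Category.2, or to both categories at once, by changing only the grouping columns. Below is a minimal sketch of that idea, assuming the long-format data from the question. The helper name rel_freq and the object df_long are made up for illustration, and the result is left in long format rather than cast to wide.

```r
library(dplyr)

# Long-format data as in the question (i.e. before spread()); only the
# columns needed here are rebuilt so that the sketch is self-contained.
df_long <- data.frame(
  Category.1 = rep(c("Cat 1", "Cat 2", "Cat 2", "Cat 2", "Cat 2", "Cat 1", "Cat 1"), times = 3),
  Category.2 = rep(c("P", "L", "M", "P", "P", "L", "M"), times = 3),
  qs         = rep(c("Q1.a-Some Text", "Q1.b-Some Text", "Q1.c-Some Text"), each = 7),
  Ans        = rep(c(1, 2, 3, 1, NA, 2, 2), times = 3)
)

# rel_freq() is a hypothetical helper, not part of dplyr.
# `...` takes zero or more grouping columns (e.g. Category.1, Category.2).
rel_freq <- function(data, ...) {
  data %>%
    filter(!is.na(Ans)) %>%                    # drop missing answers first
    count(qs, ..., Ans, name = "total") %>%    # answers per question/group/answer value
    group_by(qs, ...) %>%
    mutate(frac = total / sum(total)) %>%      # share within each question/group
    ungroup()
}

overall <- rel_freq(df_long)                          # no category
by_cat2 <- rel_freq(df_long, Category.2)              # by Category.2 only
by_both <- rel_freq(df_long, Category.1, Category.2)  # by both categories
```

Within each combination of qs and the chosen grouping columns, the frac values add up to 1, so every group can be read on its own.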

2016-05-26 17:27:29

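This is really cool, but I'm confused about how to read the second table. In the first example it is clear that each row must add up to 100%, and the interpretation is immediate: for Q1.a, 33% of the responses were 1, 50% were 2, etc. With the second table, however, I'm a bit lost. – msh855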

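In the column headers, the value before the underscore is the answer value and the value after it is the category. If you show an example of the output you expect, I can help you. –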

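Thanks. Basically, I would like the table to be laid out so that either the rows or the columns add up to 100%, whichever is more convenient. If the table has to be split by category, that is fine too. For example, in the second table this happens if you add the first column to the third column, because together they show how that category responded to options a, b and c, and they do sum to 100%. – msh855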
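
One layout that might meet this request is a table with one row per question and category and one column per answer value, so that every row adds up to 100%. The following is only a sketch under assumptions: it uses tidyr's pivot_wider (assuming tidyr 1.1 or later for the scalar values_fill) in place of dcast, and the names df_long, by_cat1 and the Ans_ column prefix are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Long-format data as in the question (before spread()), rebuilt here
# so that the sketch is self-contained.
df_long <- data.frame(
  Category.1 = rep(c("Cat 1", "Cat 2", "Cat 2", "Cat 2", "Cat 2", "Cat 1", "Cat 1"), times = 3),
  qs         = rep(c("Q1.a-Some Text", "Q1.b-Some Text", "Q1.c-Some Text"), each = 7),
  Ans        = rep(c(1, 2, 3, 1, NA, 2, 2), times = 3)
)

by_cat1 <- df_long %>%
  filter(!is.na(Ans)) %>%                          # ignore missing answers
  count(qs, Category.1, Ans, name = "total") %>%   # answers per question/category/value
  group_by(qs, Category.1) %>%
  mutate(frac = total / sum(total)) %>%            # share within question and category
  ungroup() %>%
  select(-total) %>%
  pivot_wider(names_from = Ans, values_from = frac,
              names_prefix = "Ans_",               # illustrative column prefix
              values_fill = 0)                     # answer never given -> 0 instead of NA

# Optional: one table per category
tables_by_category <- split(by_cat1, by_cat1$Category.1)
```

Each row of by_cat1 then describes how one category answered one sub-question. For instance, the row for Q1.a and Cat 1 holds 1/3 for answer 1, 2/3 for answer 2 and 0 for answer 3.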
