2016-05-06 38 views
1

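R binomial test on preferences in a data frame

This is a Coursera course that expects us to program in R without any prior R experience. I am really struggling to understand it and have no idea where to start. I even went through basic R tutorials, but I am still lost.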

We have a CSV file with the following contents:

  • Subject: 30
  • Disability: 0, 1
  • Pref: trackball, touchpad

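For people without disabilities, perform a binomial test to see whether their preference for touchpads differs significantly from chance. To the nearest ten-thousandth (four digits), what is the p-value? Hint: run a binomial test comparing the number of non-disabled people who prefer the touchpad to the total number of rows for non-disabled people. With two possible preferences, touchpad and trackball, the chance probability is 1/2. Do not correct for multiple comparisons; treat this as a single test on one subset of the data.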

The solution is supposed to be:

  • First, get an intuition by plotting the preferences of people without disabilities:

    plot(df[df$Disability == "0",]$Pref) 
    
  • Second, test the preference for touchpad versus trackball against chance, which would correspond to no preference:

    binom.test(sum(df[df$Disability == "0",]$Pref == "touchpad"), 
          nrow(df[df$Disability == "0",]), p=1/2) 
    plot(df[df$Disability == "0",]$Pref) 
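
The plotting step above relies on `Pref` being a factor. Since R 4.0.0, `read.csv` keeps strings as character vectors by default, and `plot()` on a character vector fails. A minimal sketch that works either way is to tabulate first and draw a bar chart; the `prefs` data frame and its five rows are hypothetical stand-ins, not the course data:

```r
# Hypothetical toy rows; the real file has 30 subjects.
prefs <- data.frame(
  Disability = c(0, 0, 0, 1, 1),
  Pref = c("touchpad", "trackball", "touchpad", "trackball", "touchpad")
)

# Count preferences among subjects without a disability.
counts <- table(prefs$Pref[prefs$Disability == 0])

# A bar chart of the counts does not depend on Pref being a factor.
barplot(counts, ylab = "Number of subjects")
```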
    

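I understand that this should give us a visual representation of the preferences for Disability = 0, but `df` raises an error and I do not know how to fix it. Can anyone help?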
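
One possible cause, offered as a guess because the error message was not posted: in a fresh R session, `df` is already the name of a function in the stats package (the density of the F distribution). If no data frame has been assigned to `df`, subsetting it fails. The sketch below illustrates this; `prefs` and its rows are hypothetical:

```r
# Without an assignment, `df` resolves to stats::df, a function, not data.
df_is_function <- is.function(stats::df)

# Subsetting a function with `$` is an error; capture the message instead of stopping.
msg <- tryCatch(stats::df$Disability, error = function(e) conditionMessage(e))
print(msg)

# Giving the data its own name avoids the clash (hypothetical toy rows).
prefs <- data.frame(
  Disability = c(0, 0, 1),
  Pref = c("touchpad", "trackball", "touchpad")
)
n_nondisabled <- nrow(prefs[prefs$Disability == 0, ])
```

The same commands from the solution then work once `df` is replaced with whatever name the data was actually loaded under.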

+2

It would be better if you provided the data you are using, so that we can reproduce your code. Try `dput`, or upload the CSV somewhere and post a link. –

+3

Please also add the error message to the question. –

+0

Thanks for the help! I just figured out that I needed to replace "df" with the name of the xtab I had built. Files: https://www.dropbox.com/s/rd796wor7by5uky/DesignExperiments_R.Rproj?dl=0 https://www.dropbox.com/s/cig2u4d5vpkjma1/deviceprefs.csv?dl=0 – testimo

Answer

0

I simulated a random data set with the given characteristics, and everything works just fine:

df <- data.frame(Subject = c("Sub1", "Sub2", "Sub3", "Sub4", "Sub5", "Sub6", "Sub7", "Sub8", "Sub9", "Sub10", "Sub11", "Sub12", "Sub13", "Sub14", "Sub15", "Sub16", "Sub17", "Sub18", "Sub19", "Sub20", "Sub21", "Sub22",  "Sub23", "Sub24", "Sub25", "Sub26", "Sub27", "Sub28", "Sub29", "Sub30"), 
       Disability = c("0", "0", "1", "1", "1", "1", "0", "0", "0", "1", "1", "0", "0", "0", "0", "1", "0", "0", "1", "0", "0", "0", "0", "1", "1", "1", "0", "0", "1", "0"), 
       Pref = c("touchpad", "touchpad", "touchpad", "trackball", "trackball", "trackball", "trackball", "trackball", "trackball", "trackball", "trackball", "trackball", "touchpad", "trackball", "trackball", "touchpad", "touchpad", "trackball", "touchpad", "trackball", "touchpad", "touchpad", "trackball", "touchpad", "touchpad", "touchpad", "touchpad", "touchpad", "trackball", "trackball")) 

The result of the given command is the following:

binom.test(sum(df[df$Disability == "0",]$Pref == "touchpad"), 
      nrow(df[df$Disability == "0",]), p=1/2) 

    Exact binomial test 

data: sum(df[df$Disability == "0", ]$Pref == "touchpad") and nrow(df[df$Disability == "0", ]) 
number of successes = 8, number of trials = 18, p-value = 0.8145 
alternative hypothesis: true probability of success is not equal to 0.5 
95 percent confidence interval: 
0.2153015 0.6924283 
sample estimates: 
probability of success 
      0.4444444 
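
As a cross-check on the output above, the two-sided exact p-value can be reproduced by hand: sum the null probabilities of every outcome that is no more likely than the observed one. This is a sketch using the counts from the simulated data (8 successes in 18 trials), not the real file:

```r
n <- 18   # non-disabled subjects in the simulated data
x <- 8    # of those, the number preferring the touchpad

# Probability of every possible count 0..n under the null p = 1/2.
probs <- dbinom(0:n, size = n, prob = 0.5)

# Two-sided exact p-value: total probability of outcomes no more likely than x.
# The small tolerance guards against floating-point ties.
p_manual <- sum(probs[probs <= dbinom(x, n, 0.5) * (1 + 1e-7)])

p_builtin <- binom.test(x, n, p = 0.5)$p.value
print(round(c(manual = p_manual, builtin = p_builtin), 4))
```

Both values should agree with the 0.8145 reported by `binom.test`.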

Edit

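To apply the same test to the real data (the file linked in the comments), replace the first step with a command that reads the actual values into the data frame: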

df <- read.csv("deviceprefs-1.csv") 

Other than that, the given binomial test command works just fine with the real data set.
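
The whole workflow can be sketched end to end as follows. To keep the example self-contained, it reads a small inline CSV instead of the real file; the six rows and the names `csv_text`, `prefs`, and `nondisabled` are hypothetical, and only the column names are taken from the question:

```r
# Hypothetical stand-in for the real CSV (same column names, made-up rows).
csv_text <- "Subject,Disability,Pref
1,0,touchpad
2,0,trackball
3,1,trackball
4,0,touchpad
5,1,touchpad
6,0,touchpad"

# With the real file this would be read.csv("path/to/file.csv") instead.
prefs <- read.csv(text = csv_text)
str(prefs)  # confirm the column names and types before subsetting

# Keep only the subjects without a disability.
nondisabled <- prefs[prefs$Disability == 0, ]

# Successes = touchpad preferences, trials = all non-disabled rows, chance = 1/2.
result <- binom.test(sum(nondisabled$Pref == "touchpad"),
                     nrow(nondisabled), p = 1/2)
print(round(result$p.value, 4))
```

Subsetting once into `nondisabled` avoids repeating the filter expression and makes it easier to check that the row count is what the assignment expects.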

+0

Thanks for trying, @vincent-guillemot. I submitted p-value = 0.8145 as the answer, and the test says it is incorrect. – testimo

+1

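I think you misunderstood my answer: when I said "simulated", I meant that I randomly generated some data, so the p-value does not match the one for your data. –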