How do I run a t-test on variables within the same category in R?

I want to run a t-test on the mean age at arrest of men versus women. However, my data is arranged like this:

Sex: Age: 
M 21 
F 31 
F 42 
M 43

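Is there a way to split the Sex category into two separate categories (male and female) so I can run my t-test? Or to run a t-test within a single category? Similar questions have been asked, but none of the answers seem to apply to my dataset. Thanks for any guidance you can offer!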

Source

2017-05-19 Eliza Paige

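Welcome to the site! Before posting a question, please read the guidelines on writing a minimal, complete, and verifiable example (https://stackoverflow.com/help/mcve). At the moment, your question seems to be more about statistics than programming. What language are you using? What have you tried? There is a site for statistics questions (https://stats.stackexchange.com/), but stackoverflow.com serves a different purpose. For details, see https://stackoverflow.com/help/on-topic. –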

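Sorry for being vague - this is my first time using "R", for a HS statistics project (I hope that explains my minimal detail). So far I have tried: male <- prof[which(prof$gender == 'M')] female <- prof[which(prof$gender == 'F')] t.test(male, female) –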

I also tried following this link: https://stackoverflow.com/questions/41442344/how-do-i-make-at-test-across-several-groups-in-one-column-in- but I'm not sure how to apply it to my own data. –

A t-test comparing the ages of men and women can be done like this:

df = data.frame(
    gender = c("M", "F", "F", "M"), 
    age = c(21, 31, 42, 43) 
) 

t.test(age ~ gender, data = df)

Based on your question, this seems to be the most relevant test.

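I'm not sure what you mean by "performing a t-test within one category": you could compare one set of values against some known reference value such as 0, but I'm not sure what that would tell you (other than that the men in your sample are not 0 years old).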
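For completeness, here is a minimal sketch of that one-sample comparison. The ages are made up for illustration (they are not the asker's data), and the reference value mu = 30 is an arbitrary assumption; you would pick whatever reference mean is meaningful for your project.

```r
# Made-up ages for a single group (illustrative only)
ages <- c(21, 43, 12, 36, 21, 23)

# One-sample t-test: is the mean age different from a reference value?
# mu = 30 is an arbitrary reference chosen for this sketch.
res <- t.test(ages, mu = 30)

res$estimate   # sample mean of ages
res$p.value    # two-sided p-value against mu = 30
```

With mu left at its default of 0, this is the "not 0 years old" test mentioned above, which is rarely informative for ages.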

Source

2017-05-19 03:51:19 Marius

I think she means splitting "Sex" by factor level (categorical data) and using t.test to evaluate age at each level. – sconfluentus

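First, nice first question, and it's great to see high school kids learning statistical programming!

Second: you pretty much answered your own question in your comment, which should help you get there.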

I am making some assumptions:

1. prof is the name of your data frame
2. you want to compare age by sex from prof in your t-test

Your logic is heading in the right direction. I added some made-up observations to the prof data frame, but here is how it works:
# this is a comment in the code, not code, but it explains the reasoning, it always starts with hash tag

women<-prof[which(prof$Sex=="F"),] #notice the comma after parenthesis 
men<-prof[which(prof$Sex=="M"),] #notice the comma after parenthesis here too

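The left side of the comma, with data == "something", selects the rows. The right side of the comma says which columns to keep; leaving it blank tells R to include all columns.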
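As a small illustration of that row/column indexing, here is a sketch using the four rows shown in the question. The helper name rows_f is just something I made up for this example.

```r
# The four rows from the question, with the same column names
prof <- data.frame(
  Sex = c("M", "F", "F", "M"),
  Age = c(21, 31, 42, 43)
)

rows_f <- which(prof$Sex == "F")   # row numbers where Sex is "F"

prof[rows_f, ]        # those rows, all columns (still a data frame)
prof[rows_f, "Age"]   # those rows, Age column only (a plain numeric vector)
```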

head(men);head(women) # shows you first 6 rows of each new frame 
# you can see below that the data is still in a data frame 

    Sex Age 
1 M 21 
4 M 43 
5 M 12 
6 M 36 
7 M 21 
10 M 23 
    Sex Age 
2 F 31 
3 F 42 
8 F 52 
9 F 21 
11 F 36

So to run the t-test on age, you have to ask for the Age column by data frame name and column name, for example: men$Age

t.test(women$Age, men$Age) #this is the test 

# results below 

Welch Two Sample t-test 

data: women$Age and men$Age 
t = 0.59863, df = 10.172, p-value = 0.5625 
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval: 
-11.93964 20.73964 
sample estimates: 
mean of x mean of y 
    36.4  32.0

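There is almost always more than one way to do something in R. Sometimes the initial subsetting is a little more complicated but makes the data easier to handle afterwards. So if you would rather not address Age through the data frame, you can ask for the column in your initial subset: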

women<-prof[which(prof$Sex=="F"),"Age"] #set women equal to just the ages where Sex is 'F' 
men<-prof[which(prof$Sex=="M"), "Age"]#set men equal to just the ages where Sex is 'M'

Check your data again; this time each variable is just a vector of ages:

head(women); head(men) 
[1] 31 42 52 21 36 
[1] 21 43 12 36 21 23

Then your t-test is a simple comparison:

t.test(women,men) 
# notice same results 

    Welch Two Sample t-test 

data: women and men 
t = 0.59863, df = 10.172, p-value = 0.5625 
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval: 
-11.93964 20.73964 
sample estimates: 
mean of x mean of y 
    36.4  32.0

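It looks like your problem came down to three points in your code:

1. using gender=="F" when the column is named Sex
2. not using the comma in your [,] to specify rows
3. not addressing the $Age column in your t.test, if the subset did indeed still have two columns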

The code above should get you where you need to be.
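If it helps to see those three points in action, here is a rough sketch on the four rows from the question. The names empty and men_df are made up for this example, and the attempt being reproduced is only my reading of the code in the comment.

```r
prof <- data.frame(
  Sex = c("M", "F", "F", "M"),
  Age = c(21, 31, 42, 43)
)

# 1. Wrong column name: prof$gender does not exist, so it is NULL
is.null(prof$gender)                      # TRUE

# 2. No comma inside [ ]: the index is treated as a column selection.
#    With the NULL column, which() finds nothing and no columns are kept.
empty <- prof[which(prof$gender == "M")]
ncol(empty)                               # 0

# 3. With the comma but without $Age, the result is still a data frame
men_df <- prof[which(prof$Sex == "M"), ]
is.data.frame(men_df)                     # TRUE
men_df$Age                                # the numeric vector t.test() needs
```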

Source

2017-05-19 04:10:15 sconfluentus

You are brilliant! Thank you so much, this is really helpful (everything makes sense now!) –

You're welcome! – sconfluentus

You can try this code:

t.test(Age ~ Sex, paired = FALSE, data = datasetName)

It should give you the same result without the hassle of creating multiple subsets.
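Here is a quick sketch to check that claim on made-up data (the six rows below are illustrative, not the asker's dataset). I left out paired because FALSE is already the default; I also believe recent R versions reject paired in the formula method, so omitting it seems the safer choice, but treat that as an assumption and check ?t.test for your version.

```r
# Made-up data for illustration; column names follow the question
prof <- data.frame(
  Sex = c("M", "F", "F", "M", "M", "F"),
  Age = c(21, 31, 42, 43, 36, 52)
)

# Formula interface: Age split by the two levels of Sex
by_formula <- t.test(Age ~ Sex, data = prof)

# Manual subsets. The formula version orders groups alphabetically,
# so "F" is the first group there and is passed first here too.
by_subset <- t.test(prof$Age[prof$Sex == "F"], prof$Age[prof$Sex == "M"])

# Both approaches should agree on the test statistic and p-value
all.equal(unname(by_formula$statistic), unname(by_subset$statistic))
all.equal(by_formula$p.value, by_subset$p.value)
```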

Source

2017-06-26 15:13:40
