2012-03-26 55 views
0

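Group by the quartiles of a variable contained in a data frame

I have the swiss dataset that ships with R, which is a data frame of the following form: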

             Fertility Agriculture Examination Education Catholic Infant.Mortality
Courtelary        80.2        17.0          15        12     9.96             22.2
Delemont          83.1        45.1           6         9    84.84             22.2
Franches-Mnt      92.5        39.7           5         5    93.40             20.2
    .                .           .           .         .        .                .
    .                .           .           .         .        .                .
    .                .           .           .         .        .                .
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3

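I would like to know whether there is a simple way to split the data into four groups, one for each quartile of the Education variable, and then get the corresponding Infant.Mortality for each province in each group, so that I end up with something like this: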

 Group1stQ   Group1stQ   Group1stQ   Group1stQ 

    <Mortality for  <Mortality for  <Mortality for  <Mortality for 
    1st province  1st province   1st province  1st province 
    on this group>  on this group>  on this group>  on this group> 

    <Mortality for  <Mortality for  <Mortality for  <Mortality for 
    2nd province  2nd province   2nd province  2nd province 
    on this group>  on this group>  on this group>  on this group> 

    <Mortality for  <Mortality for  <Mortality for  <Mortality for 
    3rd province  3rd province   3rd province  3rd province 
    on this group>  on this group>  on this group>  on this group> 
      .     .     .     . 
      .     .     .     . 
      .     .     .     . 

Thanks in advance for your help!

+0

To clarify, do you want the _average_ infant mortality for each quantile? – MattLBeck 2012-03-26 11:30:12

+0

Sorry... I will edit the question... that is not what I need... I got confused myself... I'm really sorry – Throoze 2012-03-26 11:32:52

+0

I assume you mean columns named 'Group1stQ Group2ndQ Group3rdQ Group4thQ'? Are there multiple rows for each location? – MattLBeck 2012-03-26 11:47:17

Answers

4

How about:

> swiss$qEdu <- cut (swiss$Education, 
        breaks = quantile (swiss$Education, c (0, .25, .5, .75, 1)), 
        include.lowest = TRUE) 

> aggregate (swiss$Infant.Mortality, list (qEdu = swiss$qEdu), FUN = mean) 
     qEdu        x
1   [1,6] 19.31429
2   (6,8] 21.93636
3  (8,12] 19.38182
4 (12,53] 19.30909

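(I don't really know what your numbers are - they don't agree with the means I get)

(That was before the edit...)

(2nd edit:) If you want the Infant.Mortality of every province belonging to each quartile of Education, use list() as the aggregation function: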
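As a side check on the cut() call above, here is a minimal sketch that prints the quartile breaks and counts how many provinces land in each bin. The names `breaks`, `q_edu` and `q_open` are only illustrative and do not come from the answer. It also shows why `include.lowest = TRUE` matters: without it, the province with the minimum Education value falls outside the first interval.

```r
# Sketch: inspect the quartile breaks and the size of each Education group.
# `breaks`, `q_edu` and `q_open` are illustrative names (assumptions).
data(swiss)

# Minimum, the three quartiles, and the maximum of Education.
breaks <- quantile(swiss$Education, probs = c(0, 0.25, 0.5, 0.75, 1))
q_edu  <- cut(swiss$Education, breaks = breaks, include.lowest = TRUE)

print(breaks)
print(table(q_edu))  # number of provinces per quartile bin

# Without include.lowest = TRUE the first interval is open on the left,
# so the minimum value is not assigned to any bin and becomes NA.
q_open <- cut(swiss$Education, breaks = breaks)
print(sum(is.na(q_open)))
```

The bins need not be equally sized: ties at a break value all go into the same interval, so a group can hold more than a quarter of the rows.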

> aggregate (swiss$Infant.Mortality, list (qEdu = swiss$qEdu), FUN = list) 
    qEdu                     x 
1 [1,6] 20.2, 24.5, 18.7, 21.2, 22.4, 15.3, 21.0, 18.0, 15.1, 19.8, 18.3, 19.4, 20.2, 16.3 
2 (6,8]     20.3, 26.6, 23.6, 24.9, 21.0, 19.1, 20.0, 23.8, 22.5, 20.0, 19.5 
3 (8,12]     22.2, 22.2, 16.5, 22.7, 20.0, 18.0, 16.7, 16.3, 17.8, 20.3, 20.5 
4 (12,53]     20.6, 24.4, 20.2, 10.8, 20.9, 18.1, 18.9, 23.0, 18.0, 18.2, 19.3 

Or:

> Infant.Mortality <- lapply (levels (swiss$qEdu), function (x) swiss$Infant.Mortality [swiss$qEdu == x]) 
> names (Infant.Mortality) <- levels (swiss$qEdu) 
> Infant.Mortality 
$`[1,6]` 
[1] 20.2 24.5 18.7 21.2 22.4 15.3 21.0 18.0 15.1 19.8 18.3 19.4 20.2 16.3 

$`(6,8]` 
[1] 20.3 26.6 23.6 24.9 21.0 19.1 20.0 23.8 22.5 20.0 19.5 

$`(8,12]` 
[1] 22.2 22.2 16.5 22.7 20.0 18.0 16.7 16.3 17.8 20.3 20.5 

$`(12,53]` 
[1] 20.6 24.4 20.2 10.8 20.9 18.1 18.9 23.0 18.0 18.2 19.3 
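
As a further sketch (not part of the original answer), base R's `split()` should give the same per-quartile list in a single call, and naming the values by row name keeps track of which province each one belongs to. Padding the shorter groups with `NA` then yields the four-column layout asked for in the question. The column labels `Group1stQ` to `Group4thQ` follow the question and the comment thread; the variable names `q_edu`, `mortality`, `by_quartile` and `padded` are my own assumptions.

```r
data(swiss)

# Quartile factor for Education, built the same way as in the answer.
q_edu <- cut(swiss$Education,
             breaks = quantile(swiss$Education, c(0, .25, .5, .75, 1)),
             include.lowest = TRUE)

# Named vector, so each mortality value carries its province name.
mortality <- setNames(swiss$Infant.Mortality, rownames(swiss))

# One list element per quartile, each a named numeric vector.
by_quartile <- split(mortality, q_edu)

# Pad every group with NA up to the size of the largest group,
# then bind the groups together as the columns of a matrix.
n_max  <- max(lengths(by_quartile))
padded <- sapply(by_quartile,
                 function(v) c(unname(v), rep(NA, n_max - length(v))))
colnames(padded) <- c("Group1stQ", "Group2ndQ", "Group3rdQ", "Group4thQ")

print(by_quartile[[1]])  # first quartile, with province names
print(padded)            # one column per quartile, NA-padded
```

The list form is usually easier to work with; the padded matrix is only useful if the side-by-side display is really what is needed.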
+0

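Please read the edit and the other comments... I don't need the means; I need to group the data as described in the updated question... I'm really sorry... thank you for your help! – Throoze 2012-03-26 12:16:29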

+0

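In the aggregate() approach, is 'x' (the second column) the list of mortality values for the corresponding quantile? If so, that is exactly what I need! Thank you very much! =) – Throoze 2012-03-26 12:49:09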

+0

Yes. I aggregate Infant.Mortality by the factor that gives the quartile. Instead of computing some summary value, I use the 'list' function to get all of the values. – cbeleites 2012-03-26 13:09:21