2016-03-01 86 views

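Passing to sapply a function that keys on a subset of a data frame as well as a data frame column in R

Please consider the following data frame: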

#build sample data.frame 
theData <- data.frame(surname = c("Smith","Parker", "Allen", "McGraw", "Parker", "Smith", "Smith"), 
        FamilySize = c(3, 2, 1, 1, 2, 3, 3)) 

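First, I need to verify that the number of people sharing a surname matches the size of the family they belong to. For example, there are 3 people with surname = "Smith", and their FamilySize value is 3. If this condition holds, the family size is prepended to the surname (e.g. "3Smith"). If not, the result should be the word "small".

To do this I wrote the following function: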

# function 
familyKount <- function(df, lastName, famSize){ 
    # calculate number of persons sharing same surname 
    nPersons <- dim(subset(df, surname == lastName))[1] 

    # number of persons agrees with family size 
    if(nPersons == famSize) { 
      idFam <- paste(as.character(famSize), lastName, sep="") 
    } else {    # number of persons does not agree with family size 
      idFam <- "small" 
    } 
    idFam 
} 

So, if I call the function as follows

familyKount(theData, theData$surname[1], theData$FamilySize[1]) 

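I get the correct answer: "3Smith"

However, what I want is to apply this function to the whole data frame without specifying indices for surname and FamilySize (I do not want to use a for loop). I have tried variants of the apply family of functions, but I have not worked out how to pass both the whole data frame and specific columns of it as arguments to the function.

Cheers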

Answer


There are many solutions. You can, for example, use table():

table(theData$surname) 

## Allen McGraw Parker Smith 
##  1  1  2  3 

Or with dplyr:

library(dplyr) 
group_by(theData, surname) %>% 
    summarize(SizeCalculated = n()) 
## Source: local data frame [4 x 2] 
## 
## surname SizeCalculated 
## (fctr)   (int) 
## 1 Allen    1 
## 2 McGraw    1 
## 3 Parker    2 
## 4 Smith    3 

Or with aggregate():

aggregate(theData, list(theData$surname), length) 
## Group.1 surname FamilySize 
## 1 Allen  1   1 
## 2 McGraw  1   1 
## 3 Parker  2   2 
## 4 Smith  3   3 

You can also build a solution with sapply(), which may be closer to what you intended:

surnames <- unique(theData$surname) 
counts <- sapply(surnames, function(s) sum(theData$surname == s)) 
data.frame(surnames, counts) 
## surnames counts 
## 1 Smith  3 
## 2 Parker  2 
## 3 Allen  1 
## 4 McGraw  1 

The idea is to apply the function over the unique surnames.
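
If the goal is to reuse the familyKount() function from the question row by row, one possible sketch is mapply(), which walks the two columns in parallel while the data frame itself is passed as a fixed extra argument through MoreArgs. The column name idFam is an assumption made here for illustration, the count inside the function is rewritten with sum(), and stringsAsFactors = FALSE is set so the result does not depend on the R version.

```r
# Sample data from the question; surnames are kept as character strings
theData <- data.frame(
  surname = c("Smith", "Parker", "Allen", "McGraw", "Parker", "Smith", "Smith"),
  FamilySize = c(3, 2, 1, 1, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Function from the question, counting matching rows with sum()
familyKount <- function(df, lastName, famSize) {
  nPersons <- sum(df$surname == lastName)
  if (nPersons == famSize) paste0(famSize, lastName) else "small"
}

# lastName and famSize vary per row; df is the same for every call
theData$idFam <- mapply(
  familyKount,
  lastName = theData$surname,
  famSize = theData$FamilySize,
  MoreArgs = list(df = theData),
  USE.NAMES = FALSE
)
theData$idFam
```

Because each call rescans the whole data frame, this is convenient rather than fast; for large data a grouped count is likely the better choice.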

All of these solutions can be extended to include a check against theData$FamilySize. For example, with the aggregate() solution:

tab <- aggregate(theData, list(theData$surname), length) 
tab$size_check <- tab$surname == tab$FamilySize 
tab 
## Group.1 surname FamilySize size_check 
## 1 Allen  1   1  TRUE 
## 2 McGraw  1   1  TRUE 
## 3 Parker  2   2  TRUE 
## 4 Smith  3   3  TRUE
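
Note that in the table above length() is applied to both columns, so surname and FamilySize both hold the group's row count and size_check is TRUE by construction. A sketch that instead compares the count with the FamilySize value recorded in each row could use ave() and ifelse(). The names nPersons and idFam are assumptions for illustration, and one row is deliberately altered so that the "small" branch is visible.

```r
# Sample data from the question; surnames are kept as character strings
theData <- data.frame(
  surname = c("Smith", "Parker", "Allen", "McGraw", "Parker", "Smith", "Smith"),
  FamilySize = c(3, 2, 1, 1, 2, 3, 3),
  stringsAsFactors = FALSE
)
theData$FamilySize[3] <- 4   # hypothetical mismatch: Allen appears only once

# Count rows per surname; ave() returns the count aligned with the original rows
theData$nPersons <- ave(seq_along(theData$surname), theData$surname, FUN = length)

# Prepend the family size where the count agrees, otherwise label "small"
theData$idFam <- ifelse(
  theData$nPersons == theData$FamilySize,
  paste0(theData$FamilySize, theData$surname),
  "small"
)
theData
```

This keeps one result per original row, which is what the question asks for, and avoids an explicit loop.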