2012-03-05
0

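Variable creation in an unbalanced dataset

I have a dataset in which each observation has the fields ID, year, event_type, and event_date. The number of observations per ID and year is unbalanced. Specifically, these are battle outcomes within conflict years: each battle has a date and a type (its outcome).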

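What I want to do is create a variable based on the number of events of a certain type within each ID-year subset. So:

By ID

By year, sum event_type == x

I understand how to do this with a plain for loop, but I gather I should be using tapply(), since I have a different number of observations per ID?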
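For reference, the tapply() route mentioned in the question could look roughly like the sketch below. The toy data frame is hypothetical (only the column names come from the question), and "x" stands in for whichever event type is being counted. Unequal group sizes are not a problem for tapply(); it simply applies the function to whatever rows fall in each ID-year cell.

```r
# Hypothetical toy data; only the column names follow the question.
df <- data.frame(
  ID         = c(1, 1, 1, 2, 2, 3),
  year       = c(2000, 2000, 2001, 2000, 2000, 2001),
  event_type = c("x", "y", "x", "x", "x", "y"),
  stringsAsFactors = FALSE
)

# Sum the logical vector (event_type == "x") within each ID-year cell.
# The result is a matrix with IDs as rows and years as columns.
counts <- tapply(df$event_type == "x", list(df$ID, df$year), sum)

print(counts)
```

ID-year combinations that never occur in the data come back as NA, while combinations that occur but contain no "x" events come back as 0.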

Answers

2

If I understand the question correctly, then:

aggregate(event_type ~ ID + year, subset(df,event_type=="x"), length) 

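I like this solution. Is there a similarly elegant way to append the count of type-x events to each observation, matched on ID and year? Otherwise I was just going to run a merge command. – Zach 2012-03-05 22:03:41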


Given that your data are unbalanced, a merge is the simplest way, IMHO. – Andrei 2012-03-06 09:33:03
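The merge discussed in these comments might be sketched as follows. The toy data and the column name n_x are assumptions made for illustration. Because subsetting to event_type == "x" drops ID-years with no such events, the sketch uses all.x = TRUE and then fills the resulting NA values with 0. A base R alternative using ave(), which skips the merge entirely, is shown at the end.

```r
# Hypothetical toy data; only the column names follow the question.
df <- data.frame(
  ID         = c(1, 1, 1, 2, 2, 3),
  year       = c(2000, 2000, 2001, 2000, 2000, 2001),
  event_type = c("x", "y", "x", "x", "x", "y"),
  stringsAsFactors = FALSE
)

# Count type-"x" events per ID-year, as in the answer above.
x_counts <- aggregate(event_type ~ ID + year,
                      subset(df, event_type == "x"), length)
names(x_counts)[names(x_counts) == "event_type"] <- "n_x"  # n_x is an illustrative name

# Left-join the counts back onto every observation.
merged <- merge(df, x_counts, by = c("ID", "year"), all.x = TRUE)

# ID-years with no "x" events have no match, so set their count to 0.
merged$n_x[is.na(merged$n_x)] <- 0

# Alternative without a merge: ave() returns one value per original row.
df$n_x <- ave(as.numeric(df$event_type == "x"), df$ID, df$year, FUN = sum)
```

Note that merge() reorders rows by the key columns, whereas the ave() version keeps the original row order.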

2
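library(plyr)
# sample() is random and no seed is set, so your values will differ from the listings shown.
df <- data.frame(ID = sample(11:20, 25, replace = TRUE), year = sample(1900:1905, 25, replace = TRUE),
                 event_type = sample(c("win", "lose"), 25, replace = TRUE))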

# To see this sample data sorted by ID and year. 
arrange(df,ID,year) 
   ID year event_type
1  11 1901        win
2  11 1904        win
3  11 1910       lose
4  12 1920       lose
5  13 1900        win
6  13 1905        win
7  13 1906       lose
8  13 1912        win
9  13 1920       lose
10 14 1906        win
11 14 1918       lose
12 14 1920        win
13 15 1909        win
14 15 1919        win
15 16 1916        win
16 16 1920       lose
17 18 1901       lose
18 18 1910       lose
19 18 1912       lose
20 18 1920        win
21 19 1916        win
22 19 1916        win
23 19 1917       lose
24 20 1901       lose
25 20 1914       lose



result <- ddply(df, .(ID, year, event_type), summarise, event_count = length(event_type))

> result
   ID year event_type event_count
1  11 1903        win           1
2  11 1905       lose           1
3  12 1903       lose           1
4  12 1905        win           1
5  13 1902        win           1
6  13 1905       lose           1
7  14 1903        win           1
8  15 1901        win           2
9  15 1903       lose           1
10 15 1905        win           1
11 16 1904        win           1
12 17 1904       lose           1
13 18 1900       lose           2
14 18 1900        win           1
15 18 1902       lose           1
16 18 1904        win           1
17 18 1905        win           1
18 19 1901       lose           1
19 19 1902        win           1
20 19 1903       lose           1
21 19 1903        win           1
22 20 1901        win           1
23 20 1904        win           1

Say you only want to count the wins and not the losses; then it would look like this:

result <- ddply(subset(df, event_type == "win"), .(ID, year, event_type), summarise, event_count = length(event_type))

> result
   ID year event_type event_count
1  11 1903        win           1
2  12 1905        win           1
3  13 1902        win           1
4  14 1903        win           1
5  15 1901        win           2
6  15 1905        win           1
7  16 1904        win           1
8  18 1900        win           1
9  18 1904        win           1
10 18 1905        win           1
11 19 1902        win           1
12 19 1903        win           1
13 20 1901        win           1
14 20 1904        win           1
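If the goal is to keep every original row and attach the win count to it, rather than collapsing to one row per group, one possible variation is to pass transform instead of summarise to ddply(). This is only a sketch: the toy data and the column name win_count are assumptions, not part of the answer above.

```r
library(plyr)

# Hypothetical toy data in the same shape as the sample above.
df <- data.frame(
  ID         = c(11, 11, 11, 12, 12),
  year       = c(1900, 1900, 1901, 1900, 1900),
  event_type = c("win", "lose", "win", "lose", "lose"),
  stringsAsFactors = FALSE
)

# transform keeps all rows of each ID-year group and adds a column;
# the single sum is recycled across the rows of the group.
with_counts <- ddply(df, .(ID, year), transform,
                     win_count = sum(event_type == "win"))

print(with_counts)
```

Unlike the subset() version, groups with no wins are retained here, with a count of 0.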