2017-02-28

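Subsetting by two columns in conjunction

I want to create a subset of my data that is conditioned on two columns at the same time.

Similar to this: subsetting data using multiple variables in R

For example:

Say I have this dataset called Gamedat: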

 Games People Hoursplayed 
    goldeneye Michael   5 
    goldeneye Thatcher   8 
    goldeneye Dexter   12 
    goldeneye Dexter   15 
     pacman Dexter   2 
     tetris  Clint   5 
     tetris Dexter   8 
    goldeneye Thatcher   12 
     pacman Thatcher   15 
    goldeneye  Clint   2 
     pacman Michael   5 
     pacman Michael   8 
     pacman  Clint   12 
     tetris  John   15 
     tetris  Clint   2 
ageofempires  Clint   5 
     pacman Dexter   8 
ageofempires Thatcher   12 
ageofempires  John   15 
    goldeneye Dexter   2 

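Say I want to look at a game like goldeneye. I want to see every case where a player played another game for the same number of hours as they played goldeneye (this is more useful in my real dataset).

So I do this: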

Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]

Then I do this:

subset(Gamedat, Gamedat[ ,2] %in% Gameofinterest[ ,2] & 
    Gamedat[ ,3] %in% Gameofinterest[ ,3]) 

But this gives me:

 Games People Hoursplayed 
    goldeneye Michael   5 
    goldeneye Thatcher   8 
    goldeneye Dexter   12 
    goldeneye Dexter   15 
     pacman Dexter   2 
     tetris Clint   5 
     tetris Dexter   8 
    goldeneye Thatcher   12 
     pacman Thatcher   15 
    goldeneye Clint   2 
     pacman Michael   5 
     pacman Michael   8 
     pacman Clint   12 
     tetris Clint   2 
ageofempires Clint   5 
     pacman Dexter   8 
ageofempires Thatcher   12 
    goldeneye Dexter   2 

When what I really want is this:

  Games People Hoursplayed 
    goldeneye Michael   5 
    goldeneye Thatcher   8 
    goldeneye Dexter   12 
    goldeneye Dexter   15 
     pacman Dexter   2 
    goldeneye Thatcher   12 
    goldeneye Clint   2 
     pacman Michael   5 
     tetris Clint   2 
    ageofempires Thatcher   12 
    goldeneye Dexter   2 

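In short, I want to find the rows that match on "People & Hoursplayed" as a pair, rather than rows that match on "People" and on "Hoursplayed" independently... does that make sense?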
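To make the distinction concrete, here is a minimal sketch on a toy data frame (`toy`, `ref`, and `key` are hypothetical names used only for illustration, and the `"|"` separator is an assumption that no name contains that character). It compares the two independent `%in%` tests with a single test on the combined pair:

```r
# Toy data: a hypothetical four-row subset, for illustration only
toy <- data.frame(
  Games       = c("goldeneye", "goldeneye", "tetris", "tetris"),
  People      = c("Michael", "Clint", "Clint", "Clint"),
  Hoursplayed = c(5, 2, 5, 2),
  stringsAsFactors = FALSE
)
ref <- toy[toy$Games == "goldeneye", ]

# Independent tests: each column is checked against its own set of values,
# so (Clint, 5) passes because Clint and 5 both occur somewhere in ref
independent <- toy$People %in% ref$People &
  toy$Hoursplayed %in% ref$Hoursplayed

# Joint test: the (People, Hoursplayed) pair must occur together in ref
key <- function(d) paste(d$People, d$Hoursplayed, sep = "|")
joint <- key(toy) %in% key(ref)

print(independent)
print(joint)
```

The third row (Clint, 5) passes the independent test but fails the joint one, which is the behaviour I am after.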

I know I can do this:

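Gamedat$PHpaste <- paste(Gamedat$People, Gamedat$Hoursplayed, sep="") 

# re-create Gameofinterest so that it also carries the new PHpaste column
Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]

Gamedat[Gamedat[ ,4] %in% Gameofinterest[ ,4], ] 

and get: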

 Games People Hoursplayed PHpaste 
    goldeneye Michael   5 Michael5 
    goldeneye Thatcher   8 Thatcher8 
    goldeneye Dexter   12 Dexter12 
    goldeneye Dexter   15 Dexter15 
     pacman Dexter   2 Dexter2 
    goldeneye Thatcher   12 Thatcher12 
    goldeneye Clint   2  Clint2 
     pacman Michael   5 Michael5 
     tetris Clint   2  Clint2 
ageofempires Thatcher   12 Thatcher12 
    goldeneye Dexter   2 Dexter2 

Is there something more elegant?

Is your expected result correct? Dexter played pacman for 2 hours, but played goldeneye for 29 hours... is it because 2 of those 29 hours are part of a distinct record? – shayaa

The last row shows that Dexter played for 2 hours, so this is a correct match. – StatGenGeek

Answer

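I think this can be done with dplyr. First, use filter to keep the rows where the game is goldeneye. Then use inner_join to join those rows back to the original data on People and Hoursplayed. Optionally, select the columns you want and arrange by People.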

library(dplyr) 
Gamedat %>% 
    filter(Games == "goldeneye") %>% 
    inner_join(Gamedat, by = c("People", "Hoursplayed")) %>% 
    select(Games = Games.y, People, Hoursplayed) %>% 
    arrange(People) 

Result:

  Games People Hoursplayed 
1  goldeneye Clint   2 
2  tetris Clint   2 
3  goldeneye Dexter   12 
4  goldeneye Dexter   15 
5  pacman Dexter   2 
6  goldeneye Dexter   2 
7  goldeneye Michael   5 
8  pacman Michael   5 
9  goldeneye Thatcher   8 
10 goldeneye Thatcher   12 
11 ageofempires Thatcher   12 
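A possible variation, sketched here on the assumption that the installed dplyr provides the filtering join `semi_join`: it keeps the rows of Gamedat whose (People, Hoursplayed) pair appears among the goldeneye rows, without creating `Games.x`/`Games.y` columns and without reordering. The data frame is rebuilt from the table in the question so the block is self-contained; `goldeneye_pairs` and `result` are names made up for this sketch.

```r
library(dplyr)

# Gamedat rebuilt from the table in the question
Gamedat <- data.frame(
  Games = c("goldeneye", "goldeneye", "goldeneye", "goldeneye", "pacman",
            "tetris", "tetris", "goldeneye", "pacman", "goldeneye",
            "pacman", "pacman", "pacman", "tetris", "tetris",
            "ageofempires", "pacman", "ageofempires", "ageofempires",
            "goldeneye"),
  People = c("Michael", "Thatcher", "Dexter", "Dexter", "Dexter",
             "Clint", "Dexter", "Thatcher", "Thatcher", "Clint",
             "Michael", "Michael", "Clint", "John", "Clint",
             "Clint", "Dexter", "Thatcher", "John", "Dexter"),
  Hoursplayed = c(5, 8, 12, 15, 2, 5, 8, 12, 15, 2,
                  5, 8, 12, 15, 2, 5, 8, 12, 15, 2),
  stringsAsFactors = FALSE
)

# Distinct (People, Hoursplayed) pairs recorded for goldeneye
goldeneye_pairs <- Gamedat %>%
  filter(Games == "goldeneye") %>%
  distinct(People, Hoursplayed)

# Keep every row of Gamedat whose pair occurs in goldeneye_pairs
result <- Gamedat %>%
  semi_join(goldeneye_pairs, by = c("People", "Hoursplayed"))

print(result)
```

Because a semi join only filters, a pair that occurs several times among the goldeneye rows would not duplicate the matching rows, which an inner join would do.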

Beautiful, thank you. – StatGenGeek