2013-03-27
1

Count categorical values in one column, grouped by another column

R newbie here, so please forgive my ignorance. My data looks like this:

                 JOB_ROLE  EXP_IT_NETW 
1 Software engineering-related (developer, tester, project manager, architecture)  5<10 
3                  See below  None 
4                   Student   <1 
5 Software engineering-related (developer, tester, project manager, architecture)   1<5 
6                   Blogger   10+ 

For each value in column 1, I want to count how many times each value of column 2 occurs, so that the result looks like this:

JOB_ROLE   None <1 1<5 5<10 10+ 
Software engineer 3  5  10  15  3 
Student    10  7  5  1  0 
... 

Any ideas on how to do this? The dput() output of my data is below. Thanks in advance!

structure(list(JOB_ROLE = c("Software engineering-related (developer, tester, project manager, architecture)", 
"See below", "Student", "Software engineering-related (developer, tester, project manager, architecture)", 
"Blogger", "Systems Support", "Student", "IT/Network Administrator", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"Student", "Student", "Software engineering-related (developer, tester, project manager, architecture)", 
"IT hobbyist", "Student", "Software engineering-related (developer, tester, project manager, architecture)", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"IT Manager", "Software engineering-related (developer, tester, project manager, architecture)", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"IT/Network Administrator", "IT/Network Administrator", "Software engineering-related (developer, tester, project manager, architecture)", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"Student", "Software engineering-related (developer, tester, project manager, architecture)", 
"Researcher in CompSci or related field", "Researcher in CompSci or related field", 
"IT/Network Administrator", "Student", "Software engineering-related (developer, tester, project manager, architecture)", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"Education", "Software engineering-related (developer, tester, project manager, architecture)", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"IT/Network Administrator", "Software engineering-related (developer, tester, project manager, architecture)", 
"IT/Network Administrator", "Student", "IT/Network Administrator", 
"Software engineering-related (developer, tester, project manager, architecture)", 
"Student", "IT/Network Administrator", "just a layperson who has used computers for over 30 years", 
"IT/Network Administrator", "Unemployed", "Student", "IT/Network Administrator" 
), EXP_IT_NETW = c("5<10", "None", "<1", "1<5", "10+", "None", 
"1<5", "10+", "<1", "None", "1<5", "1<5", "None", "None", "10+", 
"None", "1<5", "10+", "None", "1<5", "None", "1<5", "10+", "1<5", 
"1<5", "1<5", "None", "None", "1<5", "5<10", "None", "5<10", 
"<1", "None", "1<5", "None", "1<5", "1<5", "10+", "1<5", "10+", 
"None", "1<5", "5<10", "None", "1<5", "None", "1<5", "None", 
"None", "10+")), .Names = c("JOB_ROLE", "EXP_IT_NETW"), class = "data.frame", row.names = c(1L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 16L, 17L, 18L, 
19L, 20L, 21L, 22L, 23L, 25L, 26L, 27L, 28L, 29L, 30L, 32L, 33L, 
34L, 35L, 36L, 37L, 39L, 40L, 41L, 42L, 43L, 44L, 47L, 48L, 49L, 
50L, 51L, 52L, 53L, 55L, 56L, 57L, 59L, 61L, 62L)) 

Answers

5

Use table():

> table(d) 
                       EXP_IT_NETW 
JOB_ROLE                   <1 1<5 10+ 5<10 None 
    Blogger                   0 0 1 0 0 
    Education                  0 0 0 0 1 
    IT hobbyist                  0 0 0 0 1 
    IT Manager                  0 1 0 0 0 
    IT/Network Administrator               0 4 5 1 0 
    just a layperson who has used computers for over 30 years      0 0 0 0 1 
    Researcher in CompSci or related field           0 1 0 0 1 
    See below                  0 0 0 0 1 
    Software engineering-related (developer, tester, project manager, architecture) 2 9 2 3 5 
    Student                   1 3 0 0 6 
    Systems Support                 0 0 0 0 1 
    Unemployed                  0 0 0 0 1 
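
The columns above come out in the sorted order of the strings rather than in order of experience. As a minimal sketch, assuming the intended order is None, <1, 1<5, 5<10, 10+ (as in the question's example output), converting EXP_IT_NETW to a factor with explicit levels before calling table() fixes the column order. The five-row data frame d below is only the sample shown at the top of the question, with the long job-role label shortened for illustration.

```r
# Small sample based on the first rows shown in the question
# (the "Software engineering-related" label is shortened here)
d <- data.frame(
  JOB_ROLE = c("Software engineering-related", "See below", "Student",
               "Software engineering-related", "Blogger"),
  EXP_IT_NETW = c("5<10", "None", "<1", "1<5", "10+"),
  stringsAsFactors = FALSE
)

# Assumed ordering of the experience levels, from least to most
exp_levels <- c("None", "<1", "1<5", "5<10", "10+")
d$EXP_IT_NETW <- factor(d$EXP_IT_NETW, levels = exp_levels)

# Two-way contingency table: rows are JOB_ROLE, columns are EXP_IT_NETW
counts <- table(d)
print(counts)
```

Any level listed in exp_levels still gets a column even when no row has that value, which keeps the layout stable across subsets of the data.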
+0

Of course, the answer is so simple; it was right in front of me the whole time. Thanks. – user2145843 2013-03-27 20:26:57

+2

@Arun, use this: 'as.data.frame(unclass(table(d)))' – 2013-03-27 20:52:04
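
To illustrate the difference this comment relies on, here is a small sketch on made-up rows (the three-row d below is hypothetical, not the question's data). Calling as.data.frame() directly on a table gives a long data frame with a Freq column, whereas unclass() first reduces the table to a plain integer matrix, so the result keeps one column per EXP_IT_NETW value.

```r
# Hypothetical three-row example, not the question's full data
d <- data.frame(
  JOB_ROLE = c("Student", "Student", "Blogger"),
  EXP_IT_NETW = c("None", "<1", "10+"),
  stringsAsFactors = FALSE
)

tab <- table(d)

# Long format: one row per (JOB_ROLE, EXP_IT_NETW) pair, with a Freq column
long <- as.data.frame(tab)

# Wide format: unclass() drops the "table" class, leaving an integer matrix,
# so as.data.frame() produces one column per EXP_IT_NETW value
wide <- as.data.frame(unclass(tab))
print(wide)
```

In the wide result the job roles end up as row names rather than as a column.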

3

I would also do this with data.table, though a bit differently, so that the result is in the same format you expect.

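library(data.table) 
dt <- as.data.table(df) # here, I assume df is your data.frame 

setkey(dt, "JOB_ROLE") # sort by JOB_ROLE and set it as the key for fast grouping 

# For each JOB_ROLE, tabulate EXP_IT_NETW against the full set of levels so that 
# every group returns the same columns, then return the counts as a named list 
dt[, {tt <- table(factor(EXP_IT_NETW, 
                         levels = unique(dt$EXP_IT_NETW))) 
      setattr(as.list(tt), 'names', names(tt)) 
     }, by = key(dt)] 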

I get this:

#         JOB_ROLE None 10+ 1<5 5<10 <1 
# 1:     >30_years_experience 1 0 0 0 0 
# 2:        Blogger 0 1 0 0 0 
# 3:        Education 1 0 0 0 0 
# 4:        IT Manager 0 0 1 0 0 
# 5:       IT hobbyist 1 0 0 0 0 
# 6:    IT/Network Administrator 0 5 4 1 0 
# 7: Researcher in CompSci or related field 1 0 1 0 0 
# 8:        See below 1 0 0 0 0 
# 9:     Software_enginnering 5 2 9 3 2 
# 10:        Student 6 0 3 0 1 
# 11:      Systems Support 1 0 0 0 0 
# 12:        Unemployed 1 0 0 0 0 