2013-07-15

Recoding a variable into multiple new variables

I have a dataset of observed scores for a group of people, like this:

person_id <- c(1:50) 
person_score <- rep(1:10,5) 
people <- data.frame(person_id, person_score) 

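I need to create a set of new variables that recode the observed score values. I have a set of variables that act as a "key" for translating the observed scores into the new variables, for example: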

observed <- c(1,2,3,4,5,6,7,8,9,10) 
score1 <- c(10,14,17,18,20,21,22,26,28,31) 
score2 <- c(6,9,11,14,17,18,20,24,25,26) 
score3 <- c(11,13,15,17,19,21,23,25,27,29) 
score4 <- c(43,44,45,46,47,48,49,50,51,52) 
scores <- data.frame(observed,score1,score2, score3, score4) 

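...where the first value corresponds to an observed score of 1, the second value to an observed score of 2, and so on.

I need to create four new variables corresponding to score1, score2, score3 and score4. The only approach I can think of is to recode by hand, as shown below, but that is very slow and tedious: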

people$value1[person_score == 1] <- 10 
people$value1[person_score == 2] <- 14 

...and so on for score1

people$value2[person_score == 1] <- 6 
people$value2[person_score == 2] <- 9 

...and so on for score2

people$value3[person_score == 1] <- 11 
people$value3[person_score == 2] <- 13 

...and so on for score3

people$value4[person_score == 1] <- 43 
people$value4[person_score == 2] <- 44 

...and so on for score4

Answers

I would just use match to find the correct rows in the scores data.frame:

idx <- match(people$person_score , scores$observed) 

people_new <- cbind(people , scores[ idx , -1 ]) 

head(people_new) 
#   person_id person_score score1 score2 score3 score4
# 1         1            1     10      6     11     43
# 2         2            2     14      9     13     44
# 3         3            3     17     11     15     45
# 4         4            4     18     14     17     46
# 5         5            5     20     17     19     47
# 6         6            6     21     18     21     48
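
To see why this works, here is a minimal sketch of what match returns, using toy data that is not from the question (the names key, x and new are made up for illustration). match(x, table) gives, for each element of x, the position of its first occurrence in table, or NA when there is none. Indexing the key with those positions then pulls out the recoded values, even when the key is not sorted.

```r
# Hypothetical toy key, deliberately not in sorted order
key <- data.frame(observed = c(3, 1, 2),
                  new      = c("c", "a", "b"),
                  stringsAsFactors = FALSE)

# Values to recode; 4 has no entry in the key
x <- c(1, 2, 2, 3, 4)

# Position of each element of x within key$observed (NA if absent)
idx <- match(x, key$observed)
idx
# 2 3 3 1 NA

# Use the positions to look up the recoded values
recoded <- key$new[idx]
recoded
# "a" "b" "b" "c" NA
```

Scores that are missing from the key come back as NA rather than raising an error, so it may be worth checking anyNA(idx) before binding the columns.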
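This seems to work, except that I only get the first two new variables (score1 and score2). – windy

@windy You first need to "bind" all the individual scores into the "scores" data.frame (the one you use in "match"), e.g. 'scores <- cbind(observed, score1, score2, score3, score4)' –

Great, that works. Thanks a lot! – windy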

You can use the lookup function from the qdap package, as follows:

## person_id <- c(1:50) 
## person_score <- rep(1:10,5) 
## people <- data.frame(person_id, person_score) 
## 
## observed <- c(1,2,3,4,5,6,7,8,9,10) 
## score1 <- c(10,14,17,18,20,21,22,26,28,31) 
## score2 <- c(6,9,11,14,17,18,20,24,25,26) 
## score3 <- c(11,13,15,17,19,21,23,25,27,29) 
## score4 <- c(43,44,45,46,47,48,49,50,51,52) 
## scores <- data.frame(observed,score1,score2, score3, score4) 

library(qdap) 
people[, 3:6] <- lapply(scores[, -1], function(x) lookup(people$person_score, scores[, 1], x)) 

people 
##    person_id person_score score1 score2 score3 score4
## 1          1            1     10      6     11     43
## 2          2            2     14      9     13     44
## 3          3            3     17     11     15     45
## 4          4            4     18     14     17     46
## 5          5            5     20     17     19     47
## 6          6            6     21     18     21     48
## 7          7            7     22     20     23     49
## .
## .
## .
## 50        50           10     31     26     29     52

This is just a join of two data.frames, so you can use merge:

merge(people, scores, by.x = "person_score", by.y = "observed", all.x = TRUE) 
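
One thing to keep in mind with merge: by default it sorts the result on the join column and places that column first, so the rows no longer come back in person_id order. A minimal sketch of restoring the original order, reusing the data from the question (the name merged is an assumption made for this example):

```r
# Data from the question
person_id    <- 1:50
person_score <- rep(1:10, 5)
people <- data.frame(person_id, person_score)

observed <- 1:10
score1 <- c(10, 14, 17, 18, 20, 21, 22, 26, 28, 31)
score2 <- c(6, 9, 11, 14, 17, 18, 20, 24, 25, 26)
score3 <- c(11, 13, 15, 17, 19, 21, 23, 25, 27, 29)
score4 <- c(43, 44, 45, 46, 47, 48, 49, 50, 51, 52)
scores <- data.frame(observed, score1, score2, score3, score4)

# Left join; the result is sorted by person_score, not person_id
merged <- merge(people, scores,
                by.x = "person_score", by.y = "observed",
                all.x = TRUE)

# Put the rows back in person_id order and reset the row names
merged <- merged[order(merged$person_id), ]
rownames(merged) <- NULL

head(merged)
```

If the original row order matters and you would rather not reorder afterwards, the match approach keeps people untouched and only appends columns.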

Or with sqldf:

library(sqldf) 
sqldf(" 
    SELECT * 
    FROM  people 
    LEFT JOIN scores 
    ON  people.person_score = scores.observed 
")