2017-02-28

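I am trying to do something very simple in R, but I can't get it right: cleanly generating and replacing values based on a condition.

Let's take the "diamonds" dataset from ggplot2: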

glimpse(diamonds) 

$ carat <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.30, 0.23, 0.22, 0.31, 0.20, 0.32, 0.30, 0.30, 0.30, 0.30, 0.30, 0.23, 0.23, 0.31, 0.31, 0.23, ... 
$ cut  <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Very Good, Fair, Very Good, Good, Ideal, Premium, Ideal, Premium, Premium, Ideal, Good, Good, Ver... 
$ color <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I, E, H, J, J, G, I, J, D, F, F, F, E, E, D, F, E, H, D, I, I, J, D, D, H, F, H, H, E, H, F, G, ... 
$ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, SI1, SI2, SI2, I1, SI2, SI1, SI1, SI1, SI2, VS2, VS1, SI1, SI1, VVS2, VS1, VS2, VS2, VS1, VS1,... 
$ depth <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64.0, 62.8, 60.4, 62.2, 60.2, 60.9, 62.0, 63.4, 63.8, 62.7, 63.3, 63.8, 61.0, 59.4, 58.1, 60.4, ... 
$ table <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58, 54, 54, 56, 59, 56, 55, 57, 62, 62, 58, 57, 57, 61, 57, 57, 57, 59, 58, 58, 59, 59, 54, 59, ... 
$ price <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 342, 344, 345, 345, 348, 351, 351, 351, 351, 352, 353, 353, 353, 354, 355, 357, 357, 357, 402, 4... 
$ x  <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.25, 3.93, 3.88, 4.35, 3.79, 4.38, 4.31, 4.23, 4.23, 4.21, 4.26, 3.85, 3.94, 4.39, 4.44, 3.97, ... 
$ y  <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.28, 3.90, 3.84, 4.37, 3.75, 4.42, 4.34, 4.29, 4.26, 4.27, 4.30, 3.92, 3.96, 4.43, 4.47, 4.01, ... 
$ z  <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.73, 2.46, 2.33, 2.71, 2.27, 2.68, 2.68, 2.70, 2.71, 2.66, 2.71, 2.48, 2.41, 2.62, 2.59, 2.41, ... 

Say we want to compute a new price that includes a 10% discount for "Fair" diamonds. What I want to get in R would be, in Stata:

generate price_cut = . 
replace price_cut = price if cut != "Fair" 
replace price_cut = (0.90 * price) if cut =="Fair" 

But I can't manage to do it. I tried:

diamonds["price_cut"] <- 0
diamonds[diamonds$cut == "Ideal", "price_cut"] <- diamonds$price
Error in `[<-.data.frame`(`*tmp*`, diamonds$cut == "Ideal", "price_cut", :
  replacement has 53940 rows, data has 21551

I also tried:

diamonds["price_cut"] <- 0 
diamonds[diamonds$cut == "Ideal", "price_cut"] <- diamonds$price 
Error in `[<-.data.frame`(`*tmp*`, diamonds$cut == "Ideal", "price_cut", : 
    replacement has 53940 rows, data has 21551 
diamonds$price_cut[diamonds$cut !="Ideal"] <- diamonds$price * 0.9 
Warning message: 
In diamonds$price_cut[diamonds$cut != "Ideal"] <- diamonds$price : 
    number of items to replace is not a multiple of replacement length 

It sort of works on my toy example, but not on more complex datasets with missing values and so on.

What am I doing wrong?

+0

You are not subsetting the right-hand side: 'diamonds[diamonds$cut == "Ideal", "price_cut"] <- diamonds$price[diamonds$cut == "Ideal"]' – Cath

+0

'require(dplyr)'; 'diamonds %>% mutate(cut = as.character(cut), new_price = ifelse(cut == "Fair", price * 0.9, price))' – count

+1

FWIW, the Stata code can be cut from 3 lines to 1: 'generate price_cut = cond(cut == "Fair", 0.90 * price, price)' –

Answers

2

A direct translation of the Stata code would be:

diamonds$price_cut <- NA 
diamonds$price_cut[diamonds$cut != "Fair"] <- diamonds$price[diamonds$cut != "Fair"] 
diamonds$price_cut[diamonds$cut == "Fair"] <- (0.90 * diamonds$price[diamonds$cut == "Fair"]) 
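The key point in the translation above is that both sides of each assignment use the same row index, so their lengths always agree. A minimal sketch of that idea on a small made-up data frame follows. The `toy` data and the `is_fair` helper are illustrative assumptions, not part of the question. It also uses `%in%` rather than `==`, which is one possible way to keep a missing `cut` from putting an `NA` into the index.

```r
# Toy data standing in for diamonds (hypothetical values), with one missing cut
toy <- data.frame(
  cut   = c("Fair", "Ideal", NA, "Good", "Fair"),
  price = c(100, 200, 300, 400, 500),
  stringsAsFactors = FALSE
)

# %in% returns FALSE (not NA) for a missing cut, so the index never contains NA
is_fair <- toy$cut %in% "Fair"

toy$price_cut <- NA_real_
# Both sides use the same index, so replacement and target have the same length
toy$price_cut[!is_fair] <- toy$price[!is_fair]
toy$price_cut[is_fair]  <- 0.90 * toy$price[is_fair]

toy$price_cut
```

In this sketch a row with a missing `cut` is treated as not "Fair" and keeps its full price. Whether that is the right choice depends on the data.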

The same result can be had in one line using vectorized indexing, like:

diamonds$price_cut <- c(1, .9)[(diamonds$cut == "Fair") + 1] * diamonds$price 
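To see why the indexing one-liner works, it may help to break it into steps. The sketch below uses made-up vectors (`cuts`, `prices`, `idx` and `multiplier` are illustrative names, not from the original) and relies on a logical value being coerced to 0 or 1 when added to a number.

```r
# Hypothetical values, only for illustration
cuts   <- c("Fair", "Ideal", "Good", "Fair")
prices <- c(100, 200, 300, 400)

# A logical plus 1 becomes a position: FALSE -> 1, TRUE -> 2
idx <- (cuts == "Fair") + 1

# Position 1 picks the multiplier 1, position 2 picks the multiplier 0.9
multiplier <- c(1, 0.9)[idx]

price_cut <- multiplier * prices
price_cut
```

Note that if `cuts` contained a missing value, `idx` would contain `NA` at that position and so would `price_cut`.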

or, more commonly, with ifelse:

diamonds$price_cut <- ifelse(diamonds$cut == "Fair", 0.9 * diamonds$price, diamonds$price)
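Since the question mentions missing values, here is a small sketch of how `ifelse` behaves with them, again on made-up data (the `toy` data frame and the `price_cut_keep` column are assumptions for illustration). `ifelse` returns `NA` wherever its test is `NA`, so the treatment of a missing `cut` has to be chosen explicitly.

```r
# Toy data (hypothetical values) with a missing cut and a missing price
toy <- data.frame(
  cut   = c("Fair", "Ideal", NA, "Fair"),
  price = c(100, 200, 300, NA),
  stringsAsFactors = FALSE
)

# Test is NA for the third row, so price_cut is NA there
toy$price_cut <- ifelse(toy$cut == "Fair", 0.9 * toy$price, toy$price)

# Variant: %in% never returns NA, so a missing cut keeps the full price
toy$price_cut_keep <- ifelse(toy$cut %in% "Fair", 0.9 * toy$price, toy$price)

toy
```

In both variants a missing `price` stays missing, which is usually what you want.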

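Side note: a Stata one-liner in the same spirit, multiplying the price by 0.9 when the condition is true and by 1 otherwise:

generate price_cut = price * (1 - ((cut == "Fair") * 0.1))

and in R:

diamonds$price_cut <- diamonds$price * (1 - ((diamonds$cut == "Fair") * 0.1))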
+0

Thank you very much for your help! –