2017-10-13
0

How to separate entire rows into a new data frame using an if statement

I have a data frame df that looks like this:

Date   Company   MarketCap 
2000-01-31 Company one  1000 
2000-02-28 Company one  2000 
2000-03-31 Company one  3000 
2000-01-31 Company two  2500 
2000-02-28 Company two  3000 
2000-03-31 Company two  3500 
2000-01-31 Company three 1500 
2000-02-28 Company three 1800 
2000-03-31 Company three 1100 

I need an if statement that does the following:

If(df$MarketCap >= median(df$MarketCap){ 
    BigCap <- df[all the rows that have a market cap >= median(df$MarketCap) 
} 

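In words: for each row of df$MarketCap, I want to check whether the market cap is greater than or equal to the median of df$MarketCap. All rows whose market cap is greater than or equal to that median should form a new data frame, BigCap.

The new data frame BigCap should therefore look like this: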

BigCap

Date   Company   MarketCap 
2000-02-28 Company one  2000 
2000-03-31 Company one  3000 
2000-01-31 Company two  2500 
2000-02-28 Company two  3000 
2000-03-31 Company two  3500 

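I feel this should be easy to achieve with an if statement, but I have not had any success so far (nor from looking at similar questions on SO). I appreciate all the help I can get.

Note that my real df is much larger than the example provided here: I have 360 dates and more than 2000 companies.

Answers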

2

I like CPAK's answer, but if you need separate data.frames, this works:

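# Example data: three dates, three companies
df <- data.frame(date = rep(Sys.Date() - c(60, 30, 0), 3), comp = rep(1:3, each = 3),
                 cap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100))

# For each date, split the rows at that date's median and store both halves
# in variables named after the abbreviated month (e.g. bigCapOct).
# Note: two dates in the same month would overwrite each other's variables.
for (i in unique(as.character(df$date))) {
    med <- median(df$cap[df$date == i])
    assign(paste0("smallCap", format(as.Date(i), "%b")),
           df[df$date == i & df$cap < med, ])
    assign(paste0("bigCap", format(as.Date(i), "%b")),
           df[df$date == i & df$cap >= med, ])
}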
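
If variables created with assign() are not required, the same per-date split can be kept in named lists instead. The following is a minimal base R sketch, not the answer author's code: it assumes the column names from the question (Date, Company, MarketCap), and the names by_date, big_cap and small_cap are made up for illustration.

```r
# Sample data using the column names from the question.
df <- data.frame(
  Date = as.Date(rep(c("2000-01-31", "2000-02-28", "2000-03-31"), 3)),
  Company = rep(c("Company one", "Company two", "Company three"), each = 3),
  MarketCap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100)
)

# One data frame per date, keyed by the date written as text.
by_date <- split(df, format(df$Date, "%Y-%m-%d"))

# Within each date, keep the rows at or above (or below) that date's median.
big_cap <- lapply(by_date, function(d) d[d$MarketCap >= median(d$MarketCap), ])
small_cap <- lapply(by_date, function(d) d[d$MarketCap < median(d$MarketCap), ])

# Look up a single date by name.
big_cap[["2000-01-31"]]
```

Keying the list by the full date means two dates that fall in the same month cannot overwrite each other.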

Edit: in the comments, the OP asked for data frames for specific months.

For a particular month of a particular year, say October 2017:

# first calculate median 
med <- median(df$cap[format(df$date, "%Y-%m") == "2017-10"]) 
# subset df 
BigCapOct <- df[format(df$date, "%Y-%m") == "2017-10" & df$cap >= med, ] 

For October across all years:

med <- median(df$cap[format(df$date, "%m") == "10"]) 
BigCapOct <- df[format(df$date, "%m") == "10" & df$cap >= med, ] 
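
The month-specific subsetting above can be wrapped in a small function and applied to the question's sample data. This is only a sketch: big_cap_for_month is a hypothetical helper, and it assumes the Date column has class Date and that the columns are named as in the question. The point it illustrates is that the row index must be a logical vector with one entry per row of the data frame, while the median is computed on the month's values only.

```r
# Sample data using the column names from the question.
df <- data.frame(
  Date = as.Date(rep(c("2000-01-31", "2000-02-28", "2000-03-31"), 3)),
  Company = rep(c("Company one", "Company two", "Company three"), each = 3),
  MarketCap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100)
)

# Hypothetical helper: rows from one calendar month (given as "YYYY-MM")
# whose market cap is at or above that month's median.
big_cap_for_month <- function(data, year_month) {
  in_month <- format(data$Date, "%Y-%m") == year_month   # one TRUE/FALSE per row
  med <- median(data$MarketCap[in_month], na.rm = TRUE)  # median of that month only
  # which() drops rows where the comparison is NA (missing market cap).
  data[which(in_month & data$MarketCap >= med), ]
}

BigCapJan <- big_cap_for_month(df, "2000-01")
BigCapJan
```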
+0

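I will try your answer and see if I can get it to work. Note that I have now edited my question a bit so that it is easier to understand (and perhaps to solve).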

+0

Your edited question now only needs 'BigCap <- df[df$MarketCap >= median(df$MarketCap, na.rm = T), ]'
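
The one-liner in the comment above can be checked against the sample data from the question. No if statement is needed: if() expects a single TRUE or FALSE, whereas the comparison here yields one logical value per row, which can be used directly as a row index. A minimal sketch, with the sample data typed in by hand and Date kept as plain text:

```r
# Sample data from the question.
df <- data.frame(
  Date = rep(c("2000-01-31", "2000-02-28", "2000-03-31"), 3),
  Company = rep(c("Company one", "Company two", "Company three"), each = 3),
  MarketCap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100)
)

# Median over all rows, ignoring missing values.
med <- median(df$MarketCap, na.rm = TRUE)

# Keep every row whose market cap is at or above the overall median.
BigCap <- df[df$MarketCap >= med, ]
BigCap
```

On this sample the overall median is 2000, so BigCap holds the five rows listed as the expected result in the question.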

+0

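Thanks, that works :) I was sure I needed an if and/or for statement. What if, as in my original question, I only want to do this for certain dates? That is, I want to create BigCapJan for January, containing all market caps that are greater than or equal to the median market cap in January. Is there an easy way to build this into your solution? I have tried using 'BigCapJan <- df[(df$MarketCap[stri_detect_fixed(df$Date, "2000-01")] >= median(df$MarketCap[stri_detect_fixed(df$Date, "2000-01"))], na.rm = T), ]' but this does not seem to work.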

2

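I created SmallCap and LargeCap, each a list of data.frames, containing the observations with MarketCap < median(MarketCap) and MarketCap >= median(MarketCap) respectively. Each entry in a list is a separate date.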

library(dplyr) 
SmallCap <- df %>% 
      group_by(Date) %>% 
      filter(MarketCap < median(MarketCap)) %>% 
      split(.$Date) 

# $`1` 
# # A tibble: 1 x 3 
# # Groups: Date [1] 
     # Date  Company MarketCap 
     # <fctr>  <fctr>  <int> 
# 1 2000-01-31 Company_one  1000 

# $`2` 
# # A tibble: 1 x 3 
# # Groups: Date [1] 
     # Date  Company MarketCap 
     # <fctr>  <fctr>  <int> 
# 1 2000-02-28 Company_three  1800 

# $`3` 
# # A tibble: 1 x 3 
# # Groups: Date [1] 
     # Date  Company MarketCap 
     # <fctr>  <fctr>  <int> 
# 1 2000-03-31 Company_three  1100 

LargeCap <- df %>% 
      group_by(Date) %>% 
      filter(MarketCap >= median(MarketCap)) %>% 
      split(.$Date) 

# $`2000-01-31` 
# # A tibble: 2 x 3 
# # Groups: Date [1] 
     # Date  Company MarketCap 
     # <fctr>  <fctr>  <int> 
# 1 2000-01-31 Company_two  2500 
# 2 2000-01-31 Company_three  1500 

# $`2000-02-28` 
# # A tibble: 2 x 3 
# # Groups: Date [1] 
     # Date  Company MarketCap 
     # <fctr>  <fctr>  <int> 
# 1 2000-02-28 Company_one  2000 
# 2 2000-02-28 Company_two  3000 

# $`2000-03-31` 
# # A tibble: 2 x 3 
# # Groups: Date [1] 
     # Date  Company MarketCap 
     # <fctr>  <fctr>  <int> 
# 1 2000-03-31 Company_one  3000 
# 2 2000-03-31 Company_two  3500
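
Once the result is a list, a single date can be pulled out by name, or the pieces can be stacked back into one data frame. The sketch below is an illustration rather than part of the answer: it rebuilds LargeCap from hand-typed sample data with Date stored as text, adds an ungroup() step before splitting, and uses DateKey as a made-up name for the id column.

```r
library(dplyr)

# Sample data from the question, with Date kept as plain text.
df <- data.frame(
  Date = rep(c("2000-01-31", "2000-02-28", "2000-03-31"), 3),
  Company = rep(c("Company one", "Company two", "Company three"), each = 3),
  MarketCap = c(1000, 2000, 3000, 2500, 3000, 3500, 1500, 1800, 1100)
)

# Same pipeline as in the answer, with the grouping dropped before the split.
LargeCap <- df %>%
  group_by(Date) %>%
  filter(MarketCap >= median(MarketCap)) %>%
  ungroup() %>%
  split(.$Date)

# A single date's data frame, looked up by its name in the list.
LargeCap[["2000-02-28"]]

# All dates stacked into one data frame; DateKey records the list name.
all_large <- bind_rows(LargeCap, .id = "DateKey")
```

With 360 dates, indexing a list by name like this is usually easier to manage than hundreds of separately named data frames.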