2017-08-13 77 views
1

Reshaping a data frame so that repeated row values become column headers

I am trying to reshape the data frame below using tidyr:

data <- data.frame(class_name=c("date","date","educational","qualif",
                                "date","date","educational","qualif"),
                   text_val=c("2000","2003","ILLINOIS INSTITUTE OF TECHNOLOGY",
                              "Master of Science, Computer Science","1996","2000",
                              "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
                              "Bachelor of Science, Mechanical Engineering"))

I would like the data to look like the picture below:

1

Answers

3

Here is an idea using tidyverse. We basically group every 4 rows and spread. However, we first need to make the names in class_name unique within each group, i.e.

library(tidyverse) 

data %>% 
    group_by(grp = rep(seq(n()/4), each = 4)) %>% 
    mutate(class_name = make.unique(as.character(class_name))) %>% 
    spread(class_name, text_val) %>% 
    ungroup() %>% 
    select(educational, qualif, date, date.1) 

which gives,

# A tibble: 2 x 4
                          educational                                      qualif   date date.1
*                              <fctr>                                      <fctr> <fctr> <fctr>
1    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science   2000   2003
2 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering   1996   2000
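For readers on a newer tidyr (1.0.0 or later), where spread() is superseded, the same idea can be sketched with pivot_wider(). This is a variation on the answer above, not part of it. It assumes that every record spans exactly 4 consecutive rows, and it builds the data with stringsAsFactors = FALSE so that the as.character() conversion is not needed; the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  class_name = c("date", "date", "educational", "qualif",
                 "date", "date", "educational", "qualif"),
  text_val = c("2000", "2003", "ILLINOIS INSTITUTE OF TECHNOLOGY",
               "Master of Science, Computer Science", "1996", "2000",
               "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
               "Bachelor of Science, Mechanical Engineering"),
  stringsAsFactors = FALSE
)

wide <- data %>%
  # assumption: every block of 4 consecutive rows is one record
  mutate(grp = rep(seq_len(n() / 4), each = 4)) %>%
  group_by(grp) %>%
  # within a record, "date", "date" becomes "date", "date.1"
  mutate(class_name = make.unique(class_name)) %>%
  ungroup() %>%
  # one row per grp, one column per unique class_name
  pivot_wider(names_from = class_name, values_from = text_val) %>%
  select(educational, qualif, date, date.1)

print(wide)
```

Because grp is the only column left besides the names and values, pivot_wider() uses it as the row identifier, which plays the same role as the grouping in the spread() version.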

This is brilliant! Answer accepted. I am new to tidyverse, and it looks great. Thanks for suggesting it. – Vishnu

1

Another solution, using reshape (less elegant than Sotos's solution):

data <- data.frame(class_name=c("date","date","educational","qualif",
                                "date","date","educational","qualif"),
                   text_val=c("2000","2003","ILLINOIS INSTITUTE OF TECHNOLOGY",
                              "Master of Science, Computer Science","1996","2000",
                              "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
                              "Bachelor of Science, Mechanical Engineering"))
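nrec <- 4                                            # rows per record
data$id <- rep(seq_len(nrow(data)/nrec), each=nrec)  # record index: 1,1,1,1,2,2,2,2
data$time <- rep(seq_len(nrec), nrow(data)/nrec)     # position within the record: 1,2,3,4,1,2,3,4

# [,-1] drops class_name, which is no longer needed once the rows are spread by time
df <- reshape(data, v.names="text_val", idvar="id", direction="wide")[,-1]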
names(df) <- c("id","date1","date2","educational","qualif") 
df 

#   id date1 date2                         educational                                      qualif
# 1  1  2000  2003    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science
# 5  2  1996  2000 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering
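The column names in the reshape() answer are typed in by hand. As a hedged sketch of one way to avoid that, the labels can be derived from class_name itself with make.unique() and passed to reshape() as the time variable. This is a variation, not the answer's code; it assumes 4 rows per record, and the names wide and nrec are only illustrative.

```r
data <- data.frame(
  class_name = c("date", "date", "educational", "qualif",
                 "date", "date", "educational", "qualif"),
  text_val = c("2000", "2003", "ILLINOIS INSTITUTE OF TECHNOLOGY",
               "Master of Science, Computer Science", "1996", "2000",
               "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
               "Bachelor of Science, Mechanical Engineering"),
  stringsAsFactors = FALSE
)

nrec <- 4  # assumption: rows per record
data$id <- rep(seq_len(nrow(data) / nrec), each = nrec)

# unique label per position within a record: date, date.1, educational, qualif
data$time <- ave(data$class_name, data$id, FUN = make.unique)

# keep only the columns reshape() needs, so nothing has to be dropped afterwards
wide <- reshape(data[, c("id", "time", "text_val")],
                v.names = "text_val", timevar = "time",
                idvar = "id", direction = "wide")

# reshape() names the new columns "text_val.<label>"; strip the prefix
names(wide) <- sub("^text_val\\.", "", names(wide))
print(wide)
```

With this approach the second date column is called date.1 rather than date2, matching the naming in the tidyverse answer.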

Note that the base R `reshape` function works fine with your code, so you don't need to load any library. – lmo


@lmo Right! Thanks! –


@MarcoSandri: Thanks for sharing the answer. – Vishnu

0

For the sake of completeness, here is also a solution using dcast() from the data.table package:

library(data.table) 
setDT(data)[, rn := .I + 3L][ 
    , dcast(.SD , rn %/% 4L ~ class_name, toString, value.var = "text_val")] 
   rn       date                         educational                                      qualif
1:  1 2000, 2003    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science
2:  2 1996, 2000 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering

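Note that toString() is used as the aggregation function, so that the duplicate dates are concatenated into one column. This is because the two date columns in the OP's expected output share the same name, which may indicate that the expected output is for display only and that no further processing of the date values is required.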
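If the two dates do need further processing, a possible variation (not part of this answer) is to give each value its own column label first, so that dcast() needs no aggregation function at all. The sketch below assumes 4 rows per record; the helper columns rec and col and the name wide are hypothetical names chosen for illustration.

```r
library(data.table)

# what the aggregation above does to a pair of dates
toString(c("2000", "2003"))  # "2000, 2003"

DT <- data.table(
  class_name = c("date", "date", "educational", "qualif",
                 "date", "date", "educational", "qualif"),
  text_val = c("2000", "2003", "ILLINOIS INSTITUTE OF TECHNOLOGY",
               "Master of Science, Computer Science", "1996", "2000",
               "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
               "Bachelor of Science, Mechanical Engineering")
)

# record index: rows 1-4 -> 1, rows 5-8 -> 2 (assumes 4 rows per record)
DT[, rec := (.I - 1L) %/% 4L + 1L]

# unique column label within each record: date, date.1, educational, qualif
DT[, col := make.unique(class_name), by = rec]

# every (rec, col) pair is now unique, so no aggregation function is needed
wide <- dcast(DT, rec ~ col, value.var = "text_val")
print(wide)
```

The trade-off is that the second date column gets a different name (date.1), whereas the toString() version keeps a single date column that matches the OP's picture more closely.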


If the column order matters and rn is not required, the output can be prettified to better match the OP's expected result:

lvl <- c("educational", "qualif", "date") 
setDT(data)[, rn := .I + 3L][, class_name := factor(class_name, levels = lvl)][ 
    , dcast(.SD , rn %/% 4L ~ class_name, toString, value.var = "text_val")][, rn := NULL][] 
                           educational                                      qualif       date
1:    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science 2000, 2003
2: MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering 1996, 2000

Thanks for posting. – Vishnu