2017-10-04 91 views

How to remove duplicate values within a single column of a multi-column data frame in R

Sample data:

  sessionid    qf  Office 
       12    3  LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3 
       12    4  DEL2,DEL1,LON1,DEL1 
       13    5  MAn1,LON1,DEL1,LON1 

Here, for each row, I want to remove the duplicate values in the "Office" column.

Expected output:

  sessionid    qf  Office 
       12    3  LON1,LON2,SEA2,SEA3 
       12    4  DEL2,DEL1,LON1 
       13    5  MAn1,LON1,DEL1 

Answers


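We can use tidyverse. Split "Office" by the delimiter and expand it to "long" format, then keep the distinct rows, group by "sessionid" and "qf", and paste the contents of "Office" back together.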

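library(tidyverse)
separate_rows(df1, Office) %>%               # one row per office code
     distinct() %>%                          # drop repeated (sessionid, qf, Office) rows
    group_by(sessionid, qf) %>%
    summarise(Office = toString(Office))     # re-join with ", "; use paste(Office, collapse = ",") for no space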
# A tibble: 3 x 3 
# Groups: sessionid [?] 
# sessionid qf     Office 
#  <int> <int>     <chr> 
#1  12  3 LON1, LON2, SEA2, SEA3 
#2  12  4  DEL2, DEL1, LON1 
#3  13  5  MAn1, LON1, DEL1 
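
For anyone who wants to run this end to end, here is a minimal sketch under a few assumptions: the data frame `df1` is rebuilt by hand from the sample in the question, `Office` is a character column, and the values are re-joined with a bare comma so the result matches the expected output exactly. The `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of the sample data shown in the question.
df1 <- data.frame(
  sessionid = c(12L, 12L, 13L),
  qf        = c(3L, 4L, 5L),
  Office    = c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
                "DEL2,DEL1,LON1,DEL1",
                "MAn1,LON1,DEL1,LON1"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  separate_rows(Office, sep = ",") %>%   # one row per office code
  distinct() %>%                         # keep the first occurrence of each code per row
  group_by(sessionid, qf) %>%
  summarise(Office = paste(Office, collapse = ","), .groups = "drop")

print(result)
```

Note that grouping by `sessionid` and `qf` would merge two input rows that happen to share both values, so this sketch relies on that pair being unique per row, as it is in the sample.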

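Here is a base R way to do it that should give what you expect: first split Office by comma, remove the duplicate values, then paste the pieces back together.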

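# as.character() guards against Office having been read in as a factor
df$Office <- sapply(lapply(strsplit(as.character(df$Office), ","), 
          function(x) { 
          unique(x)                  # drop repeats within one row
          }), 
        function(x) { 
         paste(x, collapse = ",")    # glue the remaining values back together
        }, 
        simplify = TRUE) 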

The same thing written with the %>% pipe:

library(magrittr)  # provides %>%
df$Office <- df$Office %>% 
    strsplit(",") %>% 
    lapply(unique) %>%                 # drop repeats within each row
    sapply(paste, collapse = ",")      # re-join the remaining values
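
The split / unique / paste steps can also be wrapped in a small function. The following is only a sketch: `dedupe_csv` is a hypothetical helper name, and the data frame is reconstructed by hand from the sample in the question. It uses `vapply` instead of `sapply` so the result is always a character vector.

```r
# Assumed reconstruction of the sample data shown in the question.
df <- data.frame(
  sessionid = c(12L, 12L, 13L),
  qf        = c(3L, 4L, 5L),
  Office    = c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
                "DEL2,DEL1,LON1,DEL1",
                "MAn1,LON1,DEL1,LON1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove repeated values inside each delimited string,
# keeping the first occurrence of every value in its original position.
dedupe_csv <- function(x, sep = ",") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(parts,
         function(p) paste(unique(p), collapse = sep),
         character(1))
}

df$Office <- dedupe_csv(df$Office)
print(df)
```

Because `unique()` compares strings exactly, values that differ only in case or surrounding spaces (for example "MAn1" and "MAN1") are treated as different.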