2017-10-05 90 views
0

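Comparing data in R

I'm still very new to R and feel there must be a better way to do what I'm doing. I'm trying to compare a process against a reference and determine whether it follows a particular sequence. Later, I also plan to extend this to say: if sequence A, then "cool"; else if sequence B, then "kind of cool"; else "not cool at all".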

For sample data, let's determine whether bakers followed the proper steps of a baking recipe.

# assuming a base R merge() of the two tables on the shared step-description column
merged_data <- merge(sampledata, proper_sequence, by = "sequence description")

Baker  Actual_Sequence_#  Sequence            proper sequence
John   1                  Bought ingredients  1
John   2                  Read recipe         1
Jack   1                  Read recipe         1
Jack   2                  Bought ingredients  1
Jack   3                  Mixed ingredients   3
Jack   4                  Preheated oven      2
Jane   1                  Preheated oven      2
Jane   2                  Bought ingredients  1
Jill   1                  Mixed ingredients   2



#spread the data by actual sequence and fill with proper sequence; I feel this step could be cut out, but not sure how. 

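# spread() comes from tidyr; keep only the id, key and value columns so each baker ends up on one row
spread_data <- spread(merged_data[, c("Baker", "Actual_Sequence_#", "proper sequence")],
                      key = "Actual_Sequence_#", value = "proper sequence")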

Baker  1  2  3  4
John   1  1
Jack   1  1  3  2
Jane   2  1
Jill   2

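Concatenate and remove duplicates

This is the part of the code I actually need help with. The desired result is a two-column data frame.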

condensed_data<- spread_data(group_by(Baker),????) 

Baker  Sequence_concatenated
John   1
Jack   1,3,2
Jane   2,1
Jill   2
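
One possible way to get this two-column result without the spread step is to group by baker and paste the unique values together. This is only a sketch: the column names `Actual_Sequence`, `proper_sequence` and `concatenated` are simplified stand-ins for the ones in the question, and the data is retyped from the merged table above.

```r
library(dplyr)

# Sample data retyped from the question's merged table (column names simplified)
merged_data <- data.frame(
  Baker = c("John", "John", "Jack", "Jack", "Jack", "Jack", "Jane", "Jane", "Jill"),
  Actual_Sequence = c(1, 2, 1, 2, 3, 4, 1, 2, 1),
  proper_sequence = c(1, 1, 1, 1, 3, 2, 2, 1, 2)
)

condensed_data <- merged_data %>%
  arrange(Baker, Actual_Sequence) %>%   # keep each baker's steps in the order performed
  group_by(Baker) %>%
  # unique() drops repeated step numbers while keeping first-seen order
  summarise(concatenated = paste(unique(proper_sequence), collapse = ","))

print(condensed_data)
```

Because `unique()` keeps the first occurrence of each value, John's `1, 1` collapses to `1` while Jack's out-of-order `1, 1, 3, 2` becomes `1,3,2`.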

Add a new column that evaluates the concatenated actual sequence against the proper order

# note: grepl() matches the pattern anywhere in the string, so "1" also matches "1,3,2" and "2,1"
evaluation <- mutate(condensed_data, eval_of_sequence =
  ifelse(grepl("1,2,3,4", concatenated), "following proper sequence",
  ifelse(grepl("1,2,3", concatenated), "following proper sequence",
  ifelse(grepl("1,2", concatenated), "following proper sequence",
  ifelse(grepl("1", concatenated), "following proper sequence",
         "breaking proper sequence")))))

Baker  Sequence_concatenated  evaluation
John   1                      following proper sequence
Jack   1,3,2                  breaking proper sequence
Jane   2,1                    breaking proper sequence
Jill   2                      following proper sequence
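
The expected output above looks consistent with one simple rule: a baker is "following proper sequence" when the proper step numbers never decrease in the order the steps were performed. That rule is an assumption inferred from the sample output, not something stated in the question. Under that assumption, a minimal sketch using base R's `is.unsorted()` (with simplified column names) might look like this:

```r
library(dplyr)

# Sample data retyped from the question's merged table (column names simplified)
merged_data <- data.frame(
  Baker = c("John", "John", "Jack", "Jack", "Jack", "Jack", "Jane", "Jane", "Jill"),
  Actual_Sequence = c(1, 2, 1, 2, 3, 4, 1, 2, 1),
  proper_sequence = c(1, 1, 1, 1, 3, 2, 2, 1, 2)
)

evaluation <- merged_data %>%
  arrange(Baker, Actual_Sequence) %>%   # steps in the order each baker performed them
  group_by(Baker) %>%
  summarise(
    concatenated = paste(unique(proper_sequence), collapse = ","),
    # assumed rule: TRUE when the proper step numbers never go down
    in_order = !is.unsorted(proper_sequence)
  ) %>%
  mutate(evaluation = ifelse(in_order,
                             "following proper sequence",
                             "breaking proper sequence"))

print(evaluation)
```

This avoids pattern matching on the concatenated string altogether, so there is no need to list every acceptable prefix such as `1,2` or `1,2,3`.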
+0

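I don't understand what these slashes are. See [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for the right way to include sample data and desired output. – MrFlick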

+2

It would be better if you gave R code that creates the initial data and asked only one question. –

+0

Sorry, I typed it out as it was; it wasn't delimited text, so I used slashes... I have rewritten it. –

Answer

0
library(dplyr) 

#Create data.frame that looks roughly like yours 
merged_data <- data.frame(Baker = c("John", "John", "Jack", "Jack", "Jack", "Jack", "Jane", "Jane", "Jill"), 
       Actual_Sequence = c(1,2,1,2,3,4,1,2,1), 
       proper_sequence = c(1,1,1,1,3,2,2,1,2)) 

#Use dplyr to group by baker, concatenate their process, then evaluate 
#by comparing to the proper sequence field. If equal assume correct. 
merged_data %>% 
    group_by(Baker) %>% 
    summarise(Actual_Sequence = paste(Actual_Sequence, collapse = ","), 
      proper_sequence = paste(proper_sequence, collapse = ",")) %>% 
    mutate(evaluation = ifelse(Actual_Sequence == proper_sequence, "following proper sequence", "breaking proper sequence")) 

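If I understood your post correctly, and I'm not sure I did, this will give you the result you want. You can tinker with the dplyr statement to figure out how it works.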
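
For the tiered labels the question plans to add later ("cool", "kind of cool", "not cool at all"), `dplyr::case_when()` may read more easily than nested `ifelse()` calls. The sketch below is illustrative only: `sequence_a` and `sequence_b` are hypothetical placeholders, since the question never says what sequences A and B are, and the input is the condensed table from the question retyped by hand.

```r
library(dplyr)

# Condensed table retyped from the question's desired output
condensed_data <- data.frame(
  Baker = c("John", "Jack", "Jane", "Jill"),
  concatenated = c("1", "1,3,2", "2,1", "2")
)

# Hypothetical target sequences; replace with the real definitions of A and B
sequence_a <- "1"
sequence_b <- "2"

rated <- condensed_data %>%
  mutate(rating = case_when(
    concatenated == sequence_a ~ "cool",          # exact match on sequence A
    concatenated == sequence_b ~ "kind of cool",  # exact match on sequence B
    TRUE ~ "not cool at all"                      # everything else
  ))

print(rated)
```

Conditions are checked top to bottom and the first match wins, so the most specific sequences should be listed first.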

+0

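Thank you! I had to make some adjustments because I had forgotten about sequence number 99, which is used when someone decides to quit. But the example you gave was enough for me to generalize the process. Thanks again. –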