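Comparing data in R

I'm still fairly new to R and feel there must be a better way to do what I'm doing. I'm trying to compare a process against a reference and determine whether it follows a specific sequence. Later I intend to extend this to say: if sequence A, then "cool"; else if sequence B, then "kind of cool"; otherwise "not cool at all".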

For sample data, let's determine whether a baker follows the correct steps of a baking recipe.

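# Join the sample data to the proper sequence lookup (assumes both share a "sequence description" column)
merged_data <- merge(sampledata, proper_sequence, by = "sequence description")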

Baker  Actual_Sequence_#  Sequence            proper sequence
John   1                  Bought ingredients  1
John   2                  Read recipe         1
Jack   1                  Read recipe         1
Jack   2                  Bought ingredients  1
Jack   3                  Mixed ingredients   3
Jack   4                  Preheated oven      2
Jane   1                  Preheated oven      2
Jane   2                  Bought ingredients  1
Jill   1                  Mixed ingredients   2



# Spread the data by actual sequence and fill with proper sequence; I feel this step could be cut out, but I'm not sure how.

library(tidyr)
spread_data <- spread(merged_data[, c("Baker", "Actual_Sequence_#", "proper sequence")], key = "Actual_Sequence_#", value = "proper sequence")

Baker  1  2  3  4
John   1  1
Jack   1  1  3  2
Jane   2  1
Jill   2

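Concatenate and remove duplicates

This is the part of the code I actually need help with. The desired result is a two-column data frame: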

condensed_data<- spread_data(group_by(Baker),????) 

Baker  Sequence_concatenated
John   1
Jack   1,3,2
Jane   2,1
Jill   2

Add a new column that evaluates the concatenated actual sequence against the proper order

# Note: grepl() matches substrings, so the pattern "1" also matches "1,3,2"
evaluation <- mutate(condensed_data, eval_of_sequence = 
ifelse(grepl("1,2,3,4", concatenated), "following proper sequence", 
ifelse(grepl("1,2,3", concatenated), "following proper sequence", 
ifelse(grepl("1,2", concatenated), "following proper sequence", 
ifelse(grepl("1", concatenated), "following proper sequence", 
"breaking proper sequence")))))

Baker  Sequence_concatenated  evaluation
John   1                      following proper sequence
Jack   1,3,2                  breaking proper sequence
Jane   2,1                    breaking proper sequence
Jill   2                      following proper sequence

Source

2017-10-05 Lyndon L.

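I don't understand what these slashes are. See [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for the proper way to include sample data and desired output. – MrFlick

It would be better if you provided R code that populates the initial data and asked only one question. –

Sorry, the way I had typed it out wasn't delimited text, so I used slashes... I've rewritten it. –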

library(dplyr) 

#Create data.frame that looks roughly like yours 
merged_data <- data.frame(Baker = c("John", "John", "Jack", "Jack", "Jack", "Jack", "Jane", "Jane", "Jill"), 
       Actual_Sequence = c(1,2,1,2,3,4,1,2,1), 
       proper_sequence = c(1,1,1,1,3,2,2,1,2)) 

#Use dplyr to group by baker, concatenate their process, then evaluate 
#by comparing to the proper sequence field. If equal assume correct. 
merged_data %>% 
    group_by(Baker) %>% 
    summarise(Actual_Sequence = paste(Actual_Sequence, collapse = ","), 
      proper_sequence = paste(proper_sequence, collapse = ",")) %>% 
    mutate(evaluation = ifelse(Actual_Sequence == proper_sequence, "following proper sequence", "breaking proper sequence"))

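If I've understood your post correctly, and I'm not sure that I have, this should give you the result you want. You can tinker with the dplyr statement to figure out how it works.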
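The question's desired two-column table drops repeated steps before concatenating (John's 1, 1 becomes "1"), which the code above does not do. A minimal sketch of that variant, assuming the same sample data and simply wrapping the column in `unique()` inside `summarise()`. The output column name `Sequence_concatenated` is borrowed from the question's expected output; this is one possible reading of the request, not necessarily what the asker ended up using.

```r
library(dplyr)

# Same sample data as in the answer's code
merged_data <- data.frame(
  Baker = c("John", "John", "Jack", "Jack", "Jack", "Jack", "Jane", "Jane", "Jill"),
  Actual_Sequence = c(1, 2, 1, 2, 3, 4, 1, 2, 1),
  proper_sequence = c(1, 1, 1, 1, 3, 2, 2, 1, 2)
)

condensed_data <- merged_data %>%
  # Make sure steps are in the order each baker actually performed them
  arrange(Baker, Actual_Sequence) %>%
  group_by(Baker) %>%
  # unique() keeps the first occurrence of each value, preserving order
  summarise(Sequence_concatenated = paste(unique(proper_sequence), collapse = ","))

print(condensed_data)
```

Because the grouping is done on the long data, the intermediate `spread()` step from the question is not needed here.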

Source

2017-10-05 16:26:31 Eumenedies

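Thank you! I had to make an adjustment because I had forgotten about sequence number 99, which is used when someone decides to quit. But your example was enough for me to generalize the process. Thanks again. –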
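The question also mentions extending the check to several tiers ("cool", "kind of cool", "not cool at all"). A sketch of how that could look with `dplyr::case_when()`. The reference sequences `sequence_a` and `sequence_b` and the sample rows are hypothetical placeholders for illustration, not values from the thread.

```r
library(dplyr)

# Hypothetical reference sequences; replace with the real ones
sequence_a <- "1,2,3,4"
sequence_b <- "1,2"

# Illustrative concatenated sequences per baker (not the thread's data)
condensed_data <- data.frame(
  Baker = c("John", "Jack", "Jane"),
  concatenated = c("1,2", "1,2,3,4", "2,1"),
  stringsAsFactors = FALSE
)

evaluation <- condensed_data %>%
  mutate(eval_of_sequence = case_when(
    concatenated == sequence_a ~ "cool",          # exact match with sequence A
    concatenated == sequence_b ~ "kind of cool",  # exact match with sequence B
    TRUE ~ "not cool at all"                      # everything else
  ))

print(evaluation)
```

Comparing with `==` rather than `grepl()` avoids accidental substring matches, such as the pattern "1" matching "1,3,2".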
