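Since you shared only one row of data, I can't be sure what pattern the email subject line from_Subject follows. If the emails come from an automated system, the subject line from_Subject should have a fixed pattern. Here are three ways to extract Patient_ID from from_Subject.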
library(dplyr)
library(stringr)  # provides word() and str_extract()

# Sample data modeled on the single row shared in the question
df1 <- tibble(from_Email = "[email protected]",
              Time_IN = "1/11/2000 12:00:00",
              from_Subject = "Patient H2445JFLD presented into ER with .... symptoms")
df2 <- tibble(Hospital_Name = "Hospital ABC",
              Patient_ID = "H2445JFLD")

# Each method below overwrites Patient_ID; keep only the one that fits your subject lines.
# Method 1: extract the 2nd word from the subject line
df1 <- df1 %>% mutate(Patient_ID = word(from_Subject, 2))
# Method 2: extract the word after "Patient" from the subject line
df1 <- df1 %>% mutate(Patient_ID = str_extract(from_Subject, '(?<=Patient\\s)\\w+'))
# Method 3: extract a 9-character word made up of A-Z and 0-9 from the subject line
df1 <- df1 %>% mutate(Patient_ID = str_extract(from_Subject, '\\b[A-Z0-9]{9}\\b'))
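Since the real subject-line format is unknown, it may help to see how the three patterns differ once the wording varies. The sketch below is only an illustration: the subject lines and the column names second_word, after_patient and nine_chars are hypothetical, made up for this comparison and not taken from your data.

```r
library(dplyr)
library(stringr)

# Hypothetical subject lines (assumed for illustration only)
subjects <- c(
  "Patient H2445JFLD presented into ER with .... symptoms",
  "URGENT: Patient K9931QWER admitted",
  "Follow-up for patient visit"
)

checks <- tibble(from_Subject = subjects) %>%
  mutate(
    # Method 1: position-based, breaks if anything precedes "Patient"
    second_word   = word(from_Subject, 2),
    # Method 2: keyword-based, case-sensitive on "Patient"
    after_patient = str_extract(from_Subject, "(?<=Patient\\s)\\w+"),
    # Method 3: shape-based, assumes IDs are exactly 9 chars of A-Z / 0-9
    nine_chars    = str_extract(from_Subject, "\\b[A-Z0-9]{9}\\b")
  )

print(checks)
```

On the second line the position-based method returns "Patient" rather than an ID, while the other two still find K9931QWER; on the third line, which contains no ID, methods 2 and 3 return NA but method 1 returns an ordinary word. If your IDs really do have a fixed shape, the shape-based pattern is probably the safest of the three.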
Once you have extracted Patient_ID, all you need is a simple left join.
left_join(df1, df2, by = "Patient_ID")
# A tibble: 1 × 5
# from_Email Time_IN from_Subject Patient_ID Hospital_Name
# <chr> <chr> <chr> <chr> <chr>
#1 [email protected] 1/11/2000 12:00:00 Patient H2445JFLD presented into ER with .... symptoms H2445JFLD Hospital ABC
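A left join keeps every email row even when no hospital matches, leaving Hospital_Name as NA. If you want to review those rows, one possible approach is anti_join. This is a minimal sketch under assumptions: the tables emails and hospitals and the second patient ID are hypothetical and exist only to demonstrate an unmatched row.

```r
library(dplyr)

# Hypothetical data: the second ID is invented and has no hospital record
emails <- tibble(
  from_Subject = c("Patient H2445JFLD presented into ER",
                   "Patient ZZZZ99999 discharged"),
  Patient_ID   = c("H2445JFLD", "ZZZZ99999")
)
hospitals <- tibble(Hospital_Name = "Hospital ABC",
                    Patient_ID    = "H2445JFLD")

# Left join: all email rows are kept, unmatched ones get NA for Hospital_Name
joined <- left_join(emails, hospitals, by = "Patient_ID")

# Anti join: only the email rows whose Patient_ID has no match in hospitals
unmatched <- anti_join(emails, hospitals, by = "Patient_ID")

print(joined)
print(unmatched)
```

Rows that show up in unmatched usually point to either a failed extraction or an ID that is missing from the lookup table.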
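"Unfortunately, I can't provide access to the data." No, but you can provide a few rows of **sample** data that realistically reflect the kind of data you will receive without actually being part of the dataset. For example, if the data tracked college students' grades (which are also legally protected), you could provide records describing the academic histories of John Q. Taxpayer and Jane Doe. You could also provide a [mcve] showing what you have tried and why it doesn't work.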