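Extracting data from multiple columns of an R data frame, then searching another
I have a central data frame of information (df3) that I am trying to subset and add columns to, based on data extracted from the columns of another data frame (df2), which is itself derived from a subset of a third (df1). I have got this far by searching the help pages and trying various functions, but I have reached an impasse. I hope you can help.
First, the three data frames are built as follows: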
#df1 - my initial search database
id <- c("id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8")
yesno <- c("Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No")
city <- c("London", "London", "Paris", "London", "Paris", "New York", "London", "London")
df1 <- cbind(id, yesno, city)
df1 <- as.data.frame(df1)
df1
#df2 - containing the data needed to search df3, but situated across columns
id <- c("id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8")
twitter <- c("@one","", "@three", "@four", "", "", "@seven", "")
email <- c("", "", "", "add4", "add5","", "add7", "")
mail <- c("", "postcode2", "", "","","","","postcode8")
df2 <- cbind(id, twitter, email, mail)
df2 <- as.data.frame(df2)
df2
#df3 - the central df containing the data I wish to extract
comms <- c("@one", "postcode2", "@three", "@four", "add4", "add5", "six", "@seven", "add7", "postcode2")
target <- c("text1", "text2", "text3", "text4.1", "text4.2", "text5", "text6", "text7.1","text7.2", "text8")
df3 <- cbind(comms,target)
df3 <- as.data.frame(df3)
df3
The link between df1 and df2 is the id column. So far I have been able to filter df1 and extract the ids, which I then use to subset df2.
library(dplyr)  # provides %>% and filter()
df_search <- df1 %>%
filter(yesno == "Yes", city == "London")
df_search_ids <- df_search$id
df2_search <- df2 %>%
filter(id %in% df_search_ids)
df2_search
   id twitter email      mail
1 id1    @one
2 id2               postcode2
3 id4   @four  add4
4 id7  @seven  add7
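My problem is this: the data shared between df2 and df3 is spread across three different columns of df2 (twitter, email and mail); those columns contain blank cells and other irrelevant information (e.g. 'I'm not on Twitter'); and finally, some entries in df2 (such as id4 and id7 above) have multiple entries in df3.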
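One possible way to handle values spread over several columns is to reshape df2 into a long format, with one row per id and contact value, and drop the blanks. This is only a minimal sketch, assuming the tidyr package is available; the names `df2_long` and `channel` are made up here for illustration and are not part of the original code.

```r
library(dplyr)
library(tidyr)

# Rebuild df2 so the block is self-contained (same values as in the question)
df2 <- data.frame(
  id      = paste0("id", 1:8),
  twitter = c("@one", "", "@three", "@four", "", "", "@seven", ""),
  email   = c("", "", "", "add4", "add5", "", "add7", ""),
  mail    = c("", "postcode2", "", "", "", "", "", "postcode8"),
  stringsAsFactors = FALSE
)

# One row per (id, contact value); `channel` (illustrative name) records
# which of the three columns the value came from
df2_long <- df2 %>%
  pivot_longer(cols = c(twitter, email, mail),
               names_to = "channel", values_to = "comms") %>%
  filter(comms != "")  # drop the blank cells

df2_long
```

Irrelevant entries such as 'I'm not on Twitter' would still be present in the long table, but they should drop out of any later exact-match join against df3$comms, since nothing in df3 would equal them.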
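The solution I am trying to reach is this: I would like to pull every value from the twitter, email and mail columns of df2 for the ids matched from df1, so that the extracted values can be used to subset df3, ultimately producing a new data frame (result_df) that looks like this: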
id_res <- c("id1", "id2", "id4", "id4", "id7", "id7")
comms_res <- c("@one", "postcode2", "@four", "add4", "@seven", "add7")
target_res <- c("text1", "text2", "text4.1", "text4.2", "text7.1", "text7.2")
result_df <- cbind(id_res, comms_res, target_res)
result_df <- as.data.frame(result_df)
result_df
  id_res comms_res target_res
1    id1      @one      text1
2    id2 postcode2      text2
3    id4     @four    text4.1
4    id4      add4    text4.2
5    id7    @seven    text7.1
6    id7      add7    text7.2
This is an operation I will be performing many times (based on different explorations of df1), so ideally it should be easy to repeat.
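Since the operation needs to be repeated with different df1 filters, one option is to wrap the whole pipeline in a function. The sketch below is an assumption about how that could look, not a definitive implementation: `extract_targets` and `channel` are hypothetical names, dplyr and tidyr are assumed to be installed, and the data frames are rebuilt with `data.frame()` so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample data, same values as in the question
df1 <- data.frame(
  id    = paste0("id", 1:8),
  yesno = c("Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No"),
  city  = c("London", "London", "Paris", "London", "Paris",
            "New York", "London", "London"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id      = paste0("id", 1:8),
  twitter = c("@one", "", "@three", "@four", "", "", "@seven", ""),
  email   = c("", "", "", "add4", "add5", "", "add7", ""),
  mail    = c("", "postcode2", "", "", "", "", "", "postcode8"),
  stringsAsFactors = FALSE
)
df3 <- data.frame(
  comms  = c("@one", "postcode2", "@three", "@four", "add4", "add5",
             "six", "@seven", "add7", "postcode2"),
  target = c("text1", "text2", "text3", "text4.1", "text4.2", "text5",
             "text6", "text7.1", "text7.2", "text8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `...` takes any filter conditions on df1
extract_targets <- function(df1, df2, df3, ...) {
  # ids of the df1 rows that satisfy the conditions
  ids <- df1 %>% filter(...) %>% pull(id)

  df2 %>%
    filter(id %in% ids) %>%
    # stack twitter/email/mail into a single `comms` column
    pivot_longer(cols = c(twitter, email, mail),
                 names_to = "channel", values_to = "comms") %>%
    filter(comms != "") %>%
    # keep every df3 row whose comms value matches exactly
    inner_join(df3, by = "comms") %>%
    select(id, comms, target)
}

res <- extract_targets(df1, df2, df3, yesno == "Yes", city == "London")
res
```

With the sample data exactly as posted, this returns seven rows rather than six: 'postcode2' appears twice in df3, so id2 is matched to both text2 and text8. Whether that extra row is wanted depends on how duplicates in comms should be treated.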
I hope this is a clear explanation of the problem.
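What about duplicates in df3? Your df3 has two rows with 'postcode2'. Do you want both, or just the first? – aichao
Thanks for the reply. I would like every instance from df3 where the comms column matches something in the twitter, email or mail columns of df2. There are a lot of duplicates in the comms column, but the entries in target are unique, so I would want all of those duplicates. –
I have been playing with str_match, but can't seem to get it to work. –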
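For what it's worth, `str_match` from stringr is aimed at pulling regular-expression capture groups out of strings. If the values in comms are meant to equal the df2 values exactly, a plain `%in%` test may be enough, and it keeps every matching row, duplicates included. A small base R sketch under that assumption (the vector name `wanted` and the result name `hits` are illustrative):

```r
# df3 as in the question
df3 <- data.frame(
  comms  = c("@one", "postcode2", "@three", "@four", "add4", "add5",
             "six", "@seven", "add7", "postcode2"),
  target = c("text1", "text2", "text3", "text4.1", "text4.2", "text5",
             "text6", "text7.1", "text7.2", "text8"),
  stringsAsFactors = FALSE
)

# Non-blank twitter/email/mail values for the selected ids (typed by hand here)
wanted <- c("@one", "postcode2", "@four", "add4", "@seven", "add7")

# Exact matching: every df3 row whose comms value is in `wanted`
hits <- df3[df3$comms %in% wanted, ]
hits
```

Both 'postcode2' rows are returned here, which matches the "all instances" requirement, but the id is lost; a join on the comms column would be needed to carry it along.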