2017-10-10

Using grep to label text and paste in R

I have two data frames. The first:

keyword <- c("apple","peach","grape","berry","kiwi fruit") 
keyword <- data.frame(keyword) 


The second:

sentence <- c("I like apple","I hate apple","grape is good") 
url <- c("url1","url2","url3") 
sentence <- data.frame(sentence,url) 


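What I need: if a keyword is contained in a sentence, paste that sentence's URL next to the keyword. If several sentences contain the keyword, paste all of their URLs. The final result should look like this:

[Image: expected result table]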

I tried the code below, but it did not work as expected.

keyword$Label <- character(length(keyword$keyword)) 

for (i in 1:length(keyword$keyword)) { 
keyword$Label[grep(keyword$keyword[i],sentence$sentence)] <- sentence$url 
} 
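Do you need help with how to get this done (code-wise), or do you want to know what should be done (conceptually)? I would suggest something like a conditional join (conceptually). – zwep

I need a code-wise solution. Thanks. –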

Answer

A stringr + dplyr + tidyr solution:

library(stringr) 
library(dplyr) 
library(tidyr) 

sentence %>% 
    mutate(sentence = str_extract(sentence, paste0(keyword$keyword, collapse = "|"))) %>% 
    right_join(keyword, by = c("sentence" = "keyword")) %>% 
    group_by(sentence) %>% 
    mutate(URL = 1:n()) %>% 
    spread(URL, url, sep = "") %>% 
    rename(keyword = sentence) 

Result:

# A tibble: 5 x 3
# Groups:   keyword [5]
  keyword    URL1  URL2
* <chr>      <chr> <chr>
1 apple      url1  url2
2 berry      <NA>  <NA>
3 grape      url3  <NA>
4 kiwi fruit <NA>  <NA>
5 peach      <NA>  <NA>
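
Note that str_extract() keeps only the first keyword found in each sentence, so a sentence mentioning two keywords would be credited to just one of them. The variant below is a minimal sketch that may help in that case, assuming tidyr 1.0.0 or later is installed (pivot_wider() is the newer replacement for spread()). It uses str_extract_all() plus unnest() so every matched keyword gets its own row. The name `pattern` is introduced here for illustration only.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Same data as in the question
keyword <- data.frame(keyword = c("apple", "peach", "grape", "berry", "kiwi fruit"),
                      stringsAsFactors = FALSE)
sentence <- data.frame(sentence = c("I like apple", "I hate apple", "grape is good"),
                       url = c("url1", "url2", "url3"),
                       stringsAsFactors = FALSE)

# Alternation pattern built from all keywords (hypothetical helper name)
pattern <- paste0(keyword$keyword, collapse = "|")

result <- sentence %>%
    # list-column holding every keyword found in each sentence
    mutate(keyword = str_extract_all(sentence, pattern)) %>%
    # one row per (sentence, matched keyword); sentences without a match are dropped
    unnest(cols = keyword) %>%
    select(keyword, url) %>%
    # keep keywords that never matched, with url = NA
    right_join(keyword, by = "keyword") %>%
    group_by(keyword) %>%
    # running index of URLs within each keyword
    mutate(URL = row_number()) %>%
    ungroup() %>%
    pivot_wider(names_from = URL, values_from = url, names_prefix = "URL")
```

Because the keywords are pasted into a regular expression, any keyword containing regex metacharacters would need escaping first.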

Data:

keyword <- c("apple","peach","grape","berry","kiwi fruit") 
keyword <- data.frame(keyword, stringsAsFactors = FALSE) 
sentence <- c("I like apple","I hate apple","grape is good") 
url <- c("url1","url2","url3") 
sentence <- data.frame(sentence,url, stringsAsFactors = FALSE)
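
For comparison, here is a base R sketch closer to the loop in the question. That loop appears to fail because grep() returns positions of matching sentences, which are then used to index rows of keyword, and because the whole sentence$url vector is assigned instead of only the matching URLs. The version below assumes that collapsing all matching URLs into a single comma-separated Label column is acceptable; since the expected-result image is not available, that layout is a guess.

```r
# Same data as in the question
keyword <- data.frame(keyword = c("apple", "peach", "grape", "berry", "kiwi fruit"),
                      stringsAsFactors = FALSE)
sentence <- data.frame(sentence = c("I like apple", "I hate apple", "grape is good"),
                       url = c("url1", "url2", "url3"),
                       stringsAsFactors = FALSE)

# For each keyword, find the sentences that contain it and
# collapse their URLs into one string (empty string if no match)
keyword$Label <- vapply(keyword$keyword, function(k) {
    # fixed = TRUE treats the keyword as a literal string, not a regex
    hits <- grepl(k, sentence$sentence, fixed = TRUE)
    paste(sentence$url[hits], collapse = ", ")
}, character(1), USE.NAMES = FALSE)

print(keyword)
```

Keywords without a match get an empty string here; replace it with NA afterwards if that is preferable.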