Using `strsplit` and `grep`. First I made an object `para` containing your paragraph.
toMatch <- c("Martin Luther", "Paul", "Melanchthon")
unlist(strsplit(para,split="\\."))[grep(paste(toMatch, collapse="|"),unlist(strsplit(para,split="\\.")))]
> unlist(strsplit(para,split="\\."))[grep(paste(toMatch, collapse="|"),unlist(strsplit(para,split="\\.")))]
[1] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
[2] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21"
[3] " He studied the Scripture, especially of Paul, and Evangelical doctrine"
[4] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
Or, a little cleaner:
sentences<-unlist(strsplit(para,split="\\."))
sentences[grep(paste(toMatch, collapse="|"),sentences)]
If you want the sentences for each person returned separately:
toMatch <- c("Martin Luther", "Paul", "Melanchthon")
sentences<-unlist(strsplit(para,split="\\."))
foo<-function(Match){sentences[grep(Match,sentences)]}
lapply(toMatch,foo)
[[1]]
[1] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
[[2]]
[1] " He studied the Scripture, especially of Paul, and Evangelical doctrine"
[[3]]
[1] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21"
[2] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
Edit 3: To add each person's name to the result, do something simple such as:
foo<-function(Match){c(Match,sentences[grep(Match,sentences)])}
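An alternative to prepending the name is to label each list element with it. The following is only a sketch: the `para` string here is a shortened, hypothetical stand-in for your paragraph, and `byName` is a name introduced for illustration.

```r
# Hypothetical stand-in for the paragraph (the real `para` is not reproduced here).
para <- "Melanchthon became professor of Greek. He studied the Scripture, especially of Paul. Martin Luther called him to Wittenberg."
toMatch <- c("Martin Luther", "Paul", "Melanchthon")

# Split on full stops and strip the leading space left on each sentence.
sentences <- trimws(unlist(strsplit(para, split = "\\.")))

# setNames() makes lapply() return a list whose elements carry the names searched for.
byName <- lapply(setNames(toMatch, toMatch),
                 function(m) grep(m, sentences, value = TRUE))
byName
```

The sentences for one person can then be pulled out with `byName[["Paul"]]` instead of relying on list position.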
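Edit 4:
If you want to find sentences that contain several people/places/things (words) at once, just add an entry that combines them, such as: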
toMatch <- c("Martin Luther", "Paul", "Melanchthon","(?=.*Melanchthon)(?=.*Scripture)")
and change `perl` to `TRUE`:
foo<-function(Match){c(Match,sentences[grep(Match,sentences,perl = T)])}
> lapply(toMatch,foo)
[[1]]
[1] "Martin Luther"
[2] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
[[2]]
[1] "Paul"
[2] " He studied the Scripture, especially of Paul, and Evangelical doctrine"
[[3]]
[1] "Melanchthon"
[2] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21"
[3] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
[[4]]
[1] "(?=.*Melanchthon)(?=.*Scripture)"
[2] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
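The pattern `(?=.*Melanchthon)(?=.*Scripture)` works because each `(?=...)` is a zero-width lookahead, so both conditions are checked from the same starting position; lookaheads are the reason `perl = TRUE` is needed. If you would rather avoid lookaheads, one possible sketch is to test each term separately with `grepl` and combine the results with a logical AND. The sample `sentences` and the name `allOf` below are assumptions for illustration only.

```r
# Shortened, hypothetical sentences standing in for the split paragraph.
sentences <- c("Melanchthon became professor of Greek",
               "He studied the Scripture, especially of Paul",
               "Melanchthon replied based on the authority of Scripture")

# Terms that must ALL appear in a sentence.
allOf <- c("Melanchthon", "Scripture")

# grepl() gives one logical vector per term; Reduce(`&`, ...) keeps only
# the sentences for which every term matched.
hits <- Reduce(`&`, lapply(allOf, grepl, x = sentences, fixed = TRUE))
sentences[hits]
```

With `fixed = TRUE` the terms are matched literally, so names containing regex metacharacters need no escaping.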
Edit 5: To answer your other question:
Given:
sentenceR<-"Opposed as a reformer at [[Tübingen]], he accepted a call to the University of [[Wittenberg]] by [[Martin Luther]], recommended by his great-uncle [[Johann Reuchlin]]"
gsub("\\[\\[|\\]\\]", "", regmatches(sentenceR, gregexpr("\\[\\[.*?\\]\\]", sentenceR))[[1]])
will give you the words inside the double brackets.
> gsub("\\[\\[|\\]\\]", "", regmatches(sentenceR, gregexpr("\\[\\[.*?\\]\\]", sentenceR))[[1]])
[1] "Tübingen" "Wittenberg" "Martin Luther" "Johann Reuchlin"
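A possible variation, sketched here on a shortened, hypothetical sentence, is to let the regex exclude the brackets itself with a lookbehind and a lookahead, which removes the need for the outer `gsub`. This requires `perl = TRUE`; the variable names are illustrative.

```r
# Shortened, hypothetical example sentence with double-bracketed terms.
sentenceS <- "he accepted a call to the University of [[Wittenberg]] by [[Martin Luther]]"

# (?<=\[\[) requires "[[" just before the match and (?=\]\]) requires "]]" just after,
# so only the text between the brackets is returned.
inner <- regmatches(sentenceS,
                    gregexpr("(?<=\\[\\[).*?(?=\\]\\])", sentenceS, perl = TRUE))[[1]]
inner
```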
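Many thx, but I noticed that the first and fourth sentences each contain two people's names. If I add names such as "Johann Eck" or "Johann Reuchlin" to "toMatch" and run the code above, I still get four sentences as output. My new question is: how can I get the sentences for each person (with overlap)? – hui
I don't quite understand. Are you asking for a) only the sentences that contain all of the names, or b) a separate return for each individual name (the sentences with Martin Luther in them, then all the sentences with Paul in them, etc.)? –
@hui let me know if the new code answers your question –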