2016-11-08 52 views
1

Break a paragraph into a vector of sentences in R

I have the following paragraph:

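Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)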

In order to apply the calculate_total_presence_sentiment command from the RSentiment package, I want to break this paragraph into a vector of sentences, as follows:

[1] "Well, um...such a personal topic."          
[2] "No wonder I am the first to write a review."        
[3] "Suffice to say this stuff does just what they claim and tastes pleasant." 
[4] "And I had, well, major problems in this area and now I don't."   
[5] "'Nuff said."                
[6] ":-)" 

Thank you very much for your help with this.

Answers

1

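qdap has a very handy function for this:

sent_detect_nlp - Detect and split sentences on endmark boundaries using openNLP & NLP utilities, which matches the old version of the openNLP package's now-removed sentDetect function.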

library(qdap) 

txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)" 

sent_detect_nlp(txt) 
#[1] "Well, um...such a personal topic."          
#[2] "No wonder I am the first to write a review."        
#[3] "Suffice to say this stuff does just what they claim and tastes pleasant." 
#[4] "And I had, well, major problems in this area and now I don't."   
#[5] "'Nuff said."                
#[6] ":-)" 
0

A dirty solution

> data <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"
> ?"regular expression"
> strsplit(data, "(?<=[^.][.][^.])", perl=TRUE)
[[1]]
[1] "Well, um...such a personal topic. "
[2] "No wonder I am the first to write a review. "
[3] "Suffice to say this stuff does just what they claim and tastes pleasant. "
[4] "And I had, well, major problems in this area and now I don't. "
[5] "'Nuff said. "
[6] ":-)"
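
The pieces returned by this split keep the space that follows each full stop. A minimal sketch, assuming R 3.2.0 or later (for trimws()), that reuses the same regular expression and strips that whitespace; the variable name sentences is only illustrative and not part of the answer:

```r
data <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"

# Split after "<non-dot><dot><non-dot>", so the "..." in "um...such" is left alone.
# strsplit() returns a list with one element per input string; take the first.
pieces <- strsplit(data, "(?<=[^.][.][^.])", perl = TRUE)[[1]]

# Drop the leading/trailing whitespace left on each piece.
sentences <- trimws(pieces)

print(sentences)
```

Like the original, this is still a heuristic: it only treats a single full stop followed by another character as a boundary, so sentences ending in "?" or "!" would not be split.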

Or use the tools from https://cran.r-project.org/web/views/NaturalLanguageProcessing.html

-1

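You can save the text in a .txt file. Make sure each line of the .txt file contains one sentence that you want to read as an element of the vector. Then use the base function readLines('filepath/filename.txt'). The result is a character vector (not a data frame) in which each element is one line of the original text file.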

> mylines <- readLines('text.txt') 
Warning message: 
In readLines("text.txt") : incomplete final line found on 'text.txt' 
> mylines 
[1] "Well, um...such a personal topic."          
[2] "No wonder I am the first to write a review."        
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."   
[5] "'Nuff said'."                
[6] ":-)" 

> mylines[3] 
[1] "Suffice to say this stuff does just what they claim and tastes pleasant."
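
As a minimal, self-contained sketch of this approach, the sentences can be written one per line to a file and read back. The temporary file and the variable names sentences and path are illustrative assumptions, not part of the answer. Because writeLines() ends every line, including the last, with a newline, the "incomplete final line" warning should not appear here:

```r
# Hypothetical input: one sentence per element, taken from the question.
sentences <- c(
  "Well, um...such a personal topic.",
  "No wonder I am the first to write a review.",
  "Suffice to say this stuff does just what they claim and tastes pleasant.",
  "And I had, well, major problems in this area and now I don't.",
  "'Nuff said.",
  ":-)"
)

# Write one sentence per line to a temporary .txt file.
path <- tempfile(fileext = ".txt")
writeLines(sentences, path)

# readLines() returns a character vector with one element per line.
mylines <- readLines(path)

print(is.character(mylines))
print(length(mylines))
print(mylines[3])

# Clean up the temporary file.
unlink(path)
```

Note that this only works if the text has already been split into one sentence per line; readLines() does no sentence detection of its own.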