Convert a data frame to a tibble with word counts

I am trying to perform sentiment analysis based on http://tidytextmining.com/sentiment.html#the-sentiments-dataset. Before I can run the sentiment analysis, I need to convert my dataset into a tidy format.

My dataset has this form:

x <- c("test1" , "test2") 
y <- c("this is test text1" , "this is test text2") 
res <- data.frame("url" = x, "text" = y) 
res 
    url               text
1 test1 this is test text1
2 test2 this is test text2

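To get one observation per row, I need to split the text column into words and add new columns holding each word and the number of times it appears for that url. The same url will then appear in multiple rows.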
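For reference, the target "one row per (url, word) with a count" shape can be sketched in base R without any packages. This is a minimal illustration that assumes words are separated by single spaces and ignores punctuation; the names `long` and `word_counts` are hypothetical.

```r
# Base-R sketch of the target shape: one row per (url, word) pair with a count.
# Assumption: words are separated by single spaces; punctuation is not handled.
x <- c("test1", "test2")
y <- c("this is test text1", "this is test text2")
res <- data.frame(url = x, text = y, stringsAsFactors = FALSE)

# Split each text into words: a list with one character vector per row
words <- strsplit(res$text, " ", fixed = TRUE)

# Repeat each url once per word so urls and words line up row by row
long <- data.frame(
  url  = rep(res$url, lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)

# Count occurrences of each (url, word) pair, then sort by url and word
word_counts <- aggregate(n ~ url + word, data = transform(long, n = 1L), FUN = sum)
word_counts <- word_counts[order(word_counts$url, word_counts$word), ]
word_counts
```

Because the url is carried along through `rep()`, each word keeps its source url, which is what a later join against a sentiment lexicon needs.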

Here is my attempt:

library(tidyverse) 

x <- c("test1" , "test2") 
y <- c("this is test text1" , "this is test text2") 
res <- data.frame("url" = x, "text" = y) 
res 

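# Keep only the text column (the url column is dropped here)
res_1 <- data.frame(res$text) 
res_2 <- as_tibble(res_1) 
# This counts whole sentences, not individual words
res_2 %>% count(res.text, sort = TRUE)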

# A tibble: 2 x 2
            res.text     n
              <fctr> <int>
1 this is test text1     1
2 this is test text2     1

How can I count the words in res$text while keeping the url, so that I can perform sentiment analysis?

Update:

x <- c("test1" , "test2") 
y <- c("this is test text1" , "this is test text2") 
res <- data.frame("url" = x, "text" = y) 
res 

res %>% 
  group_by(url) %>% 
  transform(text = strsplit(text, " ", fixed = TRUE)) %>% 
  unnest() %>% 
  count(url, text)

This returns an error:

Error in strsplit(text, " ", fixed = TRUE) : non-character argument
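The error occurs because `text` is a factor: before R 4.0.0, `data.frame()` converted character vectors to factors by default, and `strsplit()` only accepts character input. The sketch below reproduces this in base R by forcing `stringsAsFactors = TRUE` explicitly (so it behaves the same on any R version) and shows two possible fixes; the names `err`, `split_ok` and `res_chr` are hypothetical.

```r
# Reproduce the "non-character argument" error and two fixes (base R only).
x <- c("test1", "test2")
y <- c("this is test text1", "this is test text2")

# Force the pre-R-4.0.0 default so the text column becomes a factor
res <- data.frame(url = x, text = y, stringsAsFactors = TRUE)
text_is_factor <- is.factor(res$text)   # TRUE

# strsplit() rejects factors; capture the error message instead of stopping
err <- tryCatch(strsplit(res$text, " ", fixed = TRUE),
                error = function(e) conditionMessage(e))

# Fix 1: convert the column to character right before splitting
split_ok <- strsplit(as.character(res$text), " ", fixed = TRUE)

# Fix 2: build the data frame with character columns from the start
res_chr <- data.frame(url = x, text = y, stringsAsFactors = FALSE)
text_is_character <- is.character(res_chr$text)  # TRUE
```

Fix 2 is usually preferable, since every later step (tokenizing, joining on words) expects character data.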

I tried converting to a tibble because that seems to be the format required for sentiment analysis in tidytextmining: http://tidytextmining.com/sentiment.html#the-sentiments-dataset

Source

2017-12-02 blue-sky

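Why do you need to convert it to a tibble? In other words, your title doesn't seem to reflect the real question. It looks like you just want a word count per url. I think a possible tibbliverse approach could be 'res %>% group_by(url) %>% transform(text = strsplit(text, " ", fixed = TRUE)) %>% unnest() %>% count(url, text)' (assuming 'text' is a character string and not a factor).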
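A sketch of the approach from the comment, with two assumed adjustments: `text` is kept as character via `stringsAsFactors = FALSE` (which avoids the error in the update), and `mutate()` plus `unnest(text)` stand in for `transform()` and a bare `unnest()`, because tidyr 1.0.0 and later warn when the column to unnest is not named. It requires the dplyr and tidyr packages.

```r
# Corrected version of the pipeline from the comment and the update.
# Assumption: dplyr and tidyr are installed; mutate() replaces transform().
library(dplyr)
library(tidyr)

x <- c("test1", "test2")
y <- c("this is test text1", "this is test text2")
# stringsAsFactors = FALSE keeps text as character so strsplit() accepts it
res <- data.frame(url = x, text = y, stringsAsFactors = FALSE)

word_counts <- res %>%
  mutate(text = strsplit(text, " ", fixed = TRUE)) %>%  # list-column of words
  unnest(text) %>%                                      # one row per word
  count(url, text)                                      # n per (url, word)

word_counts
```

`count()` groups by `url` and `text` itself, so the `group_by(url)` step from the comment is not needed here.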

@DavidArenburg Please see the update.

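Are you looking for something like this? To do sentiment analysis with the tidytext package, you need to split each character string into words with unnest_tokens(). This function can do more than split text into words, so you may want to look at it later. Once you have one word per row, you can count how many times each word appears in each text with count(). Next, you want to remove stop words; the tidytext package ships a stop_words dataset, so you can just load it. Finally, you need sentiment information. Here I chose AFINN, but you can pick another lexicon if you prefer. I hope this helps.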

x <- c("text1" , "text2") 
y <- c("I am very happy and feeling great." , "I am very sad and feeling low") 
res <- data.frame("url" = x, "text" = y, stringsAsFactors = FALSE) 

#     url                               text
# 1 text1 I am very happy and feeling great.
# 2 text2      I am very sad and feeling low

library(tidytext) 
library(dplyr) 

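data(stop_words) 
# AFINN lexicon; newer tidytext versions download it via the textdata
# package and name the score column `value` instead of `score`
afinn <- get_sentiments("afinn") 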

unnest_tokens(res, input = text, output = word) %>%  # one word per row
  count(url, word) %>%                               # occurrences per url
  filter(!word %in% stop_words$word) %>%             # drop stop words
  inner_join(afinn, by = "word")                     # attach AFINN scores

#     url    word     n score
#   <chr>   <chr> <int> <int>
# 1 text1 feeling     1     1
# 2 text1   happy     1     3
# 3 text2 feeling     1     1
# 4 text2     sad     1    -2
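One common next step, not part of the answer itself, is to collapse the word-level scores into a single sentiment value per url by summing `n * score`. The sketch below assumes that weighting; it re-enters the four printed rows by hand so it runs without downloading the AFINN lexicon, and keeps the `score` column name shown in the output above. `scored` and `url_sentiment` are hypothetical names.

```r
# Per-url sentiment from the word-level result above (dplyr only).
# The rows are typed in by hand from the printed output; with newer
# tidytext versions the AFINN column would be `value` rather than `score`.
library(dplyr)

scored <- data.frame(
  url   = c("text1", "text1", "text2", "text2"),
  word  = c("feeling", "happy", "feeling", "sad"),
  n     = c(1L, 1L, 1L, 1L),
  score = c(1L, 3L, 1L, -2L),
  stringsAsFactors = FALSE
)

url_sentiment <- scored %>%
  group_by(url) %>%
  summarise(sentiment = sum(n * score))  # weight each word's score by its count

url_sentiment
```

With these rows, text1 sums to 1 + 3 = 4 and text2 to 1 - 2 = -1, matching the happy and sad example sentences.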

Source

2017-12-03 01:49:03 jazzurro
