Average number of words in a character vector in R

I'm trying to get the average number of words in my character vector in R.

one <- c(9, 23, 43) 
two <- c("this is a new york times article.", "short article.", "he went outside to smoke a cigarette.") 

mydf <- data.frame(one, two) 
mydf 

# one         two 
# 1 9  this is a new york times article. 
# 2 23      short article. 
# 3 43 he went outside to smoke a cigarette.

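I'm looking for a function that gives me the average number of words per element of the character vector "two".

The output here should be 5.3333 (= (7 + 2 + 7) / 3).

Source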

2014-03-12 cptn

Alternatively, with gregexpr():

mean(sapply(mydf$two,function(x)length(unlist(gregexpr(" ",x)))+1)) 
[1] 5.333333
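
One caveat with counting spaces and adding one: when a string contains no space at all, gregexpr() returns -1 as a single element, so length() still reports 1 and a one-word string is counted as two words. The sketch below guards against that case by counting only positive match positions. The helper name count_words is hypothetical and not from the thread, and it assumes non-empty strings whose words are separated by single spaces.

```r
# Hypothetical helper (not from the original answer): count words as
# (number of literal spaces) + 1, treating "no match" (-1) as zero spaces.
count_words <- function(x) {
  m <- gregexpr(" ", as.character(x), fixed = TRUE)
  # gregexpr() returns -1 when there is no match, so count only positions > 0
  vapply(m, function(pos) sum(pos > 0L) + 1L, integer(1))
}

two <- c("this is a new york times article.", "short article.",
         "he went outside to smoke a cigarette.")

count_words(two)         # per-element word counts
mean(count_words(two))   # average over the vector
count_words("word")      # a single word is counted once
```
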

Source

2014-03-12 11:09:56 Troy

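'mean(sapply(gregexpr(" ", mydf$two), length) + 1)' is the same concept, but a little more concise.... – A5C1D2H2I1M1N2O1R2T1

@AnandaMahto Good point, not sure why I didn't do it that way in the first place – Troy

My *guess* is that you would get a speed boost with my suggestion, since it reduces the number of calls to 'gregexpr'. I would also suggest that an actual solution should: (1) first trim any leading and trailing whitespace; and (2) use a search pattern like '"\\s+"'. – A5C1D2H2I1M1N2O1R2T1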
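
The two suggestions in the comment above (trim first, then match runs of whitespace) could be combined roughly as follows. This is only a sketch: the function name mean_word_count is hypothetical, it swaps in strsplit() for gregexpr(), and it assumes the input has no NA values and that R is version 3.2.0 or later (for trimws() and lengths()).

```r
# Hypothetical helper (not from the thread): average words per element,
# after trimming and splitting on runs of whitespace.
mean_word_count <- function(x) {
  x <- trimws(as.character(x))      # (1) drop leading/trailing whitespace
  words <- strsplit(x, "\\s+")      # (2) split on one or more whitespace chars
  mean(lengths(words))              # words per element, then the average
}

two <- c("this is a new york times article.", "short article.",
         "he went outside to smoke a cigarette.")

mean_word_count(two)
mean_word_count(c("  one  ", "two   words"))  # padding and double spaces
```

Because the split pattern matches runs of whitespace, repeated spaces or tabs between words do not inflate the count, and an empty string contributes zero words.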

Hadley Wickham's stringr package probably offers the simplest way to do this:

library(stringr) 
foo<- str_split(two, " ") # split each element of your vector by the space sign 
sapply(foo,length) # just a quick test: how many words has each element? 
sum(sapply(foo,length))/length(foo) # calculate sum and divide it by the length of your original object 
[1] 5.333333

Source

2014-03-12 10:31:27 Max

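The stringr way looks very similar to the base way. The only difference seems to be the underscore. ;) – sgibb

I'm sure there are more elaborate methods available, but you could use strsplit to split each string at whitespace into a character vector and then count the number of elements.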

mean(sapply(strsplit(as.character(mydf$two), "[[:space:]]+"), length)) 
# [1] 5.3333

Source

2014-03-12 10:31:55 sgibb

Here is a possibility with the qdap package:

library(qdap) 
wc(mydf$two, FALSE)/nrow(mydf) 

## [1] 5.333333

This is overkill, but you could also do:

word_stats(mydf$two) 

##   all n.sent n.words n.char n.syl n.poly   wps    cps   sps psps   cpw   spw pspw n.state proDF2 n.hapax n.dis grow.rate prop.dis
## 1 all      3      16     68    23      3 5.333 22.667 7.667    1 4.250 1.438 .188       3      1      12     2      .750     .125

The wps column is the number of words per sentence.

Source

2014-03-12 13:03:02

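After creating a word_stats object and assigning it to an object with that class, why doesn't plot.word_stats(obj) work? – lawyeR

The generic 'plot' dispatches on the object's class, so if you have changed the class, or the new class has its own plot method, the generic 'plot' will no longer work. In any case, the 'plot' method for 'word_stats' is just a wrapper around 'qheat', so you can still use 'qheat'. –

@lawyeR If this doesn't answer the question, please open a new question with data and an example. –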
