2013-02-12 78 views
1

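I'm using the tm package and want to get the Flesch-Kincaid score of documents in R. I found that the koRpus package offers many metrics, including reading level, and started using it. However, the object it returns seems to be a very complicated S4 object that I don't understand how to parse. How do I extract content from a koRpus object in R?

So, I take this as my corpus: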

library(tm)
library(koRpus)
library(foreach)

txt <- system.file("texts", "txt", package = "tm")
(d <- Corpus(DirSource(txt, encoding = "UTF-8"), readerControl = list(language = "lat")))

f <- function(x) tokenize(x, format = "obj", lang = "en")
g <- function(x) flesch.kincaid(x)
x <- foreach(i = 1:5) %dopar% g(f(d[[i]]))

x then holds the flesch.kincaid results for the Ovid texts.

> x[[1]] 

Flesch-Kincaid Grade Level 
    Parameters: default 
     Grade: 13.62 
     Age: 18.62 

Text language: en 

How can I get at the returned values Grade = 13.62 and Age = 18.62? The str(x) output is so large that it is hard to analyze, i.e.:

> str(x[[1]]) 
Formal class 'kRp.readability' [package "koRpus"] with 49 slots 
    ..@ hyphen     :Formal class 'kRp.hyphen' [package "koRpus"] with 3 slots
    .. .. ..@ lang : chr "en"
    .. .. ..@ desc :List of 5
    .. .. .. ..$ num.syll   : num 196
    .. .. .. ..$ syll.distrib  : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
    .. .. .. .. ..- attr(*, "dimnames")=List of 2
    .. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
    .. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
    .. .. .. ..$ syll.uniq.distrib: num [1:6, 1:4] 15 15 61 19.7 19.7 ...
    .. .. .. .. ..- attr(*, "dimnames")=List of 2
    .. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
    .. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
    .. .. .. ..$ avg.syll.word : num 2.18
    .. .. .. ..$ syll.per100  : num 218
    .. .. ..@ hyphen:'data.frame': 90 obs. of 2 variables:
    .. .. .. ..$ syll: num [1:90] 1 1 1 1 2 3 1 2 3 1 ...
    .. .. .. ..$ word: chr [1:90] "Si" "quis" "in" "hoc" ...
    ..@ param     :List of 1
    .. ..$ Flesch.Kincaid: Named num [1:3] 0.39 11.8 15.59
    .. .. ..- attr(*, "names")= chr [1:3] "asl" "asw" "const"
    ..@ ARI      :List of 1
    .. ..$ : logi NA
    ..@ ARI.NRI     :List of 1
    .. ..$ : logi NA
    ..@ ARI.simple    :List of 1
    .. ..$ : logi NA
    ..@ Bormuth     :List of 1
    .. ..$ : logi NA
    ..@ Coleman     :List of 1
    .. ..$ : logi NA
    ..@ Coleman.Liau    :List of 1
    .. ..$ : logi NA
    ..@ Dale.Chall    :List of 1
    .. ..$ : logi NA
    ..@ Dale.Chall.PSK   :List of 1
    .. ..$ : logi NA
    ..@ Dale.Chall.old   :List of 1
    .. ..$ : logi NA
    ..@ Danielson.Bryan   :List of 1
    .. ..$ : logi NA
    ..@ Dickes.Steiwer   :List of 1
    .. ..$ : logi NA
    ..@ DRP      :List of 1
    .. ..$ : logi NA
    ..@ ELF      :List of 1
    .. ..$ : logi NA
    ..@ Flesch     :List of 1
    .. ..$ : logi NA
    ..@ Flesch.PSK    :List of 1
    .. ..$ : logi NA
    ..@ Flesch.de    :List of 1
    .. ..$ : logi NA
    ..@ Flesch.es    :List of 1
    .. ..$ : logi NA
    ..@ Flesch.fr    :List of 1
    .. ..$ : logi NA
    ..@ Flesch.nl    :List of 1
    .. ..$ : logi NA
    ..@ Flesch.Kincaid   :List of 3
    .. ..$ flavour: chr "default"
    .. ..$ grade : num 13.6
    .. ..$ age : num 18.6
    ..@ Farr.Jenkins.Paterson :List of 1
    .. ..$ : logi NA
    ..@ Farr.Jenkins.Paterson.PSK:List of 1
    .. ..$ : logi NA
    ..@ FOG      :List of 1
    .. ..$ : logi NA
    ..@ FOG.PSK     :List of 1
    .. ..$ : logi NA
    ..@ FOG.NRI     :List of 1
    .. ..$ : logi NA
    ..@ FORCAST     :List of 1
    .. ..$ : logi NA
    ..@ FORCAST.RGL    :List of 1
    .. ..$ : logi NA
    ..@ Fucks     :List of 1
    .. ..$ : logi NA
    ..@ Harris.Jacobson   :List of 1
    .. ..$ : logi NA
    ..@ Linsear.Write   :List of 1
    .. ..$ : logi NA
    ..@ LIX      :List of 1
    .. ..$ : logi NA
    ..@ RIX      :List of 1
    .. ..$ : logi NA
    ..@ SMOG      :List of 1
    .. ..$ : logi NA
    ..@ SMOG.de     :List of 1
    .. ..$ : logi NA
    ..@ SMOG.C     :List of 1
    .. ..$ : logi NA
    ..@ SMOG.simple    :List of 1
    .. ..$ : logi NA
    ..@ Spache     :List of 1
    .. ..$ : logi NA
    ..@ Spache.old    :List of 1
    .. ..$ : logi NA
    ..@ Strain     :List of 1
    .. ..$ : logi NA
    ..@ Traenkle.Bailer   :List of 1
    .. ..$ : logi NA
    ..@ TRI      :List of 1
    .. ..$ : logi NA
    ..@ Wheeler.Smith   :List of 1
    .. ..$ : logi NA
    ..@ Wheeler.Smith.de   :List of 1
    .. ..$ : logi NA
    ..@ Wiener.STF    :List of 1
    .. ..$ : logi NA
    ..@ lang      : chr "en"
    ..@ desc      :List of 26
    .. ..$ sentences   : int 10
    .. ..$ words    : int 90
    .. ..$ letters   : Named num [1:12] 492 0 8 9 14 18 14 9 10 6 ...
    .. .. ..- attr(*, "names")= chr [1:12] "all" "l1" "l2" "l3" ...
    .. ..$ all.chars   : int 692
    .. ..$ syllables   : Named num [1:5] 196 25 32 25 8
    .. .. ..- attr(*, "names")= chr [1:5] "all" "s1" "s2" "s3" ...
    .. ..$ lttr.distrib  : num [1:6, 1:11] 0 0 90 0 0 ...
    .. .. ..- attr(*, "dimnames")=List of 2
    .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
    .. .. .. ..$ : chr [1:11] "1" "2" "3" "4" ...
    .. ..$ syll.distrib  : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
    .. .. ..- attr(*, "dimnames")=List of 2
    .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
    .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
    .. ..$ syll.uniq.distrib : num [1:6, 1:4] 15 15 61 19.7 19.7 ...
    .. .. ..- attr(*, "dimnames")=List of 2
    .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
    .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
    .. ..$ punct    : int 17
    .. ..$ conjunctions  : int 0
    .. ..$ prepositions  : int 0
    .. ..$ pronouns   : int 0
    .. ..$ foreign   : int 0
    .. ..$ TTR    : num 0.844
    .. ..$ avg.sentc.length : num 9
    .. ..$ avg.word.length : num 5.47
    .. ..$ avg.syll.word  : num 2.18
    .. ..$ sntc.per.word  : num 0.111
    .. ..$ sntc.per100  : num 11.1
    .. ..$ lett.per100  : num 547
    .. ..$ syll.per100  : num 218
    .. ..$ FOG.hard.words  : NULL
    .. ..$ Bormuth.NOL  : NULL
    .. ..$ Dale.Chall.NOL  : NULL
    .. ..$ Harris.Jacobson.NOL: NULL
    .. ..$ Spache.NOL   : NULL
    ..@ TT.res     :'data.frame': 107 obs. of 6 variables:
    .. ..$ token : chr [1:107] "Si" "quis" "in" "hoc" ... 
    .. ..$ tag : chr [1:107] "word.kRp" "word.kRp" "word.kRp" "word.kRp" ... 
    .. ..$ lemma : chr [1:107] "" "" "" "" ... 
    .. ..$ lttr : num [1:107] 2 4 2 3 5 6 3 5 6 1 ... 
    .. ..$ wclass: chr [1:107] "word" "word" "word" "word" ... 
    .. ..$ desc : chr [1:107] "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" ... 

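I'd very much like to assign the F-K scores back to meta(d) in tm.

I'd appreciate learning how to understand this returned object and pull its values out, but if there's another better, faster way to get F-K scores, I'm all ears!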


The strategy I chose with foreach seems to limit my ability to handle errors. If anyone has advice on how to do this more directly, I'd appreciate it. – Mittenchops 2013-02-13 15:29:45
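One way to keep a single failing document from aborting the whole run is to wrap the per-document call in `tryCatch` and return `NA` on error. The sketch below uses plain base R; `score_doc` is a hypothetical stand-in for `g(f(doc))`, and the sample strings are made up for illustration. (foreach also has an `.errorhandling` argument that can be set to `"pass"` or `"remove"`, which may be enough if you want to stay with `%dopar%`.)

```r
# Hypothetical scoring function standing in for g(f(doc));
# it fails on empty input so there is an error to handle.
score_doc <- function(doc) {
  if (!nzchar(doc)) stop("empty document")
  nchar(doc) / 10
}

# Made-up sample documents; the second one triggers the error.
docs <- c("Si quis in hoc artem populo non novit amandi", "", "hoc legat")

# Wrap the call so a failure yields NA instead of stopping the loop.
safe_score <- function(doc) {
  tryCatch(
    score_doc(doc),
    error = function(e) {
      message("skipping document: ", conditionMessage(e))
      NA_real_
    }
  )
}

scores <- vapply(docs, safe_score, numeric(1), USE.NAMES = FALSE)
```

Documents that failed can then be found with `which(is.na(scores))` and inspected separately.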

Answers

3

Similar to @Paul's answer, but a one-liner solution:

sapply(lapply(x,slot,'Flesch.Kincaid'),'[',c('age','grade')) 
     [,1]  [,2]  [,3]  [,4]  [,5] 
age 18.61778 17.62351 17.77699 18.29032 18.645 
grade 13.61778 12.62351 12.77699 13.29032 13.645 

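For future readers who just need to use this with the tm package, an update (I was only interested in age, not grade, since grade = age - 5): I found I had to do this (say the result is assigned to y) and then reassign it to the meta variable, i.e. `meta(d, 'f') <- unlist(y, use.names = F)` – Mittenchops 2013-02-13 14:17:23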
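The need for `unlist` here probably comes from how `sapply` simplifies: extracting with `"["` returns a list per document, so the result is a matrix of mode list rather than a numeric matrix. A minimal sketch, using plain lists as stand-ins for the contents of the `Flesch.Kincaid` slot (the second document's values are rounded from the output above):

```r
# Plain lists standing in for slot(x[[i]], "Flesch.Kincaid").
fk <- list(
  list(flavour = "default", grade = 13.62, age = 18.62),
  list(flavour = "default", grade = 12.62, age = 17.62)
)

# "[" keeps each result as a list, so sapply builds a list-mode matrix.
m <- sapply(fk, "[", c("age", "grade"))
is.list(m)

# Flattening one row gives an ordinary numeric vector...
ages_from_matrix <- unlist(m["age", ], use.names = FALSE)

# ...and "[[" with vapply gets there directly, with a type check.
ages <- vapply(fk, function(e) e[["age"]], numeric(1))

# With tm loaded and a corpus d of the same length, this vector could
# then be stored as in the comment above: meta(d, "f") <- ages
```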

3

Just use:

slot(x[[1]], "Flesch.Kincaid") 

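to get the part of the object that contains these values. To get it for every element of the list x, do something like this:

list_fk = lapply(x, slot, "Flesch.Kincaid")

...and to get a vector with the grades: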

grades = sapply(list_fk, "[[", "grade")
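
The same pattern can be tried without koRpus installed. The sketch below defines a toy S4 class, `ToyReadability`, which is a hypothetical stand-in with only two of the slots seen in the `str()` output; it is not the real `kRp.readability` class. It shows that `slotNames()` lists the slots and that `@` is the operator form of `slot()`.

```r
library(methods)

# Hypothetical stand-in for koRpus's 'kRp.readability' class,
# reduced to two slots so the example is self-contained.
setClass("ToyReadability",
         slots = c(lang = "character", Flesch.Kincaid = "list"))

# Build a toy result; age = grade + 5, as noted in the comments above.
make_toy <- function(grade) {
  new("ToyReadability",
      lang = "en",
      Flesch.Kincaid = list(flavour = "default",
                            grade = grade,
                            age = grade + 5))
}

x <- lapply(c(13.62, 12.62, 12.78), make_toy)

# List the slots, then read one value two equivalent ways.
slotNames(x[[1]])
grade_at   <- x[[1]]@Flesch.Kincaid$grade
grade_slot <- slot(x[[1]], "Flesch.Kincaid")$grade

# The pattern from this answer, applied to the whole list.
list_fk <- lapply(x, slot, "Flesch.Kincaid")
grades  <- sapply(list_fk, "[[", "grade")
ages    <- sapply(list_fk, "[[", "age")
```

Because `"[["` returns a single number per element, `grades` and `ages` come back as plain numeric vectors.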