2015-10-20

I am using the rpart package in R (together with caret), and the following call fails:

dt <- rpart(formula, method="class", data=full.df.allAttr.train); 

Error in model.frame.default(formula = formula, data = full.df.allAttr.train, : 
    object is not a matrix 

When I convert full.df.allAttr.train to a matrix, I get this error instead:

dt <- rpart(formula, method="class", data= as.matrix(full.df.allAttr.train)); 

Error in model.frame.default(formula = formula, data = as.matrix(full.df.allAttr.train), : 
    'data' must be a data.frame, not a matrix or an array 

When I check the class, it is a data frame:

class(full.df.allAttr.train) 

[1] "data.frame" 
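
As a side note on the second message, here is a minimal sketch on a hypothetical toy data frame (toy.df and toy.mat are illustrative names, not part of the original post). It suggests why converting to a matrix is unlikely to help: model.frame, which rpart calls internally, rejects a matrix passed as data, and as.matrix coerces every cell to character as soon as one column holds text.

```r
# Hypothetical toy data; not the poster's data set.
toy.df <- data.frame(count = c(3, 0, 5, 1),
                     SpeakerName = c("obama", "obama", "romney", "romney"),
                     stringsAsFactors = FALSE)

# A matrix holds a single type, so the numeric column becomes character too.
toy.mat <- as.matrix(toy.df)
print(is.character(toy.mat))

# Passing the matrix as 'data' raises an error; capture it instead of stopping.
res <- tryCatch(model.frame(SpeakerName ~ count, data = toy.mat),
                error = function(e) e)
print(inherits(res, "error"))
```

Keeping the data as a data.frame and looking at the formula instead is usually the more productive direction.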

Thanks for the input. That error went away once I built the formula from the proper column names, as follows:

measurevar <- "SpeakerName" 
formula_str <- paste(measurevar, paste(rowNames, collapse=" + "), sep=" ~ ") 
formula <- as.formula(formula_str) 

It now gives a different error, shown here together with a snapshot of my data frame's row.names:

Error in model.frame.default(formula = formula, data = full.df.train, : 
    variable lengths differ (found for 'character(0)') 
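
One way this message can arise is a formula term that is not a column of the data and instead evaluates to a vector of a different length. That is only an assumption about the cause here, since the post does not show how rowNames was built at that point. The sketch below uses hypothetical toy data and the literal term character(0) to reproduce the situation, then shows a check on the term labels that would flag it before fitting.

```r
# Hypothetical toy data; not the poster's data set.
toy.df <- data.frame(count = c(3, 0, 5, 1),
                     SpeakerName = factor(c("obama", "obama", "romney", "romney")))

# A term that is not a column: it evaluates to a zero-length character vector.
bad.formula <- as.formula("SpeakerName ~ character(0)")

# Capture the error message rather than aborting the script.
msg <- tryCatch({
  model.frame(bad.formula, data = toy.df)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Sanity check: every term label should be a column name of the data.
term.labels <- attr(terms(bad.formula), "term.labels")
missing.terms <- setdiff(term.labels, colnames(toy.df))
print(missing.terms)
```

If missing.terms is not empty, the formula was pasted together from names that do not match the data frame's columns.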

[Image: snapshot of the data frame's row.names]

Sorry, I am new to this. I will add the complete source code and the data set.

library(tm) 
library(rpart) 
obamaCorpus <- Corpus(DirSource(directory = "D:/R/Chap 6/Speeches/obama" , encoding="UTF-8")) 
romneyCorpus <- Corpus(DirSource(directory = "D:/R/Chap 6/Speeches/romney" , encoding="UTF-8")) 
fullCorpus <- c(obamaCorpus,romneyCorpus)#1-22 (obama), 23-44(romney) 
fullCorpus.cleansed <- tm_map(fullCorpus, removePunctuation) 
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, stripWhitespace) 
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, tolower) 
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, removeWords, stopwords("english")) 
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, PlainTextDocument) 
#fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, stemDocument) 

full.dtm <- DocumentTermMatrix(fullCorpus.cleansed) 
full.dtm.spars <- removeSparseTerms(full.dtm , 0.6) 

full.matix <- data.matrix(full.dtm.spars) 
full.df <- as.data.frame(full.matix) 

full.df[,"SpeakerName"] <- "obama" 
full.df$SpeakerName[21:44] <- "romney" 

train.idx <- sample(nrow(full.df) , ceiling(nrow(full.df)* 0.6)) 
test.idx <- (1:nrow(full.df))[-train.idx] 
rowNames <- colnames(full.df) 

measurevar <- "SpeakerName" 
formula_str <- paste(measurevar, paste(rowNames, collapse=" + "), sep=" ~ ") 
formula <- as.formula(formula_str) 
dt <- rpart(formula, method="class", data=full.df.train); 

It fails at the last step.

The data set is here: https://drive.google.com/folderview?id=0B1SogodTE-kJSHF6aFRmQURsV0U&usp=sharing

I imagine that is frustrating. Can you create a reproducible example? – rawr

Check the result of as.matrix(full.df.allAttr.train) –

Thanks for the input. The error went away once I created the formula correctly: measurevar <- "SpeakerName"; formula_str <- paste(measurevar, paste(rowNames, collapse = " + "), sep = " ~ "); formula <- as.formula(formula_str) – user2478236

Answer

You never created full.df.train, and the formula is not correct.

This will work:

full.df.train <- full.df[train.idx, ] 
dt <- rpart(SpeakerName ~ ., method = "class", data = full.df.train) 
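
The subsetting step can be sketched on toy data as follows. The names toy.df, toy.train and toy.test are hypothetical, and the set.seed call is an addition that is not in the original code; it only makes the random split repeatable.

```r
set.seed(42)  # assumption: fixed seed so the random split is repeatable

# Hypothetical toy data; not the poster's data set.
toy.df <- data.frame(count = 1:10,
                     SpeakerName = factor(rep(c("obama", "romney"), each = 5)))

# Draw roughly 60% of the row indices for training, as in the question.
train.idx <- sample(nrow(toy.df), ceiling(nrow(toy.df) * 0.6))
# Everything not drawn is held out for testing.
test.idx <- setdiff(seq_len(nrow(toy.df)), train.idx)

toy.train <- toy.df[train.idx, ]  # rows used for fitting
toy.test  <- toy.df[test.idx, ]   # held-out rows

print(nrow(toy.train))
print(nrow(toy.test))
```

The same row indexing applied to full.df yields the full.df.train object that the rpart call expects.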

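The problem with your formula is that SpeakerName ends up on both sides of the ~, because rowNames is taken from colnames(full.df) after the SpeakerName column has been added. If you want to use all the other variables as predictors, the . shorthand is simpler and more compact.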
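
If an explicit formula is still wanted, a minimal sketch is to drop the response from the predictor names before building it. The toy data, the column names economy and jobs, and the variables predictors and explicit are all hypothetical; reformulate comes from the stats package that ships with R.

```r
library(rpart)

set.seed(1)  # assumption: fixed seed so the toy data is repeatable

# Hypothetical stand-in for full.df: two term-count columns plus the label.
toy.df <- data.frame(
  economy = c(rpois(20, 2), rpois(20, 6)),
  jobs = c(rpois(20, 5), rpois(20, 1)),
  SpeakerName = factor(rep(c("obama", "romney"), each = 20))
)

measurevar <- "SpeakerName"
# Remove the response so it does not appear on the right-hand side.
predictors <- setdiff(colnames(toy.df), measurevar)
explicit <- reformulate(predictors, response = measurevar)
print(explicit)

# Both calls fit a classification tree on the same predictors.
fit.explicit <- rpart(explicit, method = "class", data = toy.df)
fit.dot <- rpart(SpeakerName ~ ., method = "class", data = toy.df)
```

The explicit form is mainly useful when only a subset of columns should enter the model; otherwise the . shorthand avoids the string handling altogether.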

Thanks for the input, @Lluis. I will try this. – user2478236