I am using the rpart package in R (library(rpart)), together with the caret package.
dt <- rpart(formula, method="class", data=full.df.allAttr.train);
Error in model.frame.default(formula = formula, data = full.df.allAttr.train, :
object is not a matrix
When I convert full.df.allAttr.train to a matrix, I get the following error:
dt <- rpart(formula, method="class", data= as.matrix(full.df.allAttr.train));
Error in model.frame.default(formula = formula, data = as.matrix(full.df.allAttr.train), :
'data' must be a data.frame, not a matrix or an array
When I check the class, it is a data frame:
class(full.df.allAttr.train)
[1] "data.frame"
Thanks for your input. The error went away once I built the formula from the proper column names:
measurevar <- "SpeakerName"
formula_str <- paste(measurevar, paste(rowNames, collapse=" + "), sep=" ~ ")
formula <- as.formula(formula_str)
It now gives a different error, which is caused by the row.names of my data frame. The following text is a snapshot:
Error in model.frame.default(formula = formula, data = full.df.train, :
variable lengths differ (found for 'character(0)')
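A minimal sketch of one way this message can arise. It assumes the predictor names were taken from row.names() and that every row name was the literal string "character(0)". That is a guess suggested by the error text, not something confirmed by the data. The toy data frame and the names badNames, bad.formula and good.formula are hypothetical.

```r
# Hypothetical toy data; not the speech data set from the question.
toy <- data.frame(SpeakerName = factor(c("obama", "romney", "obama")),
                  jobs = c(1, 4, 2))

# Assumed bad input: names taken from the rows instead of the columns.
badNames <- rep("character(0)", 2)
bad.formula <- as.formula(
  paste("SpeakerName", paste(badNames, collapse = " + "), sep = " ~ "))

# The term character(0) is evaluated as an R call that returns a
# zero-length vector, so its length cannot match the 3 rows of the response.
res <- tryCatch(model.frame(bad.formula, data = toy), error = function(e) e)
print(conditionMessage(res))  # prints the error raised by model.frame()

# Column names (minus the response) refer to variables that exist in the data.
goodNames <- setdiff(colnames(toy), "SpeakerName")
good.formula <- as.formula(
  paste("SpeakerName", paste(goodNames, collapse = " + "), sep = " ~ "))
mf <- model.frame(good.formula, data = toy)
```

If this is what happened, inspecting head(row.names(full.df)) and the formula_str string before calling as.formula() should show the problem directly.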
Sorry, I am new to this. I will add the complete source code and the data set:
library(tm)
library(rpart)
obamaCorpus <- Corpus(DirSource(directory = "D:/R/Chap 6/Speeches/obama" , encoding="UTF-8"))
romneyCorpus <- Corpus(DirSource(directory = "D:/R/Chap 6/Speeches/romney" , encoding="UTF-8"))
fullCorpus <- c(obamaCorpus,romneyCorpus)#1-22 (obama), 23-44(romney)
fullCorpus.cleansed <- tm_map(fullCorpus, removePunctuation)
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, stripWhitespace)
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, tolower)
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, removeWords, stopwords("english"))
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, PlainTextDocument)
#fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, stemDocument)
full.dtm <- DocumentTermMatrix(fullCorpus.cleansed)
full.dtm.spars <- removeSparseTerms(full.dtm , 0.6)
full.matix <- data.matrix(full.dtm.spars)
full.df <- as.data.frame(full.matix)
full.df[,"SpeakerName"] <- "obama"
full.df$SpeakerName[21:44] <- "romney"
train.idx <- sample(nrow(full.df) , ceiling(nrow(full.df)* 0.6))
test.idx <- (1:nrow(full.df))[-train.idx]
rowNames <- colnames(full.df)
measurevar <- "SpeakerName"
formula_str <- paste(measurevar, paste(rowNames, collapse=" + "), sep=" ~ ")
formula <- as.formula(formula_str)
dt <- rpart(formula, method="class", data=full.df.train);
It fails at the last step.
The data set is here: https://drive.google.com/folderview?id=0B1SogodTE-kJSHF6aFRmQURsV0U&usp=sharing
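Two things in the script above look worth checking: full.df.train is never created from train.idx, and rowNames is taken from colnames(full.df) after SpeakerName has been added, so the response also appears on the right-hand side of the formula. The following is only a sketch of how those steps could be written, using a hypothetical toy data frame (toy.df, with made-up term columns america and jobs) in place of the real document-term matrix.

```r
library(rpart)

# Hypothetical stand-in for full.df: two term-count columns plus the label.
set.seed(1)
toy.df <- data.frame(
  america     = rpois(44, 3),
  jobs        = rpois(44, 5),
  SpeakerName = rep(c("obama", "romney"), each = 22)
)

# method = "class" expects a categorical response, so store it as a factor.
toy.df$SpeakerName <- factor(toy.df$SpeakerName)

measurevar <- "SpeakerName"
# Predictors are the column names, excluding the response itself.
predictors <- setdiff(colnames(toy.df), measurevar)
toy.formula <- reformulate(predictors, response = measurevar)

# Create the training and test subsets explicitly from the sampled indices.
train.idx <- sample(nrow(toy.df), ceiling(nrow(toy.df) * 0.6))
toy.df.train <- toy.df[train.idx, ]
toy.df.test  <- toy.df[-train.idx, ]

dt <- rpart(toy.formula, method = "class", data = toy.df.train)
```

With the real data, the same pattern would use full.df in place of toy.df. Term names that are not valid R names (for example, terms that start with a digit) may need cleaning first, for instance with make.names().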
I imagine this is frustrating. Can you create a reproducible example? – rawr
Check the result of as.matrix(full.df.allAttr.train) –
Thanks for your input. The error went away once I created the formula correctly. {measurevar <- "SpeakerName" formula_str <- paste(measurevar, paste(rowNames, collapse=" + "), sep=" ~ ") formula <- as.formula(formula_str)} – user2478236