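I have just read in a data set and deleted the rows containing " ?" (or NA, as you might call it), but the " ?" factor level still appears to be there after the elements are removed. It still shows up when you type
levels(Sample$occupation)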
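 [1] " ?"                 " Adm-clerical"      " Armed-Forces"      " Craft-repair"
 [5] " Exec-managerial"   " Farming-fishing"   " Handlers-cleaners" " Machine-op-inspct"
 [9] " Other-service"     " Priv-house-serv"   " Prof-specialty"    " Protective-serv"
[13] " Sales"             " Tech-support"      " Transport-moving"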
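The same happens when you use the str function. However, when I use nrow or subset(Sample, occupation==" ?"), the rows appear to have been removed. Do you have an explanation for this? The full data set can be found at http://archive.ics.uci.edu/ml/datasets/Adult. I have a different version, but I think it is the same. :)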
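#Loading data set
mappesti <- paste0(file_content,"\\2. cand.merc.(mat)\\6. Data Science\\Reidar\\")
#stringsAsFactors is set explicitly because R >= 4.0.0 no longer converts strings to factors by default
data <- read.table(paste0(mappesti,"adult.txt"),header=FALSE,sep=",",stringsAsFactors=TRUE)
#Naming columns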
colnames(data) <- c("age",
"workclass",
"fnlwgt",
"education",
"education.num",
"marital.status",
"occupation",
"relationship",
"race",
"sex",
"capital.gain",
"capital.loss",
"hours.per.week",
"native.country",
"class")
#Counting the " ?" entries in each column
sum(data$occupation==" ?")
sum(data$native.country==" ?")
sum(data$workclass==" ?")
#Deleting rows with " ?"
Sample <- data
str(Sample)
subset(Sample, occupation==" ?")
Sample <- subset(Sample, occupation!=" ?")
Sample <- subset(Sample, native.country!=" ?")
Sample <- subset(Sample, workclass!=" ?")
subset(Sample, occupation==" ?")
nrow(Sample)
levels(Sample$occupation)
Have a look at the 'droplevels' function. – A5C1D2H2I1M1N2O1R2T1
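A minimal sketch of what the comment suggests, using a small made-up data frame rather than the real Adult file (the names toy, kept and cleaned and the values are assumptions for illustration only). A factor stores its set of levels separately from its values, so subset() removes the rows while the " ?" level stays behind with a count of zero. droplevels() discards levels that no longer occur.

```r
# Toy stand-in for the Adult data; the values are invented for illustration.
toy <- data.frame(
  age = c(39, 50, 38, 53),
  occupation = factor(c(" Sales", " ?", " Tech-support", " ?"))
)

# Filtering removes the rows but keeps the factor's full level set.
kept <- subset(toy, occupation != " ?")
nrow(kept)                  # the " ?" rows are gone
levels(kept$occupation)     # " ?" is still listed as a level
table(kept$occupation)      # " ?" appears with a count of 0

# droplevels() removes unused levels from every factor column of a data frame.
cleaned <- droplevels(kept)
levels(cleaned$occupation)  # " ?" is no longer a level
```

In the question's code this would presumably amount to running Sample <- droplevels(Sample) after the three subset() calls. droplevels() can also be applied to a single factor, for example droplevels(Sample$occupation).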