How do I test a logistic regression model in R?

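I am developing a CTR prediction model for a Kaggle competition (link). I read in the first 100,000 rows of the training data set and then split them 80/20 into train/test sets: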

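library(caret)  # provides createDataPartition()

# Read the first 100,000 rows, then hold out 20% of them for testing
ad_data <- read.csv("train", header = TRUE, stringsAsFactors = FALSE, nrows = 100000)
trainIndex <- createDataPartition(ad_data$click, p = 0.8, list = FALSE, times = 1)
ad_train <- ad_data[trainIndex, ]
ad_test <- ad_data[-trainIndex, ]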

I then fit a GLM model on the ad_train data:

ad_glm_model <- glm(ad_train$clicks ~ ad_train$C1 + ad_train$site_category + ad_train$device_type, family = binomial(link = "logit"), data = ad_train)

But whenever I use the predict function to see how well the model does on the ad_test set, I get the following warning:

test_model <- predict(ad_glm_model, newdata = ad_test, type = "response") 
Warning message: 
'newdata' had 20000 rows but variables found have 80000 rows

What is going on? How can I test my GLM model on new data?

Edit: It works perfectly. I just needed to make this call instead:

ad_glm_model <- glm(clicks ~ C1 + site_category + device_type, family = binomial(link = "logit"), data = ad_train)

2017-02-20 stochasticats

Don't use 'ad_train$' in the glm call; just use 'data =' instead – user20650

This happens because you prefixed every variable in the model formula with the name of the data frame. Instead, your formula should be:

glm(clicks ~ C1 + site_category + device_type, family = binomial(link = "logit"), data = ad_train)
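With the formula written this way, a hold-out check might look roughly like the sketch below. It is only an illustration under stated assumptions: the Kaggle file is not available here, so the built-in mtcars data stands in for it, with `am` playing the role of `clicks` and `wt` as the single predictor. The deterministic every-fifth-row split, the 0.5 cutoff, and all `toy_*` names are hypothetical choices, not part of the original question.

```r
# Stand-in data: mtcars ships with R; `am` (0/1) plays the role of `clicks`.
# Hold out every fifth row as a deterministic substitute for a random 80/20 split.
test_idx  <- seq(5, nrow(mtcars), by = 5)
toy_train <- mtcars[-test_idx, ]
toy_test  <- mtcars[test_idx, ]

# Bare column names in the formula; the data frame is supplied via `data =`.
toy_model <- glm(am ~ wt, family = binomial(link = "logit"), data = toy_train)

# Predicted probabilities for the held-out rows only.
toy_prob <- predict(toy_model, newdata = toy_test, type = "response")

# Accuracy at an (assumed) 0.5 cutoff.
toy_pred <- as.integer(toy_prob > 0.5)
accuracy <- mean(toy_pred == toy_test$am)

# Log loss, with probabilities clipped away from 0 and 1 to keep log() finite.
eps      <- 1e-15
p        <- pmin(pmax(toy_prob, eps), 1 - eps)
log_loss <- -mean(toy_test$am * log(p) + (1 - toy_test$am) * log(1 - p))

# Confusion table of predicted versus actual classes.
table(predicted = toy_pred, actual = toy_test$am)
```

Log loss is included because click-prediction tasks are often scored on predicted probabilities rather than hard classes; whether it matches the competition's metric is an assumption to check against the competition rules.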

As described in the second link in the duplicate notice:

This is a problem of using different names between your data and your newdata and not a problem between using vectors or dataframes.

When you fit a model with the lm function and then use predict to make predictions, predict tries to find the same names on your newdata. In your first case name x conflicts with mtcars$wt and hence you get the warning.
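The name mismatch described in the quote can be reproduced on a small scale. The sketch below is an illustration rather than the original poster's code: it assumes the built-in mtcars data as a stand-in, and `demo_train`, `demo_test`, `bad_fit`, and `good_fit` are hypothetical names.

```r
# Stand-in data: first 24 rows for fitting, last 8 rows as "new" data.
demo_train <- mtcars[1:24, ]
demo_test  <- mtcars[25:32, ]

# Terms written with `demo_train$`: the model stores a variable literally
# named `demo_train$wt`. predict() cannot find that name in `demo_test`,
# so it evaluates the expression against the training data frame instead.
bad_fit  <- glm(demo_train$am ~ demo_train$wt,
                family = binomial(link = "logit"), data = demo_train)
bad_pred <- predict(bad_fit, newdata = demo_test, type = "response")
length(bad_pred)   # 24: one value per training row, plus the row-count warning

# Bare column names: predict() looks up `wt` inside `newdata` as intended.
good_fit  <- glm(am ~ wt, family = binomial(link = "logit"), data = demo_train)
good_pred <- predict(good_fit, newdata = demo_test, type = "response")
length(good_pred)  # 8: one value per row of demo_test
```

The first predict call returns predictions for the training rows rather than failing, which is why the problem surfaces only as a warning.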

2017-02-20 23:53:48 eipi10
