Cluster computation tutorial - problem with spread


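While following this very interesting tutorial (https://rpubs.com/hrbrmstr/customer-segmentation-r), I ran into an error that I really don't understand.

Below is the code snippet that produces the message "Error: Value column 'n' does not exist in input." in RStudio 1.0.136: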

library(readxl) 
library(dplyr) 
library(tidyr) 
library(viridis) 
library(ggplot2) 
library(ggfortify) 

url <- "http://blog.yhathq.com/static/misc/data/WineKMC.xlsx" 
fil <- basename(url) 
if (!file.exists(fil)) download.file(url, fil) 

offers <- read_excel(fil, sheet = 1) 
colnames(offers) <- c("offer_id", "campaign", "varietal", "min_qty", "discount", "origin", "past_peak") 
head(offers, 12) 

transactions <- read_excel(fil, sheet = 2) 
colnames(transactions) <- c("customer_name", "offer_id") 
transactions$n <- 1 
head(transactions) 

left_join(offers, transactions, by="offer_id") %>% 
    count(customer_name, offer_id, wt=n) %>% 
    spread(offer_id, n) %>% 
    mutate_each(funs(ifelse(is.na(.), 0, .))) -> dat

The last statement is the one that raises the error.

Does anyone know why?

Asked 2017-04-17 by Romain

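In general, you should post a reproducible example here rather than rely on a link that could easily break within a few years. Some guidance: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 Also, of course, you should state in the question itself which tools you are using. "spread" is not a function in R itself. – Frank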

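Of course, my bad, I have edited the original post to include a reproducible example. – Romain

OK, thanks. If it needs data from some blog, it is still not reproducible in the long term. Also, if you have to load all those packages when probably only dplyr is needed, it is not very minimal. The ideal is a [mcve]. In any case, you can start debugging by checking whether the 'count' step produces a column named 'n'. – Frank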

Please have a look at the manual page ?dplyr::count:

Note

The column name in the returned data is usually n, even if you have supplied a weight.

If the data already has a column named n, the output column will be called nn. If the table already has columns called n and nn then the column returned will be nnn, and so on.

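In this case, the original data already has a column called n, so the new column produced by count will be called nn. You therefore have to change spread(offer_id, n) %>% to spread(offer_id, nn) %>%. The tutorial was probably written before this behaviour was introduced.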
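Because the name that count() picks can differ between dplyr versions, a more defensive variant is to name the total yourself. The sketch below is only an illustration under stated assumptions: the transactions data frame is a small made-up stand-in for the spreadsheet (not the WineKMC data), the column name total is hypothetical, and fill = 0 in spread() is swapped in for the mutate_each() step from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for sheet 2 of the workbook
transactions <- data.frame(
  customer_name = c("Smith", "Smith", "Jones", "Jones", "Lee"),
  offer_id      = c(1, 2, 1, 1, 3),
  stringsAsFactors = FALSE
)
transactions$n <- 1

# Debugging step: check which name count() actually gave its output column.
# Depending on the installed dplyr version this may be "n" or "nn".
counted <- count(transactions, customer_name, offer_id, wt = n)
print(names(counted))

# Defensive variant: sum the weights under an explicit name ("total"),
# so the later spread() call never depends on count()'s automatic naming.
dat <- transactions %>%
  group_by(customer_name, offer_id) %>%
  summarise(total = sum(n)) %>%
  ungroup() %>%
  spread(offer_id, total, fill = 0)  # fill = 0 replaces missing combinations

print(dat)
```

Each row of dat is one customer and each remaining column is one offer_id, holding the number of transactions for that pair.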

Answered 2017-04-18 01:20:54 by mt1022
