2013-03-25 53 views
2

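arules requires a list of transactions. Each row in the list contains a set of products, and not every transaction has the same number of products. This sounds like it should be simple, but it is not. An example can be found here: preparing a list of transactions for arules.

I thought something like aggregate(dvd , by=list("ID"), FUN=c) would work, but it fails with: arguments must have same length

This is my data: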

> dvd
   ID Item
1   1 Sixth Sense
2   1 LOTR1
3   1 Harry Potter1
4   1 Green Mile
5   1 LOTR2
6   2 Gladiator
7   2 Patriot
8   2 Braveheart
9   3 LOTR1
10  3 LOTR2
11  4 Gladiator
12  4 Patriot
13  4 Sixth Sense
14  5 Gladiator
15  5 Patriot
16  5 Sixth Sense
17  6 Gladiator
18  6 Patriot
19  6 Sixth Sense
20  7 Harry Potter1
21  7 Harry Potter2
22  8 Gladiator
23  8 Patriot
24  9 Gladiator
25  9 Patriot
26  9 Sixth Sense
27 10 Sixth Sense
28 10 LOTR
29 10 Galdiator
30 10 Green Mile

I need something that looks like this:

TR1  c("Sixth Sense","LOTR1","Harry Potter1","Green Mile","LOTR2") 
TR2  c("Gladiator","Patriot","Braveheart") 
TR3  c("LOTR1","LOTR2") 
.... 

Answers

1

I think split will do the job for you.

DF <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 
10L, 10L, 10L, 10L), Item = c(" Sixth Sense", "   LOTR1", 
" Harry Potter1", " Green Mile", "   LOTR2", "  Gladiator", 
"  Patriot", " Braveheart", "   LOTR1", "   LOTR2", 
"  Gladiator", "  Patriot", " Sixth Sense", "  Gladiator", 
"  Patriot", " Sixth Sense", "  Gladiator", "  Patriot", 
" Sixth Sense", " Harry Potter1", " Harry Potter2", "  Gladiator", 
"  Patriot", "  Gladiator", "  Patriot", " Sixth Sense", 
" Sixth Sense", "   LOTR", "  Galdiator", " Green Mile" 
)), .Names = c("ID", "Item"), class = "data.frame", row.names = c(NA, 
-30L)) 

# The Item strings above carry leading whitespace; strip it before splitting
DF$Item <- trimws(DF$Item)
result <- split(DF$Item, DF$ID)
# Prefix each transaction ID with "TR"
names(result) <- paste0("TR", names(result))
result 
## $TR1 
## [1] "Sixth Sense" "LOTR1"   "Harry Potter1" "Green Mile" "LOTR2"   
## 
## $TR2 
## [1] "Gladiator" "Patriot" "Braveheart" 
## 
## $TR3 
## [1] "LOTR1" "LOTR2" 
## 
## $TR4 
## [1] "Gladiator" "Patriot"  "Sixth Sense" 
## 
## $TR5 
## [1] "Gladiator" "Patriot"  "Sixth Sense" 
## 
## $TR6 
## [1] "Gladiator" "Patriot"  "Sixth Sense" 
## 
## $TR7 
## [1] "Harry Potter1" "Harry Potter2" 
## 
## $TR8 
## [1] "Gladiator" "Patriot" 
## 
## $TR9 
## [1] "Gladiator" "Patriot"  "Sixth Sense" 
## 
## $TR10 
## [1] "Sixth Sense" "LOTR"  "Galdiator" "Green Mile" 
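As a rough sketch of where this leads: the list returned by split can hold transactions of different sizes, and arules can usually coerce such a named list into a transactions object. The small dvd data frame below is only a subset of the question's data, and the arules step assumes the package is installed (it is skipped otherwise).

```r
# Subset of the question's data (first three transactions only)
dvd <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Item = c("Sixth Sense", "LOTR1", "Harry Potter1", "Green Mile", "LOTR2",
           "Gladiator", "Patriot", "Braveheart", "LOTR1", "LOTR2"),
  stringsAsFactors = FALSE
)

# One character vector per transaction; the lengths are allowed to differ
result <- split(dvd$Item, dvd$ID)
names(result) <- paste0("TR", names(result))

# Number of items in each transaction
sizes <- vapply(result, length, integer(1))
print(sizes)

# Assumption: arules is installed; this step is skipped if it is not
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- methods::as(result, "transactions")
  inspect(trans)
}
```

The list names (TR1, TR2, ...) are carried over as transaction IDs, so no extra bookkeeping should be needed.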
+0

Wow, that was simple! Thanks. – haki 2013-03-25 08:33:07

2

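Your aggregate command can work, but you did not specify the arguments correctly: by=list("ID") passes the one-element string "ID" rather than the ID column, so its length does not match the number of rows. You need something like this: with(DF, aggregate(Item, list(ID), FUN = function(x) c(as.character(x))))

Alternatively, you can use the formula method of aggregate: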
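To make the difference concrete, here is a minimal sketch that reproduces the error from the question and then runs a corrected call. The dvd data frame is a subset of the question's data, and the variable names err and agg are only illustrative.

```r
# Subset of the question's data (first three transactions only)
dvd <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Item = c("Sixth Sense", "LOTR1", "Harry Potter1", "Green Mile", "LOTR2",
           "Gladiator", "Patriot", "Braveheart", "LOTR1", "LOTR2"),
  stringsAsFactors = FALSE
)

# list("ID") is a list holding one string, not the ID column,
# so its length (1) does not match the number of rows (10)
err <- tryCatch(
  aggregate(dvd, by = list("ID"), FUN = c),
  error = function(e) conditionMessage(e)
)
print(err)

# Passing the actual column as the grouping variable works;
# as.character() guards against Item having been turned into a factor
agg <- aggregate(dvd$Item, by = list(ID = dvd$ID),
                 FUN = function(x) as.character(x))
str(agg)
```

Because the groups have different sizes, aggregate cannot simplify the result and the x column ends up as a list with one character vector per ID.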

aggregate(Item ~ ID, DF, c) 
# ID             Item 
# 1 1 Sixth Sense, LOTR1, Harry Potter1, Green Mile, LOTR2 
# 2 10    Sixth Sense, LOTR, Galdiator, Green Mile 
# 3 2      Gladiator, Patriot, Braveheart 
# 4 3           LOTR1, LOTR2 
# 5 4      Gladiator, Patriot, Sixth Sense 
# 6 5      Gladiator, Patriot, Sixth Sense 
# 7 6      Gladiator, Patriot, Sixth Sense 
# 8 7       Harry Potter1, Harry Potter2 
# 9 8         Gladiator, Patriot 
# 10 9      Gladiator, Patriot, Sixth Sense 
str(.Last.value) 
# 'data.frame': 10 obs. of 2 variables: 
# $ ID : chr "1" "10" "2" "3" ... 
# $ Item:List of 10 
# ..$ 1 : chr "Sixth Sense" "LOTR1" "Harry Potter1" "Green Mile" ... 
# ..$ 6 : chr "Sixth Sense" "LOTR" "Galdiator" "Green Mile" 
# ..$ 10: chr "Gladiator" "Patriot" "Braveheart" 
# ..$ 13: chr "LOTR1" "LOTR2" 
# ..$ 15: chr "Gladiator" "Patriot" "Sixth Sense" 
# ..$ 18: chr "Gladiator" "Patriot" "Sixth Sense" 
# ..$ 21: chr "Gladiator" "Patriot" "Sixth Sense" 
# ..$ 24: chr "Harry Potter1" "Harry Potter2" 
# ..$ 26: chr "Gladiator" "Patriot" 
# ..$ 28: chr "Gladiator" "Patriot" "Sixth Sense" 

Alternatively, you can use the "data.table" package:

library(data.table) 
as.data.table(DF)[, list(list(Item)), by = ID] 
#  ID            V1 
# 1: 1 Sixth Sense,LOTR1,Harry Potter1,Green Mile,LOTR2 
# 2: 2      Gladiator,Patriot,Braveheart 
# 3: 3          LOTR1,LOTR2 
# 4: 4     Gladiator,Patriot,Sixth Sense 
# 5: 5     Gladiator,Patriot,Sixth Sense 
# 6: 6     Gladiator,Patriot,Sixth Sense 
# 7: 7      Harry Potter1,Harry Potter2 
# 8: 8        Gladiator,Patriot 
# 9: 9     Gladiator,Patriot,Sixth Sense 
# 10: 10   Sixth Sense,LOTR,Galdiator,Green Mile 
2

arules' read.transactions has a format argument that solves your problem. Here is the usage:

read.transactions(file, format = c("basket", "single"), sep = NULL, 
        cols = NULL, rm.duplicates = FALSE, encoding = "unknown") 

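See the format argument? You can use "basket" or "single" to indicate the format of the input data. You are trying to convert your data into "basket" format, but the data you have is already in "single" format: each line consists of a single item together with its transaction ID. Just call read.transactions with format set to "single" (and cols identifying the transaction-ID and item columns) and you are golden.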
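A minimal sketch of that approach, assuming the data is first written to a comma-separated file without a header row. The file layout, the temporary path, and the cols = c(1, 2) column positions are assumptions made for illustration, and the arules call only runs if the package is installed.

```r
# Subset of the question's data (first three transactions only)
dvd <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Item = c("Sixth Sense", "LOTR1", "Harry Potter1", "Green Mile", "LOTR2",
           "Gladiator", "Patriot", "Braveheart", "LOTR1", "LOTR2"),
  stringsAsFactors = FALSE
)

# Write one "ID,Item" pair per line, with no header and no quoting
tmp <- tempfile(fileext = ".csv")
write.table(dvd, tmp, sep = ",", row.names = FALSE,
            col.names = FALSE, quote = FALSE)

# Assumption: arules is installed; this step is skipped if it is not
if (requireNamespace("arules", quietly = TRUE)) {
  # Column 1 holds the transaction ID, column 2 holds the item
  trans <- arules::read.transactions(tmp, format = "single",
                                     sep = ",", cols = c(1, 2))
  arules::inspect(trans)
}
```

This skips the intermediate list entirely, which may be convenient when the data already lives in a file.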