2013-03-08

Grouping followed by finding frequent patterns in R

I have data in the following form:

CIN TRN_TYP 
9079954 1 
9079954 2 
9079954 3 
9079954 4 
9079954 5 
9079954 4 
9079954 5 
9079954 6 
9079954 7 
9079954 8 
9079954 9 
9079954 9 
.   . 
.   . 
.   . 

There are 100 distinct CINs (9079954, 12441087, 15246633, ...), each with its own TRN_TYP values.

First, I want to group this data into basket format:

9079954 1, 2, 3, 4, 5, .... 
12441087 19, 14, 21, 3, 7, ... 
. 
. 
. 

Then I want to apply eclat from the arules package to find frequent patterns.

Please help.


Yes, eclat from the arules package. – 2013-03-08 11:43:20

Answer


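It is not clear what you want as output. There are many options for aggregating your results, either with base functions or with external packages such as plyr, data.table, etc.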

Here is an option using the by function:

by(tab,tab$CIN,FUN=function(x) unlist(x$TRN_TYP)) 
tab$CIN: 9079954 
[1] 1 2 3 4 5 4 5 6 7 8 9 
----------------------------------------- 
tab$CIN: 9079955 
[1] 11 12 13 14 15 16 17 18 19 
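If the goal is literally the basket layout from the question (one line per CIN followed by its TRN_TYP values separated by commas), a minimal base-R sketch is to split by CIN and paste each group together. The small `tab` built here is hypothetical sample data standing in for the real table.

```r
# Hypothetical sample data in the question's long format (CIN, TRN_TYP)
tab <- data.frame(
  CIN     = c(9079954, 9079954, 9079954, 12441087, 12441087, 12441087),
  TRN_TYP = c(1, 2, 3, 19, 14, 21)
)

# split() groups the TRN_TYP values by CIN, keeping their row order within each group
baskets <- split(tab$TRN_TYP, tab$CIN)

# Collapse each group into one "CIN v1, v2, ..." line (basket format)
basket_lines <- paste(names(baskets),
                      vapply(baskets, paste, character(1), collapse = ", "))
writeLines(basket_lines)
```

The same `split()` list is also what the edit below passes to eclat, so it serves both purposes.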

Edit

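To apply eclat, you first need to remove duplicate items: a transaction is a set, so a TRN_TYP that repeats within one CIN (like 4, 5 and 9 for 9079954) must appear only once.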

tab <- tab[!duplicated(tab), ] 
eclat(split(tab$TRN_TYP, tab$CIN)) ## here I am using @Arun's split() solution because 
            ## it seems that eclat can't coerce the output of by() 

parameter specification: 
tidLists support minlen maxlen   target ext 
    FALSE  0.1  1  10 frequent itemsets FALSE 

algorithmic control: 
sparse sort verbose 
     7 -2 TRUE 

Warning in eclat(split(tab$TRN_TYP, tab$CIN)) : 
    You chose a very low absolute support count of 0. You might run out of memory! Increase minimum support. 

eclat - find frequent item sets with the eclat algorithm 
version 2.6 (2004.08.16)   (c) 2002-2004 Christian Borgelt 
create itemset ... 
set transactions ...[18 item(s), 2 transaction(s)] done [0.00s]. 
sorting and recoding items ... [18 item(s)] done [0.00s]. 
creating bit matrix ... [18 row(s), 2 column(s)] done [0.00s]. 
writing ... [1022 set(s)] done [0.00s]. 
Creating S4 object ... done [0.00s]. 
set of 1022 itemsets
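The warning above comes from the default `supp = 0.1`: with only 2 transactions it corresponds to an absolute support count of 0, so every subset of every basket is reported. A sketch, assuming the arules package is installed and using hypothetical sample data shaped like the question's, that drops repeated items per CIN, coerces the list to `transactions` explicitly, and passes a support threshold through `parameter`:

```r
library(arules)

# Hypothetical sample data: three customers (CIN) with some repeated TRN_TYP codes
tab <- data.frame(
  CIN     = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  TRN_TYP = c(4, 5, 4, 9, 4, 5, 7, 4, 5, 9)
)

# One transaction per CIN; unique() drops repeated items, which a transaction cannot hold
baskets <- lapply(split(tab$TRN_TYP, tab$CIN),
                  function(x) as.character(unique(x)))

# Explicit coercion to the arules transactions class
trans <- as(baskets, "transactions")

# Keep itemsets that occur in at least half of the transactions (2 of 3 here)
itemsets <- eclat(trans, parameter = list(supp = 0.5, minlen = 1))
inspect(sort(itemsets, by = "support"))
```

With this data the baskets are {4, 5, 9}, {4, 5, 7} and {4, 5, 9}, so only item 7 (support 1/3) falls below the threshold. On the real data with 100 CINs, `supp = 0.1` would mean at least 10 CINs.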