2017-06-18 84 views

Converting an R data.frame column to arules transactions

I need to convert the following data.frame column (T$Tags) to arules transactions:

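  1. scala
  2. ios, button, swift3, compiler-errors, null
  3. c#, pass-by-reference, unsafe-pointers
  4. spring, maven, spring-mvc, spring-security, spring-java-config
  5. android, android-fragments, android-fragmentmanager
  6. scala, scala-collections
  7. python-2.7, python-3.x, matplotlib, plot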

Since this data is already in basket format, I followed example 3 in the arules documentation (https://cran.r-project.org/web/packages/arules/arules.pdf, page 90) and converted the column as follows:

###################################################################################################### 
#Option 1 - converting data.frame as described in the documentation (page 90) 
###################################################################################################### 
library(arules)
## example 3: creating transactions from data.frame 
a_df <- data.frame(
    Tags = as.factor(c("scala", 
         "ios, button, swift3, compiler-errors, null", 
         "c#, pass-by-reference, unsafe-pointers", 
         "spring, maven, spring-mvc, spring-security, spring-java-config", 
         "android, android-fragments, android-fragmentmanager", 
         "scala, scala-collections", 
         "python-2.7, python-3.x, matplotlib, plot")) 
) 
## coerce 
trans3 <- as(a_df, "transactions") 
rules <- apriori(trans3, parameter = list(sup = 0.1, conf = 0.5, target="rules",minlen=1)) 
rules_output <- as(rules,"data.frame") 
## Result: 0 rules 
###################################################################################################### 
# Option 2 - reading from a CSV file, which contains exactly the same data 
# above without the header and the quotes 
###################################################################################################### 
file = "Test.csv" 
trans3 = read.transactions(file = file, sep = ",", format = c("basket")) 
rules <- apriori(trans3, parameter = list(sup = 0.1, conf = 0.5, target="rules",minlen=1)) 
rules_output <- as(rules,"data.frame") 
## Result: 198 rules 

Option 1 - result = 0 rules

Option 2 - result = 198 rules


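Question:

In my current task and environment I cannot afford to save the data.frame column to a file (CSV or any other format) and then re-read it with read.transactions (that is, turn option 1 into option 2). How can I convert the data.frame column into the right format so that the arules apriori algorithm works on it correctly?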

Answer


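Have a look at the examples in ?transactions. You need a list containing vectors of items (item labels), not a data.frame. When a data.frame is coerced, each distinct value of a factor column becomes one item, so in option 1 every row ends up with a single item made of the whole comma-separated string, and apriori finds no rules. Splitting each string into its tags first gives the same transactions as read.transactions: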

items <- strsplit(as.character(a_df$Tags), ", ") 
trans3 <- as(items, "transactions") 

rules <- apriori(trans3, parameter = list(sup = 0.1, conf = 0.5, target="rules",minlen=1)) 
Apriori 

Parameter specification: 
confidence minval smax arem aval originalSupport maxtime support minlen maxlen 
     0.5 0.1 1 none FALSE   TRUE  5  0.1  1  10 
target ext 
    rules FALSE 

Algorithmic control: 
filter tree heap memopt load sort verbose 
    0.1 TRUE TRUE FALSE TRUE 2 TRUE 

Absolute minimum support count: 0 

set item appearances ...[0 item(s)] done [0.00s]. 
set transactions ...[22 item(s), 7 transaction(s)] done [0.00s]. 
sorting and recoding items ... [22 item(s)] done [0.00s]. 
creating transaction tree ... done [0.00s]. 
checking subsets of size 1 2 3 4 5 done [0.00s]. 
writing ... [198 rule(s)] done [0.00s]. 
creating S4 object ... done [0.00s]. 
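
To see what the split does before involving arules at all, here is a minimal base-R sketch. It assumes the tags sit in a plain character vector (called `tags` here for illustration) and adds whitespace trimming and per-row de-duplication, which are precautions not present in the answer above.

# Same tag strings as in the question, as a plain character vector.
tags <- c("scala",
          "ios, button, swift3, compiler-errors, null",
          "c#, pass-by-reference, unsafe-pointers",
          "spring, maven, spring-mvc, spring-security, spring-java-config",
          "android, android-fragments, android-fragmentmanager",
          "scala, scala-collections",
          "python-2.7, python-3.x, matplotlib, plot")

# As a factor, each whole string is one level, i.e. one "item" per row.
n_levels <- nlevels(as.factor(tags))

# Split on commas, trim surrounding spaces, and drop repeated tags within a row
# (assumption: duplicates inside one row are not wanted).
items <- lapply(strsplit(tags, ",", fixed = TRUE),
                function(x) unique(trimws(x)))

n_transactions <- length(items)                 # one list element per row
n_items        <- length(unique(unlist(items))) # distinct tags overall

# With arules loaded, the list can then be coerced as in the answer:
# trans3 <- as(items, "transactions")

The two counts should match the "[22 item(s), 7 transaction(s)]" line in the apriori output above.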

Thank you very much, Michael, this is exactly what I needed. – UncleDo