With reference to "Collaborative filtering in MySQL?", I created the following SQL for filtering:
CREATE TABLE `ub` (
`user_id` int(11) NOT NULL,
`book_id` varchar(10) NOT NULL,
`rate` int(11) NOT NULL,
PRIMARY KEY (`user_id`,`book_id`),
UNIQUE KEY `book_id` (`book_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
insert into ub values (1, 'A', '8'), (1, 'B', '7'), (1, 'C', '10');
insert into ub values (2, 'A', '8'), (2, 'B', '7'), (2, 'C', '10'), (2,'D', '8'), (2,'X', '7');
insert into ub values (3, 'X', '10'), (3, 'Y', '8'), (3, 'C', '10'), (3,'Z', '10');
insert into ub values (4, 'W', '8'), (4, 'Q', '8'), (4, 'C', '10'), (4,'Z', '8');
I was then able to produce the table below and understand how it works.
create temporary table ub_rank as
select similar.user_id, count(*) as `rank` -- backticks: RANK is a reserved word in MySQL 8.0.2 and later
from ub target
join ub similar on target.book_id= similar.book_id and target.user_id != similar.user_id and target.rate= similar.rate
where target.user_id = 1
group by similar.user_id;
select * from ub_rank;
+---------+------+
| user_id | rank |
+---------+------+
|       2 |    3 |
|       3 |    1 |
|       4 |    1 |
+---------+------+
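To make the counts above easier to check, here is a small sketch that lists which books produced each match. It assumes the `ub` table and sample rows defined earlier; the alias `matched_books` is only an illustrative name, and the query is the same join as before with a `GROUP_CONCAT` added.

```sql
-- Same self-join as the ub_rank query, but also listing the matching books.
-- Assumes the `ub` table and sample data from above.
SELECT similar.user_id,
       GROUP_CONCAT(similar.book_id ORDER BY similar.book_id) AS matched_books,
       COUNT(*) AS `rank`            -- number of books rated identically to user 1
FROM ub AS target
JOIN ub AS similar
  ON  target.book_id = similar.book_id
  AND target.user_id <> similar.user_id
  AND target.rate    = similar.rate
WHERE target.user_id = 1
GROUP BY similar.user_id
ORDER BY `rank` DESC, similar.user_id;
```

With the sample data this should show user 2 matching on A, B and C, and users 3 and 4 matching only on C, which is where the ranks 3, 1 and 1 come from.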
However, I start to get confused after the code below.
select similar.rate, similar.book_id, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id
left join ub target on target.user_id = 1 and target.book_id = similar.book_id and target.Rate= similar.Rate
where target.book_id is null
group by similar.book_id
order by total_rank desc, rate desc;
+---------+------------+
| book_id | total_rank |
+---------+------------+
| X       |          4 |
| D       |          3 |
| Z       |          2 |
| Y       |          1 |
| Q       |          1 |
| W       |          1 |
+---------+------------+
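One way to see where each `total_rank` comes from is to run the same joins without the `GROUP BY`. This is only a sketch: it assumes the temporary table `ub_rank` from above still exists in the session, and the aliases `rated_by` and `contributes` are made up for illustration.

```sql
-- Un-aggregated version of the recommendation query: one row per
-- (candidate book, similar user) pair, with the rank that user contributes.
-- Assumes `ub` and the temporary table `ub_rank` created above.
SELECT similar.book_id,
       similar.user_id AS rated_by,
       similar.rate,
       ub_rank.`rank`  AS contributes
FROM ub_rank
JOIN ub AS similar
  ON ub_rank.user_id = similar.user_id
LEFT JOIN ub AS target
  ON  target.user_id = 1
  AND target.book_id = similar.book_id
  AND target.rate    = similar.rate
WHERE target.book_id IS NULL          -- keep only books user 1 has not matched
ORDER BY similar.book_id, similar.user_id;
```

On the sample data, X appears twice (3 from user 2, who rated it 7, and 1 from user 3, who rated it 10), while D appears once (3 from user 2). Summing per book gives X = 4 and D = 3, so the total counts how many similar users picked a book, weighted by similarity, and does not look at the rating at all.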
(1, 'A', '8'), (1, 'B', '7'), (1, 'C', '10');
(2, 'A', '8'), (2, 'B', '7'), (2, 'C', '10'), (2,'D', '8'), (2,'X', '7');
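What I want to do is this: assuming users 1 and 2 behave similarly (both previously chose A, B and C with matching ratings), I would recommend D to user 1, because it has the higher rating (8, versus 7 for X).
The code above does not seem to do this, because the top-ranked book is X. How can I change the code to achieve the goal above?
Or is the existing approach actually better or more accurate for recommendations?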
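A possible variation, offered as a sketch and not as the method from the referenced question: order candidates first by the rank of the most similar user who rated them, and only then by a similarity-weighted average rating. The aliases `best_rank` and `weighted_rate` are illustrative, the query assumes `ub` and `ub_rank` from above, and it also assumes that any book user 1 has already rated should be excluded regardless of the rating.

```sql
-- Prefer books rated by the most similar user, then break ties by rating.
-- Assumes `ub` and the temporary table `ub_rank` created above.
SELECT similar.book_id,
       MAX(ub_rank.`rank`) AS best_rank,
       SUM(ub_rank.`rank` * similar.rate) / SUM(ub_rank.`rank`) AS weighted_rate
FROM ub_rank
JOIN ub AS similar
  ON ub_rank.user_id = similar.user_id
LEFT JOIN ub AS target
  ON  target.user_id = 1
  AND target.book_id = similar.book_id   -- assumption: exclude anything user 1 rated
WHERE target.book_id IS NULL
GROUP BY similar.book_id
ORDER BY best_rank DESC, weighted_rate DESC, similar.book_id;
```

On the sample data D comes first (best_rank 3, weighted_rate 8), followed by X (best_rank 3, weighted_rate 7.75), then Z, Q, W and Y. Whether this is a better recommendation than the summed rank depends on whether agreement among many users or the opinion of the closest user should count more.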
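Note that in your last query you have omitted the `rate` column from the results, and its values are essentially random (because `similar.rate` is not aggregated, grouped, or functionally dependent on the grouped columns). – 2013-03-26 12:15:01
@Mark Bannister I am not very familiar with this; would you mind giving me some more hints? – HUNG 2013-03-26 12:20:33
I don't understand your reply - did you understand my comment? – 2013-03-26 12:22:58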
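To illustrate the point about `rate` in the first comment: a book such as X has several ratings (7 and 10), so selecting `similar.rate` while grouping only by `book_id` leaves MySQL free to return either one. As far as I know, MySQL 5.7.5 and later enable `ONLY_FULL_GROUP_BY` by default and reject such a query. A minimal sketch of one fix is to aggregate the column explicitly; `max_rate` and `avg_rate` are illustrative aliases, and `ub` and `ub_rank` are assumed to exist as above.

```sql
-- The last query from the question, with `rate` aggregated so the result
-- is deterministic. Assumes `ub` and the temporary table `ub_rank` from above.
SELECT similar.book_id,
       SUM(ub_rank.`rank`) AS total_rank,
       MAX(similar.rate)   AS max_rate,
       AVG(similar.rate)   AS avg_rate
FROM ub_rank
JOIN ub AS similar
  ON ub_rank.user_id = similar.user_id
LEFT JOIN ub AS target
  ON  target.user_id = 1
  AND target.book_id = similar.book_id
  AND target.rate    = similar.rate
WHERE target.book_id IS NULL
GROUP BY similar.book_id
ORDER BY total_rank DESC, max_rate DESC, similar.book_id;
```

The ordering by `total_rank` is unchanged (X, D, Z, then the three books with rank 1), but the rating columns are now well defined: for X, `max_rate` is 10 and `avg_rate` is 8.5.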