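SQL - Select the most similar product

OK, I have a relation which stores two keys, a product Id and an attribute Id. I want to figure out which product is most similar to a given product. (The attributes are actually numbers, but that makes the example more confusing, so they have been changed to letters to simplify the visual representation.)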

Prod_att

Product | Attributes 
    1 | A  
    1 | B 
    1 | C 
    2 | A 
    2 | B 
    2 | D 
    3 | A 
    3 | E 
    4 | A

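Initially this seemed fairly simple: select the attributes that a product has, then count, for each product, how many of those attributes it shares. Comparing that count to the number of attributes a product has shows how similar two products are. This works for products with a large number of attributes relative to the products they are compared with, but problems arise when a product has very few attributes. For example, product 3 will tie with almost every other product (because A is very common).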

SELECT Product, COUNT(Attributes)
FROM Prod_att
WHERE Attributes IN
    (SELECT Attributes
     FROM Prod_att
     WHERE Product = 1)
GROUP BY Product;

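Any suggestions on how to fix this, or improvements to my current query?
Thanks!

*Edit: Product 4 will return count() = 1 for all products. I would like to show that Product 3 is more similar, because it has fewer differing attributes.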
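To make the tie concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs the query from the question for a chosen product. The Python wrapper and the function name `shared_counts` are assumptions added for illustration; only the SQL and the sample data come from the question.

```python
import sqlite3

# Sample rows copied from the Prod_att table in the question.
ROWS = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "D"),
    (3, "A"), (3, "E"),
    (4, "A"),
]


def shared_counts(target):
    """Run the question's query for one target product.

    Returns (product, shared attribute count) rows, ordered by product.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Prod_att (Product INTEGER, Attributes TEXT)")
    conn.executemany("INSERT INTO Prod_att VALUES (?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT Product, COUNT(Attributes)
        FROM Prod_att
        WHERE Attributes IN
            (SELECT Attributes FROM Prod_att WHERE Product = ?)
        GROUP BY Product
        ORDER BY Product
        """,
        (target,),
    ).fetchall()
    conn.close()
    return rows


print(shared_counts(4))  # every product shares exactly one attribute (A) with product 4
print(shared_counts(1))
```

For product 4 every row comes back with a count of 1, so the count alone cannot tell product 3 apart from products 1 and 2.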

Source

2013-05-08 Crp

How do you define the minimum set of similar attributes? This could be achieved with a 'HAVING' clause. – 2013-05-08 16:53:50

http://stackoverflow.com/questions/384276/how-to-create-search-engines-like-google – 2013-05-08 16:54:12

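Which [RDBMS](http://en.wikipedia.org/wiki/Relational_database_management_system) are you using? 'RDBMS' stands for *Relational Database Management System*. 'RDBMS is the basis for SQL', and for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, etc... Can you also provide sample records of the result you want? – 2013-05-08 17:06:22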

Try this:

SELECT
    a_product_id,
    COALESCE(b_product_id, 'no_match_found') AS closest_product_match
FROM (
    SELECT
        *,
        -- number the candidates within each a_product_id; 1 = best match
        @row_num := IF(@prev_value = a_product_id, @row_num + 1, 1) AS row_num,
        @prev_value := a_product_id
    FROM
        (SELECT @prev_value := 0, @row_num := 0) r
        JOIN (
            SELECT
                a.product_id AS a_product_id,
                b.product_id AS b_product_id,
                COUNT(DISTINCT b.Attributes) AS shared_attributes,
                COUNT(DISTINCT b2.Attributes) AS total_attributes
            FROM
                products a
                LEFT JOIN products b ON (a.Attributes = b.Attributes AND a.product_id <> b.product_id)
                LEFT JOIN products b2 ON (b2.product_id = b.product_id)
            /* WHERE a.product_id = 3 */
            GROUP BY
                a.product_id,
                b.product_id
            -- most shared attributes first, then fewest total attributes
            ORDER BY
                1, 3 DESC, 4
        ) t
) t2
WHERE
    row_num = 1

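The query above returns the closest match for every product. You can filter on product_id in the innermost query (see the commented-out WHERE) to get the result for one particular product_id. I have used LEFT JOIN so that a product is still displayed even if it has no matches.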
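The ranking idea in this answer (most shared attributes first, ties broken by the candidate with the fewest attributes overall) can also be sketched without MySQL user variables. The following is only an illustration under stated assumptions: it uses SQLite through Python, the question's `Prod_att` table rather than `products`, an inner join (so a product with no shared attributes is omitted, unlike the LEFT JOIN above), and it assumes each (Product, Attributes) pair appears once. The function name `closest_matches` is hypothetical.

```python
import sqlite3

# Sample rows copied from the Prod_att table in the question.
ROWS = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "D"),
    (3, "A"), (3, "E"),
    (4, "A"),
]

RANKED_PAIRS = """
WITH sizes AS (
    -- total number of attributes per product
    SELECT Product, COUNT(*) AS total
    FROM Prod_att
    GROUP BY Product
),
pairs AS (
    -- number of attributes shared by each ordered pair of distinct products
    SELECT a.Product AS product, b.Product AS candidate, COUNT(*) AS shared
    FROM Prod_att a
    JOIN Prod_att b
      ON a.Attributes = b.Attributes AND a.Product <> b.Product
    GROUP BY a.Product, b.Product
)
SELECT p.product, p.candidate, p.shared, s.total
FROM pairs p
JOIN sizes s ON s.Product = p.candidate
ORDER BY p.product, p.shared DESC, s.total ASC, p.candidate
"""


def closest_matches(rows):
    """Return {product: best candidate} using shared DESC, total ASC ranking."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Prod_att (Product INTEGER, Attributes TEXT)")
    conn.executemany("INSERT INTO Prod_att VALUES (?, ?)", rows)
    best = {}
    for product, candidate, _shared, _total in conn.execute(RANKED_PAIRS):
        # Rows arrive best-first per product, so keep only the first one seen.
        best.setdefault(product, candidate)
    conn.close()
    return best


print(closest_matches(ROWS))
```

On the sample data this picks product 3 for product 4, which is the behaviour the question's edit asks for, because product 3 has fewer attributes than products 1 and 2.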

SQLFIDDLE

Hope this helps.

Source

2013-05-08 18:38:25 Akash

Great! Much more sophisticated than just comparing matching attributes. Thanks. – Crp 2013-05-09 05:46:04

Glad to know it helped :) – Akash 2013-05-09 06:18:16

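Try the "Lower bound of Wilson score confidence interval for a Bernoulli parameter". It explicitly deals with the problem of statistical confidence when you have a small n. It looks like a lot of math, but really this is about the minimum amount of math you need to do this sort of thing properly. The website explains it well.

This assumes that you can map positive/negative ratings onto your problem of matching/non-matching attributes.

Here is an example with positive and negative scores and a 95% confidence level: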

SELECT widget_id,
       ((positive + 1.9208) / (positive + negative)
        - 1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604)
          / (positive + negative))
       / (1 + 3.8416 / (positive + negative)) AS ci_lower_bound
FROM widgets
WHERE positive + negative > 0
ORDER BY ci_lower_bound DESC;
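For reference, the same lower bound written out as a small Python function, in the textbook form with z = 1.96 (the constants 1.9208, 0.9604 and 3.8416 in the SQL are z²/2, z²/4 and z²). How to map attributes onto positive/negative is not settled in this thread; the comparison at the end assumes, purely for illustration, that positive = shared attributes and negative = attributes found in only one of the two products.

```python
import math


def wilson_lower_bound(positive, negative, z=1.96):
    """Lower bound of the Wilson score interval for a Bernoulli parameter.

    z = 1.96 corresponds to a 95% confidence level.
    Returns 0.0 when there are no observations at all.
    """
    n = positive + negative
    if n == 0:
        return 0.0
    p_hat = positive / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


# Assumed mapping (not from the thread): shared attributes count as positive,
# attributes present in only one of the two products count as negative.
score_4_vs_3 = wilson_lower_bound(1, 1)  # product 4 {A} vs product 3 {A, E}
score_4_vs_1 = wilson_lower_bound(1, 2)  # product 4 {A} vs product 1 {A, B, C}
print(score_4_vs_3 > score_4_vs_1)
```

Under that assumed mapping, product 3 scores higher than product 1 as a match for product 4, and pairs with only a handful of attributes are pulled towards zero rather than being treated as perfect matches.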

Source

2013-05-08 17:24:43 criticalfix

You could write a little view that gives you the total number of attributes shared between two products.

create view vw_shared_attributes as
select a.product,
       b.product as product_match,
       count(*) as shared_attributes
from your_table a
    inner join your_table b on b.attribute = a.attribute and b.product <> a.product
group by a.product, b.product

Then use that view to select the top match.

select product,
       (select top 1 s.product_match
        from vw_shared_attributes s
        where t.product = s.product
        order by s.shared_attributes desc) as top_match
from your_table t
group by product

For an example, see http://www.sqlfiddle.com/#!6/53039/1
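If you want to try the view idea without SQL Server, here is a rough SQLite equivalent run from Python. It is a sketch, not the fiddle's code: the table name `prod_att`, the `LIMIT 1` in place of `top 1`, and the extra `product_match ASC` tie-break (added only to make the output deterministic) are all assumptions.

```python
import sqlite3

# Sample rows copied from the Prod_att table in the question.
ROWS = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "D"),
    (3, "A"), (3, "E"),
    (4, "A"),
]


def top_matches(rows):
    """Return (product, top_match) rows using a shared-attributes view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prod_att (product INTEGER, attribute TEXT)")
    conn.executemany("INSERT INTO prod_att VALUES (?, ?)", rows)
    # View: number of attributes shared by each ordered pair of distinct products.
    conn.execute(
        """
        CREATE VIEW vw_shared_attributes AS
        SELECT a.product AS product,
               b.product AS product_match,
               COUNT(*)  AS shared_attributes
        FROM prod_att a
        JOIN prod_att b
          ON b.attribute = a.attribute AND b.product <> a.product
        GROUP BY a.product, b.product
        """
    )
    # Pick the match with the most shared attributes for each product.
    result = conn.execute(
        """
        SELECT t.product,
               (SELECT s.product_match
                FROM vw_shared_attributes s
                WHERE s.product = t.product
                ORDER BY s.shared_attributes DESC, s.product_match ASC
                LIMIT 1) AS top_match
        FROM prod_att t
        GROUP BY t.product
        ORDER BY t.product
        """
    ).fetchall()
    conn.close()
    return result


print(top_matches(ROWS))
```

Note that products 3 and 4 tie with several candidates on the shared count alone, so the match chosen for them here is just the lowest product id; ranking by shared count only does not address the small-attribute-set case from the question.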

Source

2013-05-08 17:35:02 Nate
