2011-06-15 54 views
16

在我的工作场所,我偶然发现了以下问题,我被要求解决。虽然不是绝对需要,但解决方案是优选的一个具有挑战性的算法问题

有一个包含故事集的数据库,每个故事都有一组与其关联的主题。主题存储在表单(storyid,topicid)的单独表格中。

问题是如何理想地选择5个主题(或者至少2个,如果5个是不可能的),使得每个主题有2个故事(或1,如果2是不可能的)在其他选择中不重复主题。该算法还必须返回与每个主题相关的恰当故事。

这实际上是一个NP完全问题,没有有效的解决方案,不仅仅是简单列举所有可能性,还是它有一个有效的解决方案?

如果它没有一个有效的解决方案,请尝试证明它(尽管不是绝对必要的)。

如果它确实有一个有效的解决方案,请善待和张贴。

非常感谢,

安东

+1

你想用什么语言来完成? – CraigW 2011-06-15 21:13:40

+3

这似乎是一个主题和故事之间的二分图上的一种匹配问题。目标是找到一个解决方案还是列出所有可能的解决方案? – hardmath 2011-06-15 21:17:53

+0

@CraigW:语言是PHP和MySQL。 – akanevsky 2011-06-15 21:18:28

回答

5

问题的更一般的版本是选择所有主题(或尽可能多的p可能的)两个故事,以便同一个故事从来没有被选择为两个不同的主题。

马克带S 故事内容S,并以T 的主题...牛逼ñ。重复每一个主题,那就是引入新的故事T“ ... T” ñ,其中T” 其S Ĵ当且仅当T 包含它。现在可以通过这种方式来解释问题:为所有主题选择不同的故事(或尽可能多)。一旦你有你的主题故事对,你可以再次加入重复的主题,每个主题将有两个故事。

发现可能的最大数量对的问题,而从未选择任何元素两次的问题称为最大二分配匹配问题。 (你可以认为它是从bipartite graph中选择最大的非连接边的数量。)有一个简单的算法,称为扩充路径,它在O((m + n)e)个步骤中解决它(e是边的数量)和一些更复杂的(例如Hopcroft–Karp algorithm),它们在O((m + n)^ 2.5)步骤左右解决它。增强路径算法包括搜索图中“未交替”路径,其中路径的第一条边未被选中,第二条路是第三条不是,依此类推,而不是反转路径上的选择。这可能可能适用于您的情况,而无需实际分割和加入主题。

这是一个有点矫枉过正,因为这将返回所有题目两个故事,而不是只有五个;如果只需要为有限数量的主题查找故事,则可以做得更好。 (除了一些边缘情况,您可以选择故事数量最多的五个主题,丢弃未包含的故事,然后运行该算法。)无论如何,它表明问题远不是NP -硬。

+0

为什么我需要复制每一个主题?你能否详细说明你的算法是如何工作的? – akanevsky 2011-06-16 01:16:43

+0

@akanevsky:我更详细地解释了它。 – Tgr 2011-06-16 06:19:14

+0

+1“无论如何,这表明问题远非NP难。” – csl 2011-06-16 06:22:44

-3

如果wona选择类似不同的故事类似5个话题,因为我明白: 这样就可以使一个连接的2个表,并在查询中使用top 5与在条件主题标题=

“你想的话题,”如果不帮助好心说清楚,我>>>

+0

我不仅需要5个主题,还需要5个主题,每个主题有2个故事,其他4个主题都不重复。 – akanevsky 2011-06-15 21:18:07

+0

你真的让我感到困惑,,你可以发布信息和例子如:story1 --topic1,标题2,topic3 ... story2 ---标题2,topic5 .... – Tamer 2011-06-15 21:34:49

+0

为什么你需要具体的例子?我有一个表格(storyid,title)和一个表格(storyid,topicid)。我想向用户展示5个主题,当他选择其中2个主题时,我不希望他在两个主题下看到相同的故事。现在更清楚了吗? – akanevsky 2011-06-15 21:42:03

0

试着将这个以满足您的需求:

SELECT topic, story 
FROM story_topic 
WHERE story IN (SELECT story FROM story_topic GROUP BY story HAVING COUNT(*) = 1); 

这里的关键是要知道什么样的故事只发生只有一个话题。您可能需要预先计算消除子选择的主题数量。

+0

可能是所有故事都出现在多个主题中。 – akanevsky 2011-06-16 01:08:51

+0

你原来的问题会如何处理?也许我只是误解了这个问题。 – jond3k 2011-06-16 08:38:52

0

这个怎么样? (如果我理解你的问题)

(我没有真正运行它 - 只是一个想法 - 所以......可能是错误,或者我可能明显错过了某些东西但是 - 目前,我疲倦的头脑认为它的工作:)

$num_topics = 5; 
$stories_per = 5; 
$stories = array(); //array to store story ids 

//select 5 topics 
$query = mysql_query("SELECT * FROM topics ORDER BY RAND() LIMIT ".$num_topics); 

//run repeat as many times as you want stories 
for($i=0; $i<$stories_per; $i++) { 

    //repeat through each selected topic 
    while($row = mysql_fetch_array($query)) { 

     $q_addon = ""; 
     foreach($stories as $value) { 
      $q_addon .= "id <> '".$value."' AND "; 
     } 

     //find a story not yet chosen for each topic 
     $q = mysql_query("SELECT storyid FROM stories_topics WHERE ".$q_addon." topicid='".$row['id']."' LIMIT 1"); 

     //add that id to your $stories array 
     $tmp_id = mysql_result($q,0,'storyid'); 
     array_push($stories, $tmp_id); 

    } 
} 
3

假设我们有涉及TopicID和StoryID, 与一对形成化合物主键列的SQL表TopicStory。

的目标是找到某种一套TopicID的组成。发布的问题最多需要五个 。深度优先搜索 这些概述如下。

随着搜索的深度限定在五,指出问题不能 比多项式复杂性更差。然而,要求为最大的话题集合,可以与像 约束(选择每个主题不相关的 任何其他选择的主题至少两层)可以发现一个普遍问题 可能是NP完全问题。

一词的使用“搜索”中提出了一种回溯算法。在 以下,我们通过嵌套循环实现了回溯,其中每个循环是 ,由其外部的循环定义和参数化。

在我们给出详细的细节之前,可能会激发描述 “蛮力”的方法,接下来可以更容易地理解更复杂的方法 。

BRUTE_FORCE: 

Generate all possible subsets of five topics. 
Test each of these sets for feasibility (each topic has 
at least two stories unrelated to any of the other topics). 

我们的深度优先搜索的草图假设议题一共有排序, 例如按照TopicID的整数值排序。这允许生成主题 而不重复(由于主题的排列)。

NESTED_LOOPS: 

(Loop_1) Select into List_1 all topics with at least two stories. 
     Iterate through List_1, choosing the first topic %T1%. 
     PASS control into Loop_2. 
     CONTINUE in Loop_1. 
     If the end of List_1 is reached, EXIT with failure. 

(Loop_2) Select into List_2 all topics > %T1% with at least two 
     stories unrelated to %T1%. 
     Iterate through List_2, choosing the second topic %T2%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2%, PASS control into Loop_3. 
     CONTINUE in Loop_2. 
     If the end of List_2 is reached, go BACK to Loop_1. 

(Loop 3) Select into List_3 all topics > %T2% with at least two 
     stories unrelated to %T1% or %T2%. 
     Iterate through List_3, choosing the third topic %T3%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2% or %T3%, 
     and topic %T2% still has at least two stories unrelated 
     to %T1% or %T3%, PASS control into Loop_4. 
     CONTINUE in Loop_3. 
     If the end of List_3 is reached, go BACK to Loop_2. 

(Loop 4) Select into List_4 all topics > %T3% with at least two 
     stories unrelated to %T1%, %T2%, or %T3%. 
     Iterate through List_4, choosing the fourth topic %T4%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2%, %T3%, or %T4%, 
     and topic %T2% still has at least two stories unrelated 
     to %T1%, %T3%, or %T4%, 
     and topic %T3% still has at least two stories unrelated 
     to %T1%, %T2%, or %T4%, PASS control into Loop_5. 
     CONTINUE in Loop_4. 
     If the end of List_4 is reached, go BACK to Loop_3. 

(Loop 5) Select into List_5 all topics > %T4% with at least two 
     stories unrelated to %T1%, %T2%, %T3%, or %T4%. 
     Iterate through List_5, choosing the fifh topic %T5%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2%, %T3%, %T4%, or %T5%, 
     and topic %T2% still has at least two stories unrelated 
     to %T1%, %T3%, %T4%, or %T5%, 
     and topic %T3% still has at least two stories unrelated 
     to %T1%, %T2%, %T4%, or %T5%, 
     and topic %T4% still has at least two stories unrelated 
     to %T1%, %T2%, %T3%, or %T5%, EXIT with success 
     returning five topics %T1%, %T2%, %T3%, %T4%, and %T5%. 
     CONTINUE in Loop_5. 
     If the end of List_5 is reached, go BACK to Loop_4. 

使用“选择”在每个嵌套循环的开口是为了唤起 SQL查询的可能性,以实现许多逻辑的。对于 例如最外层循环基本上是刚开结果集中 此查询:

SELECT TS1.TopicID, Count(*) 
From TopicStory TS1 
Group By TS1.TopicID 
Having Count(*) > 1 

内部循环的相应的列表可以类似 通过SQL查询取决于在 所选主题的参数值来构造外环。为了说明没有不必要的重复让我们跳 权内部循环,并给出List_5适当的查询:

SELECT TS5.TopicID, Count(*) 
From TopicStory TS5 
Where TS5.TopicID > %T4% 
    and NOT EXISTS (SELECT * 
         From TopicStory TSX 
         Where TSX.TopicID in (%T1%,%T2%,%T3%,%T4%) 
         and TSX.StoryID = TS5.StoryID 
        ) 
Group By TS5.TopicID 
Having Count(*) > 1 

这之后将在List_5检查%T5%产生 计数至少两个故事的左对于主题%T1%:

SELECT Count(*) 
From TopicStory TZ1 
Where TZ1.TopicID = %T1% 
    and NOT EXISTS (SELECT * 
        From TopicStory TX1 
        Where TX1.StoryID = TZ1.StoryID 
         and TX1.TopicID in (%T2%,%T3%,%T4%,TS5.TopicID) 
       ) 

并且对于其他先前话题选择进行了必要的修改。

虽然它可能会降低性能不可接受,限制相关%T5%主题额外的逻辑 (使前面的话题选择 仍然保留至少两层的选择)可以推入一个查询。 它应该是这样的:

/* 
    Given %T1%, %T2%, %T3$, and %T4% from queries above, find all topics %T5% > %T4% 
    with at least 2 stories not related to %T1%, %T2%, %T3%, or %T4% and such that 
    %T1% still has at least 2 stories not related to %T2%, %T3%, %T4%, or %T5% and 
    %T2% still has at least 2 stories not related to %T1%, %T3%, %T4%, or %T5% and 
    %T3% still has at least 2 stories not related to %T1%, %T2%, %T4%, or %T5% and 
    %T4% still has at least 2 stories not related to %T1%, %T2%, %T3%, or %T5% 
*/ 

SELECT TS5.TopicID, Count(*) 
From TopicStory TS5 
Where TS5.TopicID > %T4% 
    and NOT EXISTS (SELECT * 
         From TopicStory TSX 
         Where TSX.TopicID in (%T1%,%T2%,%T3%,%T4%) 
         and TSX.StoryID = TS5.StoryID 
        ) 
    and (SELECT Count(*) 
      From TopicStory TZ1 
      Where TZ1.TopicID = %T1% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX1 
           Where TX1.StoryID = TZ1.StoryID 
           and TX1.TopicID in (%T2%,%T3%,%T4%,TS5.TopicID) 
          ) 
     ) > 1 
    and (SELECT Count(*) 
      From TopicStory TZ2 
      Where TZ2.TopicID = %T2% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX2 
           Where TX2.StoryID = TZ2.StoryID 
           and TX2.TopicID in (%T1%,%T3%,%T4%,TS5.TopicID) 
          ) 
     ) > 1 
    and (SELECT Count(*) 
      From TopicStory TZ3 
      Where TZ3.TopicID = %T3% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX3 
           Where TX3.StoryID = TZ3.StoryID 
           and TX3.TopicID in (%T1%,%T2%,%T4%,TS5.TopicID) 
          ) 
     ) > 1 
    and (SELECT Count(*) 
      From TopicStory TZ4 
      Where TZ4.TopicID = %T4% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX3 
           Where TX3.StoryID = TZ3.StoryID 
           and TX3.TopicID in (%T1%,%T2%,%T3%,TS5.TopicID) 
          ) 
     ) > 1 
Group By TS5.TopicID 
Having Count(*) > 1 

的功能集的MySQL是一个进展中的工作,所以可以想象在存储过程中的高效 实现是可能的,其中光标可以采取主题列出的 作用。然而,如果 “游标”是外部管理列表(例如PHP),并且数据库查询 保持尽可能简单,我会对自己的良好业绩充满信心。