2010-10-06
2

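MySQL greatest-n-per-group trouble

Hey everyone. I believe this is a 'greatest-n-per-group' problem, but even after looking at several questions about it on StackOverflow, I'm not sure how to apply it to my situation...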

I'm using a MySQL database and have a basic blog-type system set up for articles about computer applications... The tables look like this:

POSTS 
post_id 
post_created 
post_type  -- could be article, review, feature, whatever 
post_status -- 'a' for approved or 'd' for draft 

APPS 
app_id 
app_name 
app_platform -- Windows, Linux, Unix, etc. 

APP_TO_POST -- links each post to its relevant application 
atp_id 
atp_app_id 
atp_post_id 

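I use the following basic query to pull all articles for the application named 'Photoshop', where the post type is 'Article' and the post status is approved ('a'):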

SELECT apps.app_name, apps.app_platform, posts.post_created, posts.post_id 
FROM apps 
JOIN app_to_post ON app_to_post.atp_app_id = apps.app_id 
JOIN posts ON app_to_post.atp_post_id = posts.post_id 
WHERE apps.app_name = 'Photoshop' 
AND 
posts.post_type = 'Article' 
AND 
posts.post_status = 'a' 

It gives me the expected results:

app_name   app_platform  post_created     post_id
Photoshop  Windows       Oct. 20th, 2009  1
Photoshop  Windows       Dec. 1, 2009     3
Photoshop  Macintosh     Nov. 10th, 2009  2

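Could anyone lend a hand with how I might change this query to pull only the most recent article for each app platform? For example, I'd like my results to look like this: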

app_name   app_platform  post_created     post_id
Photoshop  Windows       Dec. 1, 2009     3
Photoshop  Macintosh     Nov. 10th, 2009  2

That is, omit one of the 'Photoshop Windows' articles, because it isn't the most recent one.

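If I simply use MAX(post_created) with GROUP BY app_platform, my results aren't always grouped correctly. From what I understand, I need to do some kind of inner join against a subquery?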

Answers

0

You're on the right track.

Try adding

group by app_name,app_platform 
having post_created=max(post_created) 

Alternatively, if your post_id values are sequential, so that a higher value always means a later article, use this HAVING clause instead: having post_id=max(post_id)

+1

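I was working through a similar problem, and the HAVING statement didn't solve it. MySQL seems to report the first result it finds for each grouped row, so all the HAVING does is completely exclude any group whose first result doesn't match the maximum. – 2011-06-22 15:12:41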
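The comment above is consistent with how MySQL treats non-aggregated columns when ONLY_FULL_GROUP_BY is off: a bare post_created in HAVING comes from an indeterminate row of the group, so the condition can drop a whole group instead of selecting its newest row. If post_id really does grow with publication time (the answer's own assumption), a correlated subquery that looks up the largest post_id per app and platform avoids depending on that behaviour. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the SQL is plain enough for both), loaded with the same posts, apps, and links as the test case later on this page.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INT, post_created TEXT, post_type TEXT, post_status TEXT);
CREATE TABLE apps (app_id INT, app_name TEXT, app_platform TEXT);
CREATE TABLE app_to_post (atp_app_id INT, atp_post_id INT);
INSERT INTO posts VALUES
  (1, '2010-10-06 05:00:00', 'Article', 'a'),
  (2, '2010-10-06 06:00:00', 'Article', 'a'),
  (3, '2010-10-06 07:00:00', 'Article', 'a'),
  (4, '2010-10-06 08:00:00', 'Article', 'a'),
  (5, '2010-10-06 09:00:00', 'Article', 'a');
INSERT INTO apps VALUES (1, 'Photoshop', 'Windows'), (2, 'Photoshop', 'Macintosh');
INSERT INTO app_to_post VALUES (1, 1), (1, 2), (2, 3), (2, 4), (1, 5);
""")

# For each candidate row, the correlated subquery finds the largest post_id
# among approved articles for the same app and platform; only that row survives.
# Assumption: a larger post_id always means a newer post.
rows = conn.execute("""
SELECT a.app_name, a.app_platform, p.post_created, p.post_id
FROM apps a
JOIN app_to_post ap ON ap.atp_app_id = a.app_id
JOIN posts p        ON ap.atp_post_id = p.post_id
WHERE a.app_name = 'Photoshop'
  AND p.post_type = 'Article' AND p.post_status = 'a'
  AND p.post_id = (
      SELECT MAX(p2.post_id)
      FROM apps a2
      JOIN app_to_post ap2 ON ap2.atp_app_id = a2.app_id
      JOIN posts p2        ON ap2.atp_post_id = p2.post_id
      WHERE a2.app_name = a.app_name
        AND a2.app_platform = a.app_platform
        AND p2.post_type = 'Article' AND p2.post_status = 'a')
ORDER BY a.app_platform
""").fetchall()

for row in rows:
    print(row)
```

Because the subquery returns a single id per group, there is no HAVING involved and no question of which row MySQL happens to report.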

4

Since you have quite a few JOINs, I'd suggest creating a VIEW first:

CREATE VIEW articles AS 
    SELECT a.app_name, a.app_platform, p.post_created, p.post_id 
    FROM  apps a 
    JOIN  app_to_post ap ON ap.atp_app_id = a.app_id 
    JOIN  posts p ON ap.atp_post_id = p.post_id 
    WHERE  p.post_type = 'Article' AND p.post_status = 'a'; 

Then you can use a NULL self-join (a LEFT JOIN that keeps only the rows for which no newer article exists):

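SELECT  a1.app_name, a1.app_platform, a1.post_created, a1.post_id 
FROM  articles a1 
LEFT JOIN articles a2 ON 
      a2.app_name = a1.app_name              -- compare within the same application
  AND a2.app_platform = a1.app_platform      -- and the same platform
  AND a2.post_created > a1.post_created      -- looking for a newer article
WHERE  a2.post_id IS NULL;                   -- none found, so a1 is the latest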

Test case:

CREATE TABLE posts (
    post_id   int, 
    post_created  datetime, 
    post_type  varchar(30), 
    post_status  char(1) 
); 

CREATE TABLE apps (
    app_id   int, 
    app_name   varchar(40), 
    app_platform  varchar(40) 
); 

CREATE TABLE app_to_post (
    atp_id   int, 
    atp_app_id  int, 
    atp_post_id  int 
); 

INSERT INTO posts VALUES (1, '2010-10-06 05:00:00', 'Article', 'a'); 
INSERT INTO posts VALUES (2, '2010-10-06 06:00:00', 'Article', 'a'); 
INSERT INTO posts VALUES (3, '2010-10-06 07:00:00', 'Article', 'a'); 
INSERT INTO posts VALUES (4, '2010-10-06 08:00:00', 'Article', 'a'); 
INSERT INTO posts VALUES (5, '2010-10-06 09:00:00', 'Article', 'a'); 

INSERT INTO apps VALUES (1, 'Photoshop', 'Windows'); 
INSERT INTO apps VALUES (2, 'Photoshop', 'Macintosh'); 

INSERT INTO app_to_post VALUES (1, 1, 1); 
INSERT INTO app_to_post VALUES (1, 1, 2); 
INSERT INTO app_to_post VALUES (1, 2, 3); 
INSERT INTO app_to_post VALUES (1, 2, 4); 
INSERT INTO app_to_post VALUES (1, 1, 5); 

Result:

+-----------+--------------+---------------------+---------+
| app_name  | app_platform | post_created        | post_id |
+-----------+--------------+---------------------+---------+
| Photoshop | Macintosh    | 2010-10-06 08:00:00 |       4 |
| Photoshop | Windows      | 2010-10-06 09:00:00 |       5 |
+-----------+--------------+---------------------+---------+
2 rows in set (0.00 sec)
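To reproduce this test case end to end, here is a sketch that runs the same posts, apps, links, view, and LEFT JOIN ... IS NULL query through Python's sqlite3 module (SQLite standing in for MySQL, an assumption; the surrogate atp_id column is left out since the query never reads it). This version also matches on app_name, so different applications on the same platform are never compared with each other; with only Photoshop in the data, the output is the same.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INT, post_created TEXT, post_type TEXT, post_status TEXT);
CREATE TABLE apps (app_id INT, app_name TEXT, app_platform TEXT);
CREATE TABLE app_to_post (atp_app_id INT, atp_post_id INT);
INSERT INTO posts VALUES
  (1, '2010-10-06 05:00:00', 'Article', 'a'),
  (2, '2010-10-06 06:00:00', 'Article', 'a'),
  (3, '2010-10-06 07:00:00', 'Article', 'a'),
  (4, '2010-10-06 08:00:00', 'Article', 'a'),
  (5, '2010-10-06 09:00:00', 'Article', 'a');
INSERT INTO apps VALUES (1, 'Photoshop', 'Windows'), (2, 'Photoshop', 'Macintosh');
INSERT INTO app_to_post VALUES (1, 1), (1, 2), (2, 3), (2, 4), (1, 5);

CREATE VIEW articles AS
  SELECT a.app_name, a.app_platform, p.post_created, p.post_id
  FROM apps a
  JOIN app_to_post ap ON ap.atp_app_id = a.app_id
  JOIN posts p ON ap.atp_post_id = p.post_id
  WHERE p.post_type = 'Article' AND p.post_status = 'a';
""")

# Anti-join: keep a1 only when no a2 in the same app/platform is newer,
# i.e. when the LEFT JOIN found nothing and a2's columns are NULL.
rows = conn.execute("""
SELECT a1.app_name, a1.app_platform, a1.post_created, a1.post_id
FROM articles a1
LEFT JOIN articles a2
  ON a2.app_name = a1.app_name
 AND a2.app_platform = a1.app_platform
 AND a2.post_created > a1.post_created
WHERE a2.post_id IS NULL
ORDER BY a1.app_platform
""").fetchall()

for row in rows:
    print(row)
```

If two articles in the same group share the newest post_created, both rows survive, because neither is strictly greater than the other; add a tiebreaker (for example on post_id) if exactly one row per group is required.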

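As a side note, you generally don't need a surrogate key for your junction table. You might as well set up a composite primary key (ideally with foreign keys referencing the tables):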

CREATE TABLE app_to_post (
    atp_app_id  int, 
    atp_post_id  int, 
    PRIMARY KEY (atp_app_id, atp_post_id), 
    FOREIGN KEY (atp_app_id) REFERENCES apps (app_id), 
    FOREIGN KEY (atp_post_id) REFERENCES posts (post_id) 
) ENGINE=INNODB; 
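A quick way to see what the composite key buys you is to insert the same (app, post) pair twice. The sketch below uses SQLite through Python's sqlite3 module in place of MySQL/InnoDB (an assumption). Two deviations to note: apps.app_id and posts.post_id are declared as primary keys here, because a foreign key needs an indexed column to point at (InnoDB requires the referenced columns to be indexed), and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. try_insert is a hypothetical helper written for this demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.executescript("""
CREATE TABLE apps  (app_id INTEGER PRIMARY KEY, app_name TEXT, app_platform TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_created TEXT,
                    post_type TEXT, post_status TEXT);
CREATE TABLE app_to_post (
  atp_app_id  INTEGER,
  atp_post_id INTEGER,
  PRIMARY KEY (atp_app_id, atp_post_id),
  FOREIGN KEY (atp_app_id)  REFERENCES apps (app_id),
  FOREIGN KEY (atp_post_id) REFERENCES posts (post_id)
);
INSERT INTO apps  VALUES (1, 'Photoshop', 'Windows');
INSERT INTO posts VALUES (1, '2010-10-06 05:00:00', 'Article', 'a');
INSERT INTO app_to_post VALUES (1, 1);
""")

def try_insert(app_id, post_id):
    """Hypothetical helper: return None on success, else the integrity error text."""
    try:
        conn.execute("INSERT INTO app_to_post VALUES (?, ?)", (app_id, post_id))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

duplicate_error = try_insert(1, 1)   # same pair again: composite primary key rejects it
orphan_error = try_insert(1, 99)     # post 99 does not exist: foreign key rejects it
print(duplicate_error)
print(orphan_error)
```

With the original surrogate atp_id and no constraints, both inserts would have succeeded silently.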
+0

This NULL join solved a similar problem for me. – 2011-06-22 15:18:44

+1

Is this an efficient query? You join everything into articles, then join all of that against itself again. That looks expensive to me. – marc40000 2012-02-02 01:16:51
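One alternative to the self-join, not taken from any answer here, is a window function: number the rows within each app/platform by post_created, newest first, and keep row 1. It reads the view once instead of joining it to itself, but whether it is actually faster depends on indexes and data volume, which this sketch does not measure. Window functions require MySQL 8.0 or later (the MySQL of 2010 had none) and SQLite 3.25 or later; the sketch runs on Python's sqlite3 module with SQLite standing in for MySQL (an assumption).

```python
import sqlite3

# In-memory SQLite (3.25+) stands in for MySQL 8.0+ (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INT, post_created TEXT, post_type TEXT, post_status TEXT);
CREATE TABLE apps (app_id INT, app_name TEXT, app_platform TEXT);
CREATE TABLE app_to_post (atp_app_id INT, atp_post_id INT);
INSERT INTO posts VALUES
  (1, '2010-10-06 05:00:00', 'Article', 'a'),
  (2, '2010-10-06 06:00:00', 'Article', 'a'),
  (3, '2010-10-06 07:00:00', 'Article', 'a'),
  (4, '2010-10-06 08:00:00', 'Article', 'a'),
  (5, '2010-10-06 09:00:00', 'Article', 'a');
INSERT INTO apps VALUES (1, 'Photoshop', 'Windows'), (2, 'Photoshop', 'Macintosh');
INSERT INTO app_to_post VALUES (1, 1), (1, 2), (2, 3), (2, 4), (1, 5);

CREATE VIEW articles AS
  SELECT a.app_name, a.app_platform, p.post_created, p.post_id
  FROM apps a
  JOIN app_to_post ap ON ap.atp_app_id = a.app_id
  JOIN posts p ON ap.atp_post_id = p.post_id
  WHERE p.post_type = 'Article' AND p.post_status = 'a';
""")

# Rank rows inside each (app_name, app_platform) group, newest first;
# post_id DESC breaks ties so exactly one row per group gets rn = 1.
rows = conn.execute("""
SELECT app_name, app_platform, post_created, post_id
FROM (
    SELECT app_name, app_platform, post_created, post_id,
           ROW_NUMBER() OVER (
               PARTITION BY app_name, app_platform
               ORDER BY post_created DESC, post_id DESC
           ) AS rn
    FROM articles
) AS ranked
WHERE rn = 1
ORDER BY app_platform
""").fetchall()

for row in rows:
    print(row)
```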

2

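Let's first think about how to get from your query's result to the result you want, i.e. how to pick the row with the maximum value in each group:

Your result (let's call it table T):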

app_name   app_platform  post_created     post_id
Photoshop  Windows       Oct. 20th, 2009  1
Photoshop  Windows       Dec. 1, 2009     3
Photoshop  Macintosh     Nov. 10th, 2009  2

The result you want:

app_name   app_platform  post_created     post_id
Photoshop  Windows       Dec. 1, 2009     3
Photoshop  Macintosh     Nov. 10th, 2009  2

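To get that result, you should:

  1. Compute the maximum post_created for each platform in table T.
  2. Join those maxima back to the original table T to get the values of the other columns in that row.

Here is the query: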

SELECT 
    t1.app_name, t1.app_platform, t1.post_created, t1.post_id 
FROM 
    (SELECT app_platform, MAX(post_created) AS MaxPostCreated 
     FROM T 
     GROUP BY app_platform) AS t2      -- step 1: newest date per platform
    JOIN T AS t1                       -- step 2: fetch the full row with that date
      ON t1.app_platform = t2.app_platform 
     AND t1.post_created = t2.MaxPostCreated; 

In this query, the subquery performs step 1 and the join performs step 2.

Combined with your query (wrapped in a view), the complete solution is:
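The two steps can be checked separately. This sketch loads the three result rows from the question into a table named T (dates rewritten as ISO strings, an assumption so that MAX and = compare them chronologically), runs step 1 on its own, and then runs the full query. It uses Python's sqlite3 module as a stand-in for MySQL (also an assumption).

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (app_name TEXT, app_platform TEXT, post_created TEXT, post_id INT);
INSERT INTO T VALUES
  ('Photoshop', 'Windows',   '2009-10-20', 1),
  ('Photoshop', 'Windows',   '2009-12-01', 3),
  ('Photoshop', 'Macintosh', '2009-11-10', 2);
""")

# Step 1: the newest post_created for each platform.
step1 = conn.execute("""
SELECT app_platform, MAX(post_created) AS MaxPostCreated
FROM T
GROUP BY app_platform
ORDER BY app_platform
""").fetchall()

# Step 2: join those maxima back to T to recover the rest of each row.
rows = conn.execute("""
SELECT t1.app_name, t1.app_platform, t1.post_created, t1.post_id
FROM (SELECT app_platform, MAX(post_created) AS MaxPostCreated
      FROM T
      GROUP BY app_platform) AS t2
JOIN T AS t1
  ON t1.app_platform = t2.app_platform
 AND t1.post_created = t2.MaxPostCreated
ORDER BY t1.app_platform
""").fetchall()

print(step1)
print(rows)
```

Step 1 alone yields only platform and date; the join in step 2 is what brings back post_id and app_name for the matching row.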

CREATE VIEW T AS 
    SELECT a.app_name, a.app_platform, p.post_created, p.post_id 
    FROM  apps a 
    JOIN  app_to_post ap ON ap.atp_app_id = a.app_id 
    JOIN  posts p ON ap.atp_post_id = p.post_id 
    WHERE  a.app_name = 'Photoshop'    -- T is the question's result set, so keep this filter
      AND  p.post_type = 'Article' AND p.post_status = 'a'; 

SELECT 
    t1.app_name, t1.app_platform, t1.post_created, t1.post_id 
FROM 
    (SELECT app_platform, MAX(post_created) AS MaxPostCreated 
     FROM T 
     GROUP BY app_platform) AS t2 
    JOIN T AS t1 
      ON t1.app_platform = t2.app_platform 
     AND t1.post_created = t2.MaxPostCreated; 

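By the way, our team is actually developing a tool that tries to help users write queries automatically: users give the tool input-output examples, and it generates a query. (The first part of the query above was actually generated by the tool!) A link to our prototype: https://github.com/Mestway/Scythe

Hope this helps. :)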