2012-03-20 86 views
121

PostgreSQL DISTINCT ON with different ORDER BY

I want to run this query:

SELECT DISTINCT ON (address_id) purchases.address_id, purchases.* 
FROM purchases 
WHERE purchases.product_id = 1 
ORDER BY purchases.purchased_at DESC 

But I get this error:

PG::Error: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions

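Adding address_id as the first ORDER BY expression silences the error, but I really don't want to sort by address_id. Is it possible to do this without ordering by address_id?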

+0

Your ORDER BY clause has purchased_at, not address_id. Can you make your question clear? – Teja 2012-03-20 22:01:46

+0

My ORDER BY has purchased_at because that is the order I want, but Postgres also asks for address_id (see the error message). – 2012-03-20 22:03:50

+0

Fully answered here - http://stackoverflow.com/questions/9796078/selecting-rows-ordered-by-some-column-and-disctincton-another Thanks to http://stackoverflow.com/users/268273/mosty-mostacho – 2012-12-21 23:40:39

Answers

114

The documentation says:

DISTINCT ON (expression [, ...]) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).

Official documentation

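So you'll have to add address_id to the ORDER BY.

Alternatively, if you're looking for the full row that contains the most recent purchase for each address_id, with the result sorted by purchased_at, then you're trying to solve a greatest-n-per-group problem, which can be solved by the following approaches:

The general solution, which should work in most DBMSs: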

SELECT t1.* FROM purchases t1 
JOIN (
    SELECT address_id, max(purchased_at) max_purchased_at 
    FROM purchases 
    WHERE product_id = 1 
    GROUP BY address_id 
) t2 
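ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at 
WHERE t1.product_id = 1 -- keep rows of other products with an identical timestamp out of the result 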
ORDER BY t1.purchased_at DESC 
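
To try the join-based query without a PostgreSQL server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table layout, the id column, and the sample rows are assumptions made up for illustration; only the shape of the query comes from the answer. The outer product_id filter keeps rows of other products with an identical timestamp out of the result.

```python
import sqlite3

# Hypothetical schema and sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "id INTEGER PRIMARY KEY, address_id INTEGER, "
    "product_id INTEGER, purchased_at TEXT)"
)
conn.executemany(
    "INSERT INTO purchases (id, address_id, product_id, purchased_at) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, "2012-03-01 09:00:00"),
        (2, 10, 1, "2012-03-15 09:00:00"),  # latest purchase for address 10
        (3, 20, 1, "2012-03-20 09:00:00"),  # latest purchase for address 20
        (4, 20, 1, "2012-03-05 09:00:00"),
        (5, 30, 2, "2012-03-25 09:00:00"),  # other product, must not appear
    ],
)

# Greatest-n-per-group via a join on the per-address maximum timestamp.
LATEST_PER_ADDRESS = """
SELECT t1.id, t1.address_id, t1.purchased_at
FROM purchases t1
JOIN (
    SELECT address_id, max(purchased_at) AS max_purchased_at
    FROM purchases
    WHERE product_id = 1
    GROUP BY address_id
) t2
  ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
WHERE t1.product_id = 1
ORDER BY t1.purchased_at DESC
"""

rows = conn.execute(LATEST_PER_ADDRESS).fetchall()
for row in rows:
    print(row)
```

Note that if two purchases for the same address share the same maximum purchased_at, this approach returns both of them, whereas DISTINCT ON keeps exactly one row per address.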

A more PostgreSQL-oriented solution, based on @hkf's answer:

SELECT * FROM (
    SELECT DISTINCT ON (address_id) * 
    FROM purchases 
    WHERE product_id = 1 
    ORDER BY address_id, purchased_at DESC 
) t 
ORDER BY purchased_at DESC 

The question was clarified, extended and solved here: Selecting rows ordered by some column and distinct on another

+36

It works, but it gives the wrong order. That's why I want to get rid of address_id in the ORDER BY clause. – 2012-03-20 22:12:11

+0

The documentation is clear: you can't, because the selected row would be unpredictable. – 2012-03-20 22:12:55

+2

But maybe there is another way to select the latest purchases for distinct addresses? – 2012-03-20 22:19:17

47

You can order by address_id in a subquery, then order by whatever you want in an outer query.

SELECT * FROM 
    (SELECT DISTINCT ON (address_id) purchases.address_id, purchases.* 
    FROM "purchases" 
    WHERE "purchases"."product_id" = 1 ORDER BY address_id DESC) 
ORDER BY purchased_at DESC 

+2

But this will be slower than a single query, won't it? – 2012-03-20 22:05:34

+2

Very marginally, yes. Although, since you have purchases.* in your original 'select', I don't think this is production code? – hkf 2012-03-20 22:06:14

+7

I'd add that newer versions of Postgres require you to alias the subquery. For example: SELECT * FROM (SELECT DISTINCT ON (address_id) purchases.address_id, purchases.* FROM "purchases" WHERE "purchases"."product_id" = 1 ORDER BY address_id DESC) AS tmp ORDER BY tmp.purchased_at DESC – aembke 2014-06-17 20:38:36

23

A subquery can solve it:

SELECT * 
FROM (
    SELECT DISTINCT ON (address_id) * 
    FROM purchases 
    WHERE product_id = 1 
    ) p 
ORDER BY purchased_at DESC; 

Leading expressions in ORDER BY have to agree with the columns in DISTINCT ON, so you can't order by different columns in the same SELECT.

SELECT * 
FROM (
    SELECT DISTINCT ON (address_id) * 
    FROM purchases 
    WHERE product_id = 1 
    ORDER BY address_id, purchased_at DESC -- get "latest" row per address_id 
    ) p 
ORDER BY purchased_at DESC; 

If purchased_at can be NULL, consider DESC NULLS LAST.
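
To make the two sorting steps easier to see, here is a rough pure-Python emulation of the query above, not of how PostgreSQL actually executes it. The sample rows are invented for illustration, and desc_nulls_last and latest_per_address are hypothetical helper names. The sketch applies NULLS LAST in the outer sort as well, which is an assumption that goes beyond the outer ORDER BY shown in the query.

```python
from datetime import date
from itertools import groupby

# Hypothetical sample rows; None stands in for a NULL purchased_at.
purchases = [
    {"id": 1, "address_id": 10, "purchased_at": date(2012, 3, 1)},
    {"id": 2, "address_id": 10, "purchased_at": date(2012, 3, 15)},
    {"id": 3, "address_id": 20, "purchased_at": None},
    {"id": 4, "address_id": 20, "purchased_at": date(2012, 3, 20)},
    {"id": 5, "address_id": 30, "purchased_at": None},
]


def desc_nulls_last(row):
    """Sort key emulating ORDER BY purchased_at DESC NULLS LAST."""
    day = row["purchased_at"]
    # False sorts before True, so non-NULL rows come first;
    # negating the ordinal day number gives descending order.
    return (day is None, -day.toordinal() if day is not None else 0)


def latest_per_address(rows):
    # Inner query: ORDER BY address_id, purchased_at DESC NULLS LAST
    ordered = sorted(rows, key=lambda r: (r["address_id"],) + desc_nulls_last(r))
    # DISTINCT ON (address_id): keep the first row of each address_id group
    firsts = [
        next(group)
        for _, group in groupby(ordered, key=lambda r: r["address_id"])
    ]
    # Outer query: ORDER BY purchased_at DESC (NULLS LAST assumed here too)
    return sorted(firsts, key=desc_nulls_last)


result_ids = [row["id"] for row in latest_per_address(purchases)]
print(result_ids)
```

Address 20 shows why NULLS LAST matters: without it, the row with the NULL timestamp would sort first in descending order and be picked as the "latest" purchase.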

Use the additional ORDER BY in the subquery only if you want to pick a particular row from each set.
Related, with more explanation:

+0

Without a matching ORDER BY, you can't use 'DISTINCT ON'. The first query needs an ORDER BY address_id inside the subquery. – 2017-07-12 18:46:13

+0

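@AristotlePagaltzis: But you *can*. Wherever you got that from, it is incorrect. You can use 'DISTINCT ON' without 'ORDER BY' in the same query. In that case you get an arbitrary row from each set of peers defined by the 'DISTINCT ON' clause. Try it, or follow the links above for details and links to the manual. 'ORDER BY' in the same query (the same 'SELECT') just cannot disagree with 'DISTINCT ON'. I explained that, too. – 2017-07-13 00:08:23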

+0

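Huh, you are right. I was blind to the implication of the "unpredictable unless ORDER BY is used" note, because it makes no sense to me that the feature is implemented to be able to deal with non-consecutive sets of values... yet won't let you exploit that with an explicit ordering. Annoying. – 2017-07-13 06:31:43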

10

Window functions can solve it in one pass:

SELECT DISTINCT ON (address_id) 
    LAST_VALUE(purchases.address_id) OVER wnd AS address_id 
FROM "purchases" 
WHERE "purchases"."product_id" = 1 
WINDOW wnd AS (
    PARTITION BY address_id ORDER BY purchases.purchased_at DESC 
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) 

+3

It would be nice if someone explained this query. – Gajus 2017-04-29 10:18:24

+0

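@Gajus: Short explanation: it doesn't work, it only returns distinct 'address_id'. The principle *could* work, though. Related examples: https://stackoverflow.com/a/22064571/939860 or https://stackoverflow.com/a/11533808/939860. But there are shorter and/or faster queries for the problem at hand. – 2017-07-17 15:56:04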
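
For readers who want a window-function query that does return the full latest row per address, here is a sketch using ROW_NUMBER(). This is a swapped-in technique, not the LAST_VALUE() query from the answer and not necessarily what the linked answers do. It runs against an in-memory SQLite database from Python (assuming SQLite 3.25 or later, where window functions are available); the schema, the id column, and the sample rows are invented for illustration. The SQL itself should also be valid in PostgreSQL.

```python
import sqlite3

# Hypothetical schema and sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "id INTEGER PRIMARY KEY, address_id INTEGER, "
    "product_id INTEGER, purchased_at TEXT)"
)
conn.executemany(
    "INSERT INTO purchases (id, address_id, product_id, purchased_at) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, "2012-03-01 09:00:00"),
        (2, 10, 1, "2012-03-15 09:00:00"),
        (3, 20, 1, "2012-03-20 09:00:00"),
        (4, 20, 1, "2012-03-05 09:00:00"),
        (5, 30, 2, "2012-03-25 09:00:00"),
    ],
)

# Number the rows of each address from newest to oldest, then keep row 1.
RANKED_QUERY = """
SELECT id, address_id, purchased_at
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (
               PARTITION BY address_id
               ORDER BY purchased_at DESC
           ) AS rn
    FROM purchases p
    WHERE product_id = 1
) ranked
WHERE rn = 1
ORDER BY purchased_at DESC
"""

latest = conn.execute(RANKED_QUERY).fetchall()
for row in latest:
    print(row)
```

As the comment above says, the DISTINCT ON subquery is usually the shorter option in PostgreSQL; the window-function form is mainly useful when the query also has to run on other databases.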

1

For people using Flask-SQLAlchemy, this worked for me:

from app import db 
from app.models import Purchases 
from sqlalchemy.orm import aliased 
from sqlalchemy import desc 

stmt = Purchases.query.distinct(Purchases.address_id).subquery('purchases') 
alias = aliased(Purchases, stmt) 
distinct = db.session.query(alias) 
# order_by() returns a new query, so the result has to be kept 
distinct = distinct.order_by(desc(alias.purchased_at)) 

+0

Yes, or even easier, I was able to use: 'query.distinct(foo).from_self().order(bar)' – 2018-01-04 14:46:54

+0

@LaurentMeyer Do you mean 'Purchases.query'? – reubano 2018-01-08 13:24:31

+0

Yes, I meant Purchases.query – 2018-01-08 14:14:34

-2

You can also do this using a GROUP BY clause:

SELECT purchases.address_id, purchases.* FROM "purchases" 
    WHERE "purchases"."product_id" = 1 GROUP BY address_id, 
purchases.purchased_at ORDER BY purchases.purchased_at DESC 

+0

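This is incorrect (unless 'purchases' only has the two columns 'address_id' and 'purchased_at'). Because of the 'GROUP BY', you need to use an aggregate function to get the value of each column not used for grouping, so their values will all come from different rows of the group unless you go through ugly and inefficient gymnastics. This can only be fixed by using window functions rather than 'GROUP BY'. – 2017-07-12 18:10:38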