2014-08-27 379 views
3

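SAS proc sql returning duplicate values of group by/order by variables

I have some fairly simple SQL that should return one row per quarter per asset1. Instead, I get multiple rows for each group.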

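Below are the SQL, a SAS data step, and some of the output data. The number of duplicate rows (227708 in the data below) equals Num_Borrowers, which is the row count for that asset1 group.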

proc sql outobs=max; 

create table table1 as 
select 
    case 
     when period_dt ='01DEC2003'd then '2003Q4' 
     when period_dt ='01DEC2004'd then '2004Q4' 
     when period_dt ='01DEC2005'd then '2005Q4' 
     when period_dt ='01DEC2006'd then '2006Q4' 
     when period_dt ='01DEC2007'd then '2007Q4' 
     when period_dt ='01DEC2008'd then '2008Q4' 
     when period_dt ='01DEC2009'd then '2009Q4' 
     when period_dt ='01DEC2010'd then '2010Q4' 
     when period_dt ='01DEC2011'd then '2011Q4' 
     when period_dt ='01DEC2012'd then '2012Q4' 
     when period_dt ='01DEC2013'd then '2013Q4' 
     when period_dt ='01JUN2014'd then '2014Q2' 
    end as QTR, 
    case 
     when MM_ASSET in ('C&I', 'Foreign', 'Leasing','Scored-WF','Scored-WB') THEN 'C&I' 
     when MM_ASSET='Construction' THEN 'Construction RE' 
     when MM_ASSET='Mortgage-IP' THEN 'Income Producing RE' 
     when MM_ASSET='Mortgage-OO' THEN 'Owner Occupied RE' 
     when MM_ASSET='Mortgage-SF' THEN 'Mortgage-SF' 
     when MM_ASSET='Unknown' THEN 'Other' 
    end as asset1, 
    count (period_dt) as Num_Borrowers, 
    exposure, 
    co_itd, 
    MM_NINEQTR_LOSS, 
    MM_LIFE_LOSS 
    from td_prod.OBLIGOR_COMBINED 
    where period_dt in ('01DEC2003'd,'01DEC2004'd,'01DEC2005'd,'01DEC2006'd,'01DEC2007'd,'01DEC2008'd, '01DEC2009'd,'01DEC2010'd,'01DEC2011'd,'01DEC2012'd,'01DEC2013'd,'01JUN2014'd) 
    and mm_asset in ('C&I','Foreign','Leasing','Construction','Mortgage-IP','Scored-WF','Scored-WB', 
       'Mortgage-OO','Mortgage-SF','Unknown') 
    group by 1,2 
    order by 1,2; 

quit; 



data table2; set table1; 

    Total_Exposure = exposure/1000000; 
    if total_exposure = 0 then total_exposure=.; 
    Total_Charge_Offs =co_itd/1000000; 
    Total_9Q_Losses = MM_NINEQTR_LOSS/1000000; 
    Total_Life_Losses = MM_LIFE_LOSS/1000000; 
    avg_borrower_exp = total_exposure/num_borrowers; 
    co_rate = total_charge_offs/total_exposure; 
    life_lossR = Total_life_losses/total_exposure; 
    nineQtr_lossR = total_9q_losses/total_exposure; 

run; 



*** sample of output data set ***; 
qtr    asset1  num_borrowers 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
2003Q4   C&I    227708 
+0

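Which database are you running this on? It looks like MySQL: you are selecting more non-aggregated fields than you are grouping by. How is your database supposed to know what to do with the multiple combinations of exposure, co_itd, MM_NINEQTR_LOSS, MM_LIFE_LOSS? – Twelfth 2014-08-27 23:38:11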

+4

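Assuming this is SAS, based on the tag and PROC SQL. SAS SQL does an odd, non-ANSI-standard thing when the select statement includes a column that is neither a group by column nor computed from an aggregate function: it returns every record in the source table, and puts a note in the log along the lines of "this step required remerging". Sometimes this remerging is helpful. But I know real SQL experts dislike the feature and avoid it like the plague. Many databases would raise an error for a SELECT statement like this. – Quentin 2014-08-28 00:06:52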

+0

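Ugh, you mean MySQL isn't the only one that does this, and SAS SQL does it too? Yikes! Why do these languages do this instead of returning an error? In any case, solution = figure out what to do with the 4 columns you left out of the sample result set you posted. – Twelfth 2014-08-28 00:12:39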

Answer

7

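Fleshing out my comment above into more of an answer.

In SAS SQL, when a query with a group by clause has extraneous columns on the select statement (i.e. columns that are neither part of the group by nor derived from an aggregate function), SAS "remerges" the summary statistics back onto the original data (and writes a note to that effect). Most SQL implementations would just raise an error. Here is an example: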

data have; 
    input gender $ age score; 
    cards; 
M 10 100 
M 20 200 
F 30 300 
F 40 400 
; 
run; 

proc sql; 
    select gender, mean(age) as AvgAge, SCore 
    from have 
    group by gender 
    ; 
quit; 

Returns:

gender  AvgAge  score 
F    35  300 
F    35  400 
M    15  100 
M    15  200 

In your code, exposure, co_itd, MM_NINEQTR_LOSS and MM_LIFE_LOSS are all extraneous columns, and they cause SAS to remerge.
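One way to get a single row per group is to wrap each extraneous column in an aggregate function. The sketch below reuses the `have` data set from the example above and assumes a sum is the summary you want; `TotalScore` is just an illustrative name.

```sas
proc sql;
    /* Every selected column is now either a group by column */
    /* or an aggregate, so no remerge is needed.             */
    select gender,
           mean(age)  as AvgAge,
           sum(score) as TotalScore   /* assumption: a total is wanted */
    from have
    group by gender
    ;
quit;
```

This should return one row per gender (F with AvgAge 35 and TotalScore 700, M with AvgAge 15 and TotalScore 300). Applied to your query, that would mean writing something like `sum(exposure) as exposure`, and likewise for co_itd, MM_NINEQTR_LOSS and MM_LIFE_LOSS, assuming totals per group are what you are after, as the `Total_` names in your data step suggest.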

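Whenever remerging occurs you will see the following message in the SAS log:

NOTE: The query requires remerging summary statistics back with the original data.

See the SAS documentation on summary-function for more details.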

+5

To disallow the remerging behavior, use the NOREMERGE proc sql option or the NOSQLREMERGE system option. – vasja 2015-07-24 11:24:03
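As a rough illustration of the two options named in that comment, again using the `have` data set from the answer: with remerging disallowed, the same query should stop with an error in the log instead of quietly returning every source row.

```sas
/* Per-step: NOREMERGE on the PROC SQL statement */
proc sql noremerge;
    select gender, mean(age) as AvgAge, score   /* score is not grouped or aggregated */
    from have
    group by gender
    ;
quit;

/* Session-wide: the NOSQLREMERGE system option */
options nosqlremerge;
```

Setting the option per step keeps the change local, while the system option applies to every later PROC SQL step in the session, which may surprise code that relies on remerging.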