2013-03-13 94 views

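Regression coefficients in KXEN

We are considering using KXEN to build logistic regression models on customer data. So far we have worked with SAS and RStudio, and I am having trouble understanding the logic of the K2R package used in KXEN.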

1) If I want to build a scoring function in SQL, how do I obtain the regression coefficients (beta, intercept) from KXEN?

I get the following SQL code as output (parts of the code are elided):

SELECT $key, $target_variable, CAST((CASE 
    WHEN $target_variable <= -1.32354053933e0 THEN 0.0e0 
    WHEN $target_variable <= -3.245405264555e-1 THEN 0.0e0 
    WHEN $target_variable <= -3.235405393301e-1 THEN (2.283134417281e-3*$target_variable+7.409696685844e-4) 
    WHEN $target_variable <= -2.673812457267e-1 THEN (4.065409082516e-5*$target_variable+1.543635190092e-5) 
    WHEN $target_variable <= -2.673250302176e-1 THEN (4.057282329758e1*$target_variable+1.084841700789e1) 
    ..... [more code here] 
    ELSE 0.0e0 
    END) AS FLOAT) 
AS PROBA0 
into [table_name] 
FROM 
(
    SELECT $key, (2.191922889118e-2 + CAST((CASE 
     WHEN ("predictor1" IS NULL OR "predictor1" = '' ) THEN -6.39011247354e-3 
     WHEN "predictor1" <= -2.432307283e0 THEN -1.541583426389e-1 
     WHEN "predictor1" <= 9.41313103e-1 THEN (9.932069236689e-2*"predictor1"+8.742010175092e-2) 
     WHEN "predictor1" <= 1.696595422e0 THEN (4.169961790129e-2*"predictor1"+2.454336172985e-1) 
     WHEN "predictor1" >= 1.696595402e0 THEN 3.16180997712e-1 
     ELSE -6.39011247354e-3 
    END) AS FLOAT)+ 
CAST((CASE 
    WHEN ("predictor2" IS NULL OR "predictor2" = '' ) THEN 3.937894402762e-3 
    WHEN "predictor2" <= -9.99550198e-1 THEN -2.797353866946e-2 
    WHEN "predictor2" <= -1.27770581e-1 THEN (2.918798485695e-2*"predictor2"+1.201317665409e-3) 
    WHEN "predictor2" <= 3.78487285e-1 THEN (2.547969219572e-2*"predictor2"+6.997428207111e-3) 
    ...... [more code here] 

) AS $target_variable FROM [table_name] 
) TMPTABLE0 

All predictors were fed in after a WOE (weight of evidence) transformation and defined as continuous variables.
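Read as ordinary code, the inner query adds a constant to one piecewise-linear term per predictor. The sketch below restates the visible `predictor1` branch in Python to make that structure explicit. The names `INTERCEPT`, `MISSING_P1`, `predictor1_contribution` and `partial_score` are illustrative, not anything KXEN exposes, and the score is only partial because the remaining predictor terms are elided in the SQL.

```python
import math

# Constants copied from the generated SQL; the names are illustrative only.
INTERCEPT = 2.191922889118e-2      # constant added in the inner SELECT
MISSING_P1 = -6.39011247354e-3     # value used when predictor1 is NULL or ''


def predictor1_contribution(x):
    """Piecewise-linear term for predictor1, mirroring the CASE expression."""
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return MISSING_P1
    if x <= -2.432307283e0:
        return -1.541583426389e-1                        # flat left tail
    if x <= 9.41313103e-1:
        return 9.932069236689e-2 * x + 8.742010175092e-2
    if x <= 1.696595422e0:
        return 4.169961790129e-2 * x + 2.454336172985e-1
    # The SQL tests ">= 1.696595402e0" here; after the branches above that
    # covers every remaining numeric value.
    return 3.16180997712e-1                              # flat right tail


def partial_score(x1):
    """Intercept plus the predictor1 term only; other terms are elided."""
    return INTERCEPT + predictor1_contribution(x1)
```

Each interval carries its own slope and offset, so there is no single beta for `predictor1` to read off, only a table of breakpoints with per-segment coefficients.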

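2) When I rank customers by score, the ordering differs from the ordering I get when I rank them by probability. Does that mean the transformation from score to probability is not a monotonic function? My goal is to assign each customer a standardized score/probability.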

Can anyone explain this?

Answer


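Since I was eventually able to work out the answer, here it is.

The KXEN K2R engine for robust regression is not directly comparable to SAS or R, because it uses a different logic. The KXEN regression engine is built on structural risk minimization, following Vapnik's theorem. It transforms the predictors when computing the score (the score itself is not normalized) and then applies a set of logistic equations over different score intervals to obtain the probability of the target variable, normalized to the range 0 to 1.

It is therefore not possible to extract regression coefficients from KXEN. In addition, the mapping from score to probability is not a strictly monotonic function.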
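One way to see what "not strictly monotonic" means here is to evaluate the score-to-probability segments on a sorted list of scores. The following is a minimal sketch that uses only the five segments visible in the question's outer CASE expression. The names `SEGMENTS`, `proba` and `monotonicity` are illustrative, and the elided segments are deliberately left out rather than guessed.

```python
# (upper bound on the score, slope, offset), in the order the CASE tests them.
# Values copied from the visible part of the generated SQL.
SEGMENTS = [
    (-1.32354053933e0, 0.0, 0.0),
    (-3.245405264555e-1, 0.0, 0.0),
    (-3.235405393301e-1, 2.283134417281e-3, 7.409696685844e-4),
    (-2.673812457267e-1, 4.065409082516e-5, 1.543635190092e-5),
    (-2.673250302176e-1, 4.057282329758e1, 1.084841700789e1),
]


def proba(score):
    """Evaluate the first segment whose upper bound covers the score."""
    for upper, slope, offset in SEGMENTS:
        if score <= upper:
            return slope * score + offset
    raise ValueError("score lies above the segments shown in the question")


def monotonicity(scores):
    """Return (non_decreasing, strictly_increasing) over the sorted scores."""
    values = [proba(s) for s in sorted(scores)]
    pairs = list(zip(values, values[1:]))
    non_decreasing = all(b >= a for a, b in pairs)
    strictly_increasing = all(b > a for a, b in pairs)
    return non_decreasing, strictly_increasing
```

On these visible segments the mapping never decreases, but the two flat segments assign probability 0 to every score below -3.245405264555e-1, so customers with different scores can tie on probability. Whether any of the elided segments actually decrease cannot be determined from the excerpt. Running the same check over the full exported segment table would settle that.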