
Precision loss when summing over a DataFrame. I have a DataFrame with data like this:

unit,sensitivity currency,trading desk ,portfolio  ,issuer  ,bucket ,underlying ,delta  ,converted sensitivity 
ES ,USD     ,EQ DERIVATIVES,ESEQRED_LH_MIDX ,5GOY   ,5  ,repo  ,0.00002  ,0.00002 
ES ,USD     ,EQ DERIVATIVES,IND_GLOBAL1  ,no_localizado ,8  ,repo  ,-0.16962  ,-0.15198 
ES ,EUR     ,EQ DERIVATIVES,ESEQ_UKFLOWN ,IGN2   ,8  ,repo  ,-0.00253  ,-0.00253 
ES ,USD     ,EQ DERIVATIVES,BASKETS1  ,9YFV   ,5  ,spot  ,-1003.64501 ,-899.24586 

I have to do an aggregation over this data, doing something like this:

import org.apache.spark.sql.functions.sum

val filteredDF = myDF.filter("unit = 'ES' AND `trading desk` = 'EQ DERIVATIVES' AND issuer = '5GOY' AND bucket = 5 AND underlying = 'repo' AND portfolio = 'ESEQRED_LH_MIDX'")
  .groupBy("unit", "trading desk", "portfolio", "issuer", "bucket", "underlying")
  .agg(sum("converted_sensitivity"))

But I can see that the aggregated sum loses precision. How can I make sure that every value of "converted_sensitivity" is converted to a BigDecimal(25,5) before the sum operation is applied to the new aggregated column?
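A standalone sketch (not from the original question) of the kind of drift involved: a value such as 0.00002 has no exact binary representation, so repeatedly adding it as a Double accumulates rounding error, while summing the same values as decimals stays exact.

// Plain Scala illustration of floating-point drift vs. exact decimal arithmetic.
val asDoubles  = Seq.fill(100000)(0.00002).sum                 // close to, but typically not exactly, 2.0
val asDecimals = Seq.fill(100000)(BigDecimal("0.00002")).sum   // exactly 2.00000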

Thank you very much.


You could do a map operation to compute a BigDecimal version of the column first, and then add those values in the next operation. I imagine that would go between the .groupBy and the .agg – Paul

Answer


To make sure of the conversion, you can use DecimalType in your DataFrame.

According to the Spark documentation, DecimalType is:

The data type representing java.math.BigDecimal values. A Decimal that must have fixed precision (the maximum number of digits) and scale (the number of digits on the right side of the dot). The precision can be up to 38, the scale can also be up to 38 (less than or equal to the precision). The default precision and scale is (10, 0).

You can see this here.
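As a side note (not part of the original answer), a DecimalType column can also be declared up front in the schema used to read the data, so the sensitivities are backed by java.math.BigDecimal from the start. A minimal sketch, where the field names, types, and file path are assumptions based on the question's sample data:

import org.apache.spark.sql.types.{DecimalType, IntegerType, StringType, StructField, StructType}

// Hypothetical schema mirroring the question's CSV; names and types are assumptions.
val schema = StructType(Seq(
  StructField("unit", StringType),
  StructField("sensitivity currency", StringType),
  StructField("trading desk", StringType),
  StructField("portfolio", StringType),
  StructField("issuer", StringType),
  StructField("bucket", IntegerType),
  StructField("underlying", StringType),
  StructField("delta", DecimalType(25, 5)),
  StructField("converted_sensitivity", DecimalType(25, 5))
))

// "spark" is an existing SparkSession; the path is a placeholder.
val myDF = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("path/to/sensitivities.csv")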

To convert the data, you can use the cast function of the Column object, like this:

import org.apache.spark.sql.functions.{col, sum}
import org.apache.spark.sql.types.DecimalType

val filteredDF = myDF.filter("unit = 'ES' AND `trading desk` = 'EQ DERIVATIVES' AND issuer = '5GOY' AND bucket = 5 AND underlying = 'repo' AND portfolio = 'ESEQRED_LH_MIDX'")
  .withColumn("new_column_big_decimal", col("converted_sensitivity").cast(DecimalType(25, 5)))
  .groupBy("unit", "trading desk", "portfolio", "issuer", "bucket", "underlying")
  .agg(sum("new_column_big_decimal"))

Perfect @Thiago, exactly what I needed to know, thank you very much! – aironman