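Looping within Pig group results

Suppose I have a game with player IDs. Each ID can have multiple character names (playerNames), and each name has a score. I want to total the scores under each ID and then compute, for every playerName, its share of that ID's total.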
So, for example:

id  playerName  playerScore
01  Test        45
01  Test2       15
02  Joe         100

would output

id  {(playerName, playerScore, percentScore)}
01  {(Test, 45, .75), (Test2, 15, .25)}
02  {(Joe, 100, 1.0)}
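To pin down the arithmetic behind that expected output, here is a minimal sketch in plain Python (not Pig). The function name `percent_scores` and the in-memory `rows` list are illustrative assumptions; the ids are written as ints to match the `id:int` schema used in the script.

```python
from collections import defaultdict

# Sample rows from the example: (id, playerName, playerScore).
rows = [(1, "Test", 45), (1, "Test2", 15), (2, "Joe", 100)]


def percent_scores(rows):
    """Return {id: [(playerName, playerScore, percentScore), ...]}."""
    # First pass: total score per id.
    totals = defaultdict(int)
    for pid, _name, score in rows:
        totals[pid] += score

    # Second pass: attach each name's share of its id's total.
    result = defaultdict(list)
    for pid, name, score in rows:
        # True division, so 45 / 60 gives 0.75 rather than 0.
        result[pid].append((name, score, score / totals[pid]))
    return dict(result)


scores = percent_scores(rows)
```

The two passes mirror the two steps I need in Pig: one aggregate per id, then one per-row division against that aggregate.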
This is how I'm doing it:
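data = LOAD 'someData.data' AS (id:int, playerName:chararray, playerScore:int);
-- First GROUP BY: collect each id's rows so the scores can be summed.
grouped = GROUP data BY id;
withSummedScore = FOREACH grouped GENERATE SUM(data.playerScore) AS summedPlayerScore, FLATTEN(data);
-- Cast to double before dividing; int/long division would truncate the ratio to 0.
withPercentScore = FOREACH withSummedScore GENERATE data::id AS id, data::playerName AS playerName, data::playerScore AS playerScore, ((double)data::playerScore / summedPlayerScore) AS percentScore;
-- Second GROUP BY: regroup the flattened rows by id.
percentScoreIdGroup = GROUP withPercentScore BY id;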
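Currently I do this with 2 GROUP BY statements, and I'm curious whether both are necessary or whether there is a more efficient way to do it. Can I reduce this to a single GROUP BY? Or is there a way to iterate over a bag of tuples and add percentScore to every tuple without flattening the data?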
That makes a lot of sense, thanks TC1. – Newtang