2014-09-28 87 views
0

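pigscript error not calculation max

I've run into a problem with a Pig script, and I've tried many different approaches. Can anyone point out exactly what I'm doing wrong? It should be very simple: I'm trying to get the maximum after computing the averages.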

a = LOAD 'default.books' using org.apache.hcatalog.pig.HCatLoader(); 
b = LOAD 'default.book_rating' using org.apache.hcatalog.pig.HCatLoader(); 

books_and_ratings = join a by isbn, b by isbn; 

by_isbn = GROUP books_and_ratings BY (a::isbn); 

DESCRIBE by_isbn; 

average_book_rating = FOREACH by_isbn 
     GENERATE books_and_ratings.book_title, books_and_ratings.a::isbn as isbn1, 
     books_and_ratings.book_author, books_and_ratings.publisher, 
     AVG(books_and_ratings.book_rating) as AVG_RATING; 

DESCRIBE average_book_rating; 

group_avg = GROUP average_book_rating ALL; 

DESCRIBE group_avg; 

max_avg_rating = FOREACH group_avg 
    GENERATE FLATTEN average_book_rating.a::book_title, isbn1, 
      average_book_rating.a::book_author, average_book_rating.a::publisher, MAX(AVG_RATING); 

dump max_avg_rating; 

Failed to parse: mismatched input 'average_book_rating' expecting LEFT_PAREN

+0

Are you getting an error, or is the maximum just not being calculated correctly? – Eyal 2014-09-28 13:56:13

+0

@eyal I'm actually getting an error.... – Hades 2014-09-28 20:24:43

+0

The last statement, the one that computes max_avg_rating, is incorrect. Can you paste the exact error? – 2014-09-29 00:48:24

Answers

2

You can try something like this.

max_avg_rating = ORDER average_book_rating BY AVG_RATING DESC; 
top_most_rating = LIMIT max_avg_rating 1; 
dump top_most_rating; 
0

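After seeing Hades' latest comment ("there can be multiple books with the highest average rating"), I think you need another GROUP after the first one (the one that collects the ratings by ISBN), this time grouping by the value you are after.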

Start like this:

grouped_rating = GROUP average_book_rating BY AVG_RATING;

Then you can use code like @Sivasakthi's:

ordered_avg_rating = ORDER grouped_rating BY group DESC;
top_most_rating = LIMIT ordered_avg_rating 1;
dump top_most_rating;

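This way, if several results tie for the highest rating, top_most_rating will hold a bag with the information for every book that received that rating. Of course, if you don't want it as a bag, you can project it into something more convenient.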
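As a rough sketch of that last step, the bag can be unpacked with FLATTEN so that each tied book ends up on its own row. This assumes average_book_rating carries a field named isbn (as in the code under UPDATE; the question's version calls it isbn1), and the aliases top_books and max_rating are made up here for illustration:

```pig
-- top_most_rating holds a single row: (group, average_book_rating: bag),
-- where group is the highest AVG_RATING value.
-- Assumption: the tuples inside the bag have a field named isbn.
top_books = FOREACH top_most_rating GENERATE
    group AS max_rating,
    FLATTEN(average_book_rating.isbn) AS isbn;

-- one output row per book that shares the highest average
DUMP top_books;
```

Running DESCRIBE top_most_rating first is a cheap way to confirm the bag and field names before writing the projection.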

UPDATE:

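Here is how I would change the code above. One change is not purely about functionality: I would average the ratings first and only then join in the book/author information. This should perform better, because otherwise every rating row (and there are many of them) is widened with the book fields as it flows through the pipeline.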

So it would look like this:

-- assume a: book_title, isbn, book_author, publisher (and maybe more, which we'll ignore)
a = LOAD 'default.books' using org.apache.hcatalog.pig.HCatLoader();

-- assume b: isbn, book_rating (and maybe more, which we'll ignore)
b = LOAD 'default.book_rating' using org.apache.hcatalog.pig.HCatLoader();

-- average the ratings per book first, before joining anything
by_isbn = GROUP b BY isbn;
average_book_rating = FOREACH by_isbn GENERATE AVG(b.book_rating) AS AVG_RATING, group AS isbn;

-- group by the average, so that books with equal averages share one row
group_avg = GROUP average_book_rating BY AVG_RATING;
ordered_avg_rating = ORDER group_avg BY group DESC;
top_most_rating = LIMIT ordered_avg_rating 1;

-- unpack the bag: one row per book with the highest average
b = FOREACH top_most_rating GENERATE flatten(average_book_rating);

-- now add the book information
books_and_ratings = JOIN a BY isbn, b BY isbn;
books_and_ratings = FOREACH books_and_ratings GENERATE
    a::book_title AS title, a::isbn AS isbn, a::book_author AS author,
    a::publisher AS publisher, b::average_book_rating::AVG_RATING AS max_rating;

Hope this works for you.

+0

Thanks for your answer. I edited my question... can you take a look and see what's wrong with it? – Hades 2014-10-05 14:47:27

+0

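In your edited code (in your original description), you group by ALL rather than by AVG_RATING as in my answer. That means all rows end up in a single bag. I'm still not sure what you are trying to do, but FLATTEN takes parentheses, and that is the direct cause of the error you are getting. The code in my answer does give you a bag of all the books that received the highest average rating. – Eyal 2014-10-07 14:49:29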
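To illustrate the two points in that comment, here is a minimal sketch of a GROUP ... ALL variant with FLATTEN written with parentheses. It assumes average_book_rating has the fields (AVG_RATING, isbn) as in the answer's code; the aliases max_with_rows, top_rows and max_rating are hypothetical names used only for this example:

```pig
-- Assumption: average_book_rating has the fields (AVG_RATING, isbn).
group_avg = GROUP average_book_rating ALL;

-- After GROUP ... ALL there is exactly one row: (group, average_book_rating: bag).
-- Fields of the original rows are reached through the bag name,
-- and FLATTEN is called with parentheses.
max_with_rows = FOREACH group_avg GENERATE
    MAX(average_book_rating.AVG_RATING) AS max_rating,
    FLATTEN(average_book_rating);

-- keep only the rows whose average equals the overall maximum
top_rows = FILTER max_with_rows BY average_book_rating::AVG_RATING == max_rating;

DUMP top_rows;
```

Because everything is collected into one bag, this variant pushes all the averaged rows through a single group, so it is probably only reasonable when the number of distinct books is modest; grouping by AVG_RATING as in the answer avoids that.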

+0

I only want the highest average.... correct my code and I'll give you the bounty – Hades 2014-10-08 02:51:54
