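I have two datasets. The first lists clients with their respective bills and has the fields "bill number", "date", "client" and "import" (the amount). The second maps each client to an age. How can I calculate the total number of bills for each age?
**An example:**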
1st Dataset
u'F1,01/01/2013,C1,11'
2nd Dataset
u'C1,20'
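I have parsed both datasets to keep only the relevant data. My code is below:
# Map a client line such as u'C1,20' to (client, age)
def parseClients(clients):
    fields = clients.split(",")
    return (fields[0], fields[1])

# Map a bill line to (client, full bill line), keyed by the client field
def parseBill(bill):
    fields = bill.split(",")
    return (fields[2], bill)

new_bills = bills.map(parseBill)
new_clients = clients.map(parseClients)
# Join on client: yields (client, (bill line, age))
Age_Bills = new_bills.join(new_clients)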
A sample of the result:
Age_Bills.take(10):
(u'C856', (u'F2982,06/01/2013,C856,88', u'81'))
(u'C856', (u'F11953,22/01/2013,C856,87', u'81'))
(u'C856', (u'F12893,24/01/2013,C856,10', u'81'))
(u'C856', (u'F12913,24/01/2013,C856,41', u'81'))
(u'C856', (u'F17883,02/02/2013,C856,45', u'81'))
(u'C856', (u'F17895,02/02/2013,C856,75', u'81'))
(u'C856', (u'F18867,04/02/2013,C856,105', u'81'))
(u'C856', (u'F21864,09/02/2013,C856,26', u'81'))
(u'C856', (u'F30889,26/02/2013,C856,154', u'81'))
(u'C856', (u'F49990,02/04/2013,C856,90', u'81'))
Now I want to count the number of bills for each age, but I don't know how to continue. I have considered using reduceByKey or flatMap. I would appreciate any help.
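To make the goal concrete, here is a minimal plain-Python sketch of the count I am after, run on a few records shaped like the joined sample above. The names `sample`, `count_bills_per_age` and `bills_per_age` are made up for illustration, and the third record assumes the C1 example from the top of the post joins the same way. My guess is that the Spark version would map each record to `(age, 1)` and sum the ones per key, but that is an assumption rather than something I have verified.

```python
from collections import Counter

# Records shaped like the joined output: (client, (bill_line, age)).
# The first two are copied from the sample; the third is assumed from the C1 example.
sample = [
    (u'C856', (u'F2982,06/01/2013,C856,88', u'81')),
    (u'C856', (u'F11953,22/01/2013,C856,87', u'81')),
    (u'C1', (u'F1,01/01/2013,C1,11', u'20')),
]

def count_bills_per_age(joined):
    """Count one bill per joined record, keyed by the age field."""
    counts = Counter()
    for _client, (_bill, age) in joined:
        counts[age] += 1
    return dict(counts)

bills_per_age = count_bills_per_age(sample)
print(bills_per_age)

# Assumed RDD equivalent (not run here):
# Age_Bills.map(lambda kv: (kv[1][1], 1)).reduceByKey(lambda a, b: a + b)
```

Note that the ages stay as strings here, exactly as they come out of the parsed client lines.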
Thanks,