2016-03-01 79 views
3

Below is the program I ran in the Spark shell. When I save the output to HDFS, it contains CompactBuffer. How can I remove CompactBuffer from the Spark output?

Program:

val a=sc.textFile("/datagen_10.txt") 

val b=a.map(p=>(p.split(",")(1),p.split(",")(2)))

val c=sc.textFile("/drug.txt") 

val d =c.map(p=>(p.split(",")(1),p.split(",")(0))) 

val e=b.cogroup(d) 

e.saveAsTextFile("/cogroup")

Output:

(avil,(CompactBuffer(Brandon Buckner, Veda Hopkins, Mara Higgins, Sybill Crosby, Ivan Hale),CompactBuffer(1)))

(metacin,(CompactBuffer(Len Burgess),CompactBuffer(2)))

(paracetamol,(CompactBuffer(Zia Underwood, Austin Mayer, Tyler Rosales, Alika Gilmore),CompactBuffer(3)))

Answers

1

You have to build the output string manually, for example:

// Join each buffer's elements with commas instead of relying on CompactBuffer's toString
e.map{case (k, (xs, ys)) => 
    s"""($k, ((${xs.mkString(",")}), (${ys.mkString(",")})))"""} 
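
The idea above can be sketched as a small standalone helper so that the formatting is testable outside Spark. The names `CogroupFormat` and `formatGroup` are hypothetical (they are not part of Spark), and the exact output layout is an assumption; adjust the delimiters to whatever the downstream consumer expects.

```scala
// Hypothetical helper (not part of Spark): renders one cogroup record as text.
object CogroupFormat {
  def formatGroup(key: String, xs: Iterable[String], ys: Iterable[String]): String = {
    // mkString(start, sep, end) wraps the joined elements in parentheses
    val left = xs.mkString("(", ",", ")")
    val right = ys.mkString("(", ",", ")")
    s"($key,($left,$right))"
  }
}

// In spark-shell, assuming `e` is the cogroup result from the question:
// e.map { case (k, (xs, ys)) => CogroupFormat.formatGroup(k, xs, ys) }
//  .saveAsTextFile("/cogroup")
```

Because the RDD then holds plain strings, `saveAsTextFile` writes them as they are, and no `CompactBuffer` text should appear in the saved files.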
0

Try something like:

rdd1.map(rec => (rec._2._1.mkString(", "))) // output will look like: Brandon Buckner, Veda Hopkins, Mara Higgins, Sybill Crosby, Ivan Hale
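
The separator passed to `mkString` controls how the buffer's elements are joined: an empty string concatenates them directly, while `", "` yields a comma-separated list like the one in the comment above. A minimal sketch with plain Scala collections (the object name `MkStringDemo` and the sample values are illustrative only):

```scala
// Illustrative only: shows how the mkString separator affects the result.
object MkStringDemo {
  val names: Seq[String] = Seq("Len Burgess", "Veda Hopkins")

  // Empty separator: elements are concatenated with nothing between them
  val joinedNoSep: String = names.mkString("")

  // Comma-and-space separator: elements are listed as in the question's output
  val joinedComma: String = names.mkString(", ")
}
```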