pyspark: writing a file after reduceByKey aggregation

My code looks like this:

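from pyspark import SparkContext

sc = SparkContext("local", "App Name")
eventRDD = sc.textFile("file:///home/cloudera/Desktop/python/event16.csv")
# Keep only lines containing "Topic" and split them on the pipe delimiter
outRDDExt = eventRDD.filter(lambda s: "Topic" in s).map(lambda s: s.split('|'))
# Key each record by (second field, third field minus its last 19 characters)
outRDDExt2 = outRDDExt.keyBy(lambda x: (x[1], x[2][:-19]))
# Count the records per key
outRDDExt3 = outRDDExt2.mapValues(lambda x: 1)
outRDDExt4 = outRDDExt3.reduceByKey(lambda x, y: x + y)
outRDDExt4.saveAsTextFile("file:///home/cloudera/Desktop/python/outDir1")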

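The current output file looks like this: ((u'Topic', u'2017/05/08'), 15)

I would like my file to look like this:

u'Topic', u'2017/05/08', 15

How can I get the output above, i.e. get rid of the tuples and so on from my current output?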


2017-05-28 KRey

You can unpack the tuple manually and join all of its elements into a string:

outRDDExt4\
    .map(lambda row: ",".join([row[0][0], row[0][1], str(row[1])]))\
    .saveAsTextFile("file:///home/cloudera/Desktop/python/outDir1")
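To check the formatting without starting Spark, the same flattening can be tried on a single record in plain Python. This is only a sketch: `format_row` and `sample` are hypothetical names introduced here for illustration, and the record shape `((topic, date), count)` is assumed from the output shown in the question.

```python
def format_row(row):
    """Flatten a ((topic, date), count) record into 'topic,date,count'."""
    (topic, date), count = row          # unpack the nested key tuple and the count
    return ",".join([topic, date, str(count)])  # count must be a string for join


# Hypothetical sample record shaped like one element of the reduceByKey result
sample = (("Topic", "2017/05/08"), 15)
line = format_row(sample)
print(line)
```

If Spark is available, a named function like this could be passed to `map` in place of the lambda, for example `outRDDExt4.map(format_row)`, before calling `saveAsTextFile`.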


2017-05-28 19:09:53 Pushkr

Thanks. This works. – KRey

Could you accept the answer if it works? Thanks. – Pushkr
