Start with:
>>> a = [{'ACARS 20170507/20170506085012209001.rcv': 'QU SOUTA8X\r\n.BJSXCXA 060849\r\nM12\r\nFI CX731/AN B-LAN\r\nDT BJS HKG 060849 M63A\r\n- OFF,V01,CX 731 20170506 1,VHHH,OMDB,0833,0849,----, 600', 'ACARS 20170507/20170502020906017001.rcv': 'QU SOUTA8X\r\n.BJSXCXA 020209\r\nM12\r\nFI KA876/AN B-LAB\r\nDT BJS HKG 020209 M11A\r\n- OFF,V01,KA 876 20170502 1,VHHH,ZSPD,0149,0208,----, 294', 'ACARS 20170507/20170505050124358002.rcv': 'QU SOUTA8X\r\n.BKKXCXA 050501\r\nCFD\r\nFI CX690/AN B-LAJ\r\nDT BKK XSP 050501 C10A\r\n- .1/WRN/DBN17D/WN1705050500 261707002SMOKE LAVATORY DET FAULT'}]
>>> rdd = sc.parallelize(a)
Get an RDD of the keys:
>>> rdd_k = rdd.flatMap(lambda x: x.keys())
>>> rdd_k.take(3)
['ACARS 20170507/20170506085012209001.rcv', 'ACARS 20170507/20170505050124358002.rcv', 'ACARS 20170507/20170502020906017001.rcv']
Get an RDD of the values:
>>> rdd_v = rdd.flatMap(lambda x: x.values())
>>> rdd_v.take(3)
['QU SOUTA8X\r\n.BJSXCXA 060849\r\nM12\r\nFI CX731/AN B-LAN\r\nDT BJS HKG 060849 M63A\r\n- OFF,V01,CX 731 20170506 1,VHHH,OMDB,0833,0849,----, 600', 'QU SOUTA8X\r\n.BKKXCXA 050501\r\nCFD\r\nFI CX690/AN B-LAJ\r\nDT BKK XSP 050501 C10A\r\n- .1/WRN/DBN17D/WN1705050500 261707002SMOKE LAVATORY DET FAULT', 'QU SOUTA8X\r\n.BJSXCXA 020209\r\nM12\r\nFI KA876/AN B-LAB\r\nDT BJS HKG 020209 M11A\r\n- OFF,V01,KA 876 20170502 1,VHHH,ZSPD,0149,0208,----, 294']
Zip the two RDDs and you get an RDD of tuples, where each tuple is a (key, value) pair from your original dictionary:
>>> newRdd = rdd_k.zip(rdd_v)
>>> newRdd.first()
('ACARS 20170507/20170506085012209001.rcv', 'QU SOUTA8X\r\n.BJSXCXA 060849\r\nM12\r\nFI CX731/AN B-LAN\r\nDT BJS HKG 060849 M63A\r\n- OFF,V01,CX 731 20170506 1,VHHH,OMDB,0833,0849,----, 600')
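The keys/values/zip steps can probably be collapsed into a single pass, since `rdd.flatMap(lambda d: d.items())` should yield the same (key, value) tuples. The sketch below mimics that flattening in plain Python so it runs without Spark; `flat_map` and the shortened sample data are hypothetical stand-ins, not part of the original answer.

```python
from itertools import chain

# Hypothetical, shortened stand-in for the list of dictionaries in `a`.
records = [{
    "ACARS 20170507/a.rcv": "QU SOUTA8X\r\n.BJSXCXA 060849",
    "ACARS 20170507/b.rcv": "QU SOUTA8X\r\n.BKKXCXA 050501",
}]

def flat_map(func, iterable):
    """Plain-Python imitation of RDD.flatMap: apply func, then flatten one level."""
    return list(chain.from_iterable(func(item) for item in iterable))

# One pass yields (key, value) tuples directly, with no separate zip step.
pairs = flat_map(lambda d: d.items(), records)
print(pairs[0])
```

On an actual RDD, the assumed equivalent would be `newRdd = rdd.flatMap(lambda d: d.items())`, which also avoids relying on `keys()` and `values()` being iterated in matching order.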
Convert to a DataFrame:
>>> df = newRdd.toDF()
>>> df.show()
+--------------------+--------------------+
|                  _1|                  _2|
+--------------------+--------------------+
|ACARS 20170507/20...|QU SOUTA8X
.BJSX...|
|ACARS 20170507/20...|QU SOUTA8X
.BKKX...|
|ACARS 20170507/20...|QU SOUTA8X
.BJSX...|
+--------------------+--------------------+
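The rows in the table above wrap because each message contains `\r\n`, and the truncated cell still includes that line break. The sketch below imitates the truncation in plain Python, assuming `show()` cuts strings longer than 20 characters to the first 17 plus `...`; `show_cell` is a hypothetical helper, not a Spark function.

```python
def show_cell(value, width=20):
    """Imitate the assumed cell truncation: keep width - 3 chars, then add '...'."""
    if len(value) > width:
        return value[:width - 3] + "..."
    return value

# Shortened copy of one message value from the example data.
message = "QU SOUTA8X\r\n.BJSXCXA 060849\r\nM12"

# The raw cell keeps the embedded line break, which splits the table row.
raw_cell = show_cell(message)

# Replacing the line breaks first keeps the cell on a single line.
flat_cell = show_cell(message.replace("\r\n", " "))

print(repr(raw_cell))
print(repr(flat_cell))
```

To see the full values rather than the truncated ones, `df.show(truncate=False)` can be used instead.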
It returns a DF as 'ACARS 20170507/20170507235838492001.rcv:string, ACARS 20170507/20170507235911543001.rcv:string, ACARS 20170507/20170507235933392001.rcv:string, ACARS 20170507/20170507235957177001.rcv:string'. The value part is displayed as 'string'.