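I am trying to get the columns of a DataFrame whose value is 0, but I haven't managed to. Has anyone tried the same thing?
Get columns from a PySpark DataFrame where the value equals 0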
## Data Frame with One Row
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

row = [[1, 0, 0, 1, 2, 3, 4, 0, 0, 0]]
df = sc.parallelize(row).toDF(['Col1', 'Col2', 'Col3', 'Col4', 'Col5',
                               'Col6', 'Col7', 'Col8', 'Col9', 'Col10'])
df.show()

# There is only one row, hence we take the first (and only) Row as a dict
row_dict = df.collect()[0].asDict()

# Keep the columns whose value is greater than 0, in df.columns order
nonzero_cols = []
for key in df.columns:
    if row_dict[key] > 0:
        nonzero_cols.append(key)
print(nonzero_cols)

+----+----+----+----+----+----+----+----+----+-----+
|Col1|Col2|Col3|Col4|Col5|Col6|Col7|Col8|Col9|Col10|
+----+----+----+----+----+----+----+----+----+-----+
|   1|   0|   0|   1|   2|   3|   4|   0|   0|    0|
+----+----+----+----+----+----+----+----+----+-----+
['Col1', 'Col4', 'Col5', 'Col6', 'Col7']
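The filtering step can also be pulled out into a plain-Python helper that returns both the columns equal to 0 (what the title asks for) and the columns greater than 0 (what the asker clarifies in the comments). This is a minimal sketch: `split_columns_by_zero` is a hypothetical name, and the `dict(zip(...))` line only stands in for `df.collect()[0].asDict()` so the example runs without a Spark session.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def split_columns_by_zero(
    row_dict: Dict[str, Optional[float]], columns: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split one row's columns into (== 0, > 0).

    Columns are visited in the order given by `columns` (e.g. df.columns).
    None (SQL NULL) and negative values go into neither list.
    """
    zero_cols: List[str] = []
    positive_cols: List[str] = []
    for col in columns:
        value = row_dict[col]
        if value is None:
            continue  # NULL: None > 0 would raise TypeError in Python 3
        if value == 0:
            zero_cols.append(col)
        elif value > 0:
            positive_cols.append(col)
    return zero_cols, positive_cols


# Stand-in for df.collect()[0].asDict() from the one-row example
columns = ['Col1', 'Col2', 'Col3', 'Col4', 'Col5',
           'Col6', 'Col7', 'Col8', 'Col9', 'Col10']
row_dict = dict(zip(columns, [1, 0, 0, 1, 2, 3, 4, 0, 0, 0]))

zero_cols, positive_cols = split_columns_by_zero(row_dict, columns)
print(zero_cols)      # ['Col2', 'Col3', 'Col8', 'Col9', 'Col10']
print(positive_cols)  # ['Col1', 'Col4', 'Col5', 'Col6', 'Col7']
```

In Spark, either list can then be passed to `df.select(...)` to keep only those columns.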
Do you want all the values in a particular column to be zero? Could you be more specific, or give an example of what you are trying to do? – StackPointer
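I have a single-row df. Some columns are 0 and some have values greater than 0. I only need to extract the columns whose values are greater than 0 –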
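StackPointer's question matters once the DataFrame holds more than one row: a column then counts as "zero" only if every row is 0. A minimal sketch under that assumption, operating on a list of row dicts such as `[r.asDict() for r in df.collect()]`; `all_zero_columns` is a hypothetical name, and collecting every row to the driver is only sensible for small DataFrames.

```python
from typing import Dict, List, Optional, Sequence


def all_zero_columns(
    rows: Sequence[Dict[str, Optional[float]]], columns: Sequence[str]
) -> List[str]:
    """Hypothetical helper: columns whose value is 0 in every row.

    Returns [] when there are no rows, rather than treating every
    column as vacuously all-zero. A None (NULL) value is not 0.
    """
    if not rows:
        return []
    return [col for col in columns if all(row[col] == 0 for row in rows)]


# Stand-in for [r.asDict() for r in df.collect()] on a two-row DataFrame
rows = [
    {'Col1': 1, 'Col2': 0, 'Col3': 0},
    {'Col1': 0, 'Col2': 0, 'Col3': 5},
]
print(all_zero_columns(rows, ['Col1', 'Col2', 'Col3']))  # ['Col2']
```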