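PySpark groupBy count fails with the show method
I have a problem with my df, running Spark 2.1.0. It has several string columns created from a SQL query against a Hive DB, and .summary() gives this: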
DataFrame[summary: string, visitorid: string, eventtype: string, ..., target: string]
If I only run
df.groupBy("eventtype").count()
it works, and I get
DataFrame[eventtype: string, count: bigint]
But when I run it with show,
df.groupBy('eventtype').count().show()
I keep getting:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-9040214714346906648.py", line 267, in <module>
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-9040214714346906648.py", line 265, in <module>
exec(code)
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 318, in show
print(self._jdf.showString(n, 20))
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o4636.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 633.0 failed 4 times, most recent failure: Lost task 0.3 in stage 633.0 (TID 19944, ip-172-31-28-173.eu-west-1.compute.internal, executor 440): java.lang.NullPointerException
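I have no idea what is wrong with the show method (it fails on the other columns too, including the target column that I created). The cluster admin could not help me either.
Any pointers?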
I assume you are using Zeppelin. Does z.show(df.groupBy('eventtype').count()) work?
Yes, I am using Zeppelin. Interesting idea! It raises a slightly different error: 'Py4JJavaError: An error occurred while calling z:org.apache.zeppelin.spark.ZeppelinContext.showDF. : org.apache.zeppelin.interpreter.InterpreterException: java.lang.reflect.InvocationTargetException'. Should I edit my question and add the whole error message?
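One thing that may help narrow this down: groupBy(...).count() on its own only builds a lazy plan, so it returning a DataFrame does not mean the data was read. show() is the first call that actually runs a job on the executors, which is where the NullPointerException appears. Below is a minimal sketch that runs progressively heavier actions and reports which one fails first. The helper name probe_actions and the step labels are made up for illustration; it assumes df is the DataFrame from the question and uses only the standard DataFrame methods count, select, limit, groupBy and collect.

```python
def probe_actions(df, column):
    """Run progressively heavier actions on df and record which ones fail.

    probe_actions is a hypothetical helper, not part of PySpark.
    """
    steps = [
        # Only resolves the plan's schema; no Spark job is launched.
        ("schema only", lambda: df.groupBy(column).count().schema),
        # First real action: counts every row of the DataFrame.
        ("full count", lambda: df.count()),
        # Reads actual values of the one column, limited to 5 rows.
        ("select column", lambda: df.select(column).limit(5).collect()),
        # The aggregation from the question, collected instead of shown.
        ("group and count", lambda: df.groupBy(column).count().collect()),
    ]
    results = []
    for label, action in steps:
        try:
            action()
            results.append((label, "ok"))
        except Exception as exc:  # Py4JJavaError lands here as well
            results.append((label, type(exc).__name__))
    return results

# Usage in a Zeppelin %pyspark paragraph, assuming df already exists:
# for label, status in probe_actions(df, "eventtype"):
#     print(label, status)
```

If even "full count" fails, the problem is probably in how the DataFrame is built from the Hive query rather than in groupBy or show themselves.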