pyspark log4j2: How to log full exception stack trace?

Question

I tried:

  • logger.error('err', e)
  • logger.error('err', exc_info=e) # syntax for Python's logging
>>> logger = spark.sparkContext._jvm.org.apache.log4j.LogManager.getLogger('my-logger')
>>> 
>>> try: 1/0
... except Exception as e: logger.error('err', e)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/kash/project1/.venv/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1313, in __call__
  File "/home/kash/project1/.venv/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1283, in _build_args
  File "/home/kash/project1/.venv/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1283, in <listcomp>
  File "/home/kash/project1/.venv/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 298, in get_command_part
AttributeError: 'ZeroDivisionError' object has no attribute '_get_object_id'
>>> 
>>> try: 1/0
... except Exception as e: logger.error('err', exc_info=e)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: __call__() got an unexpected keyword argument 'exc_info'
>>> 

Of course, I can format the stack trace myself and pass it to log4j as a string instead of an exception object, but I'd rather not do that if I can avoid it.

>>> import traceback
>>> try: 1/0
... except Exception as e: logger.error(f'err {"".join(traceback.TracebackException.from_exception(e).format())}')
... 
23/03/09 11:38:47 ERROR my-logger: err Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

>>> 

Answer 1

Score: 1

tl;dr - We cannot pass a pure Python object in a py4j function call, because that object doesn't exist in the target JVM.


The reason for the error AttributeError: 'ZeroDivisionError' object has no attribute '_get_object_id' is that org.apache.logging.log4j.Logger.info/error() expects a Java exception object. As the traceback shows, py4j tried to read _get_object_id from the argument, and a plain Python exception has no such attribute.

For a Java exception object to reach the logger:

  1. The exception should originate in the JVM (or see the note below).
  2. Your Python code should get a handle to that Java exception object as a Python wrapper object (py4j.java_gateway.JavaObject).
  3. Pass that wrapper object as a parameter to Logger.info/error().
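
As a rough sketch of steps 1 to 3: when a py4j call fails inside the JVM, py4j raises py4j.protocol.Py4JJavaError, whose java_exception attribute is the JavaObject handle to the Java throwable. The helper name log_jvm_error below is hypothetical, and it checks for the attribute with getattr rather than importing py4j. The logger argument is assumed to be the log4j logger obtained through spark.sparkContext._jvm as in the question.

```python
def log_jvm_error(logger, message, exc):
    """Log `exc` through a log4j logger obtained via py4j.

    If `exc` carries a `java_exception` attribute (as py4j's
    Py4JJavaError does), that handle is passed as the throwable
    argument so log4j can print the Java stack trace. Otherwise only
    the text of the Python exception is logged.

    Returns True when a Java throwable was passed, False otherwise.
    """
    java_exc = getattr(exc, "java_exception", None)
    if java_exc is not None:
        # Two-argument form: error(message, throwable)
        logger.error(message, java_exc)
        return True
    # Pure Python exception: there is no JVM object to hand over.
    logger.error(f"{message}: {exc!r}")
    return False
```

Note that PySpark may convert some JVM exceptions into its own Python exception classes, in which case java_exception may not be present and the helper falls back to the text-only branch.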

NOTE: If the Error/Exception is a pure Python object that originated in Python's runtime, you must first create a corresponding Java exception object in the JVM, then pass a handle to it (a py4j.java_gateway.JavaObject wrapper) to Logger.info/error().
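
One way to read this note, as a minimal sketch: construct a java.lang.RuntimeException in the JVM whose message is the formatted Python traceback, and pass that handle to the logger. The function name to_java_exception is hypothetical, and jvm is assumed to be spark.sparkContext._jvm. The Java stack trace that log4j prints would then describe the JVM-side call that created the exception, not the Python frames, which is why the Python traceback goes into the message.

```python
import traceback


def to_java_exception(jvm, exc):
    """Create a Java RuntimeException that mirrors a Python exception.

    `jvm` is assumed to be a py4j JVM view (e.g. spark.sparkContext._jvm).
    With a real JVM view the return value is a py4j JavaObject handle,
    which py4j can pass back into JVM calls such as
    logger.error(message, throwable).
    """
    text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return jvm.java.lang.RuntimeException(text)


# Intended usage inside a PySpark session (not executed here):
#
#     try:
#         1 / 0
#     except Exception as e:
#         logger.error("err", to_java_exception(spark.sparkContext._jvm, e))
```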


We ended up just passing the traceback as a string as part of the log message.

>>> import traceback
>>> def e2s(ex: Exception):
...     # Join the formatted traceback lines into one string
...     return ''.join(traceback.TracebackException.from_exception(ex).format())
... 
>>> 
>>> l = spark.sparkContext._jvm.org.apache.log4j.LogManager.getLogger('my_logger')
>>> 
>>> try: 1/0
... except Exception as e: l.error(f'Some exception: {e2s(e)}')
... 
ERROR Some exception: Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

>>> 
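
The string approach can also be wrapped in a small reusable helper. This is only a sketch: the names format_exception_text and log_exception are hypothetical, and logger is assumed to be any object with a one-argument error method, such as the log4j logger above. TracebackException.from_exception follows chained exceptions, so causes raised with "raise ... from ..." end up in the message too.

```python
import traceback


def format_exception_text(exc):
    """Return the full traceback of `exc`, including chained causes, as one string."""
    return "".join(traceback.TracebackException.from_exception(exc).format())


def log_exception(logger, message, exc):
    """Log `message` followed by the formatted Python traceback of `exc`."""
    logger.error(f"{message}\n{format_exception_text(exc)}")
```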

huangapple
  • Published on March 10, 2023, 01:43:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75688238.html