pyspark log4j2: How to log full exception stack trace?
Question
I tried both of the following:
logger.error('err', e)
logger.error('err', exc_info=e)  # syntax for Python's logging module
>>>
>>> logger = spark.sparkContext._jvm.org.apache.log4j.LogManager.getLogger('my-logger')
>>>
>>> try: 1/0
... except Exception as e: logger.error('err', e)
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/kash/project1/.venv/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1313, in __call__
File "/home/kash/project1/.venv/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1283, in _build_args
File "/home/kash/project1/.venv/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1283, in <listcomp>
File "/home/kash/project1/.venv/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 298, in get_command_part
AttributeError: 'ZeroDivisionError' object has no attribute '_get_object_id'
>>>
>>>
>>>
>>> try: 1/0
... except Exception as e: logger.error('err', exc_info=e)
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
TypeError: __call__() got an unexpected keyword argument 'exc_info'
>>>
Of course, I can format the stack trace myself and pass it to log4j as a string instead of the exception object, but I would rather not do that if I can avoid it.
>>> import traceback
>>> try: 1/0
... except Exception as e: logger.error(f'err {"".join(traceback.TracebackException.from_exception(e).format())}')
...
23/03/09 11:38:47 ERROR my-logger: err Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>>
Answer 1
Score: 1
tl;dr - We cannot pass a pure Python object in a py4j function call, because that object doesn't exist in the target JVM.
The reason for the AttributeError: 'ZeroDivisionError' object has no attribute '_get_object_id' error is that org.apache.logging.log4j.Logger.info/error() expects a Java exception object to be passed to it.
For that to happen:
- the exception should originate in the JVM (or see the note below)
- your Python code should get a handle to that Java exception object as a Python wrapper object (py4j.java_gateway.JavaObject)
- your code should pass that wrapper object as a param to Logger.info/error()
NOTE: If you have a pure Python Error/Exception object that originated in Python's runtime, then you must create a corresponding Java exception object in the JVM and pass a handle to it (a py4j.java_gateway.JavaObject wrapper) to Logger.info/error().
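The note above can be sketched roughly as follows. This is only an illustration, not something from the original answer: the helper name log_with_java_exception is made up, java.lang.RuntimeException is just one possible choice of Java exception class, and jvm and logger are assumed to be the py4j JVM view and the log4j logger obtained from the Spark session as in the question.

```python
import traceback


def log_with_java_exception(jvm, logger, message, exc):
    """Log a Python exception by wrapping it in an exception created in the JVM.

    Assumptions: `jvm` is a py4j JVM view (e.g. spark.sparkContext._jvm) and
    `logger` is a log4j logger obtained through it. The helper name is
    hypothetical.
    """
    # Render the Python traceback as text; the JVM knows nothing about Python frames.
    python_trace = ''.join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    # Instantiate a Java exception inside the JVM. py4j hands back a JavaObject
    # wrapper, which can be passed where the logger expects a Throwable.
    java_exc = jvm.java.lang.RuntimeException(python_trace)
    logger.error(message, java_exc)
    return java_exc


# Intended usage inside a PySpark session (not executed here):
#   jvm = spark.sparkContext._jvm
#   logger = jvm.org.apache.log4j.LogManager.getLogger('my-logger')
#   try: 1/0
#   except Exception as e: log_with_java_exception(jvm, logger, 'err', e)
```

Keep in mind that the Java stack trace printed by log4j would describe where the RuntimeException was constructed inside the JVM, not the Python frames. The Python traceback only survives as the exception's message text, which is one reason the plain string approach may be simpler.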
In the end, we simply formatted the traceback as a string and passed it as part of the log message.
>>>
>>> import traceback
>>> def e2s(ex: Exception) -> str:
...     # Format the exception and its traceback as a single string
...     return ''.join(traceback.TracebackException.from_exception(ex).format())
...
>>>
>>> l = spark.sparkContext._jvm.org.apache.log4j.LogManager.getLogger('my_logger')
>>>
>>> try: 1/0
... except Exception as e: l.error(f'Some exception: {e2s(e)}')
...
ERROR Some exception: Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>>
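If the string approach is used in many places, it could be wrapped in a small context manager so each call site stays short. This is a minimal sketch under assumptions: the names format_exc_text and logged_errors are hypothetical, and logger is assumed to be any object with an error(str) method, such as the log4j logger obtained above.

```python
import traceback
from contextlib import contextmanager


def format_exc_text(exc):
    """Return the exception and its full traceback as one string."""
    return ''.join(traceback.format_exception(type(exc), exc, exc.__traceback__))


@contextmanager
def logged_errors(logger, message):
    """Log any exception raised in the `with` body as text, then re-raise it."""
    try:
        yield
    except Exception as exc:
        # Only a plain string crosses the py4j boundary, so no Python
        # exception object is ever passed to the JVM.
        logger.error(f'{message}: {format_exc_text(exc)}')
        raise


# Intended usage inside a PySpark session (not executed here):
#   with logged_errors(l, 'Some exception'):
#       1/0
```

Re-raising after logging is a design choice made for this sketch. Drop the raise if the exception should be swallowed once it has been logged.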