In Spark (2.4 and above), how to completely "redact" ALL sensitive information?
Question
In (py)spark 2.4, it is possible to redact some sensitive information from the event logs, for example:
.config("spark.eventLog.enabled", "true") \
.config("spark.eventLog.dir", "hdfs:///tmp/spark-events") \
.config("spark.redaction.regex", os.environ["SPARK_REDACTION_REGEX"]) \
This would remove "all" sensitive information from the Spark event logs, at least from the SparkListenerEnvironmentUpdate event.
However, when checking the event file, some sensitive data matching the regex is still not redacted, for example in the SparkListenerJobStart event.
How can I "redact" ALL the information, for ALL events?
Answer 1
Score: 1
This does not seem to be possible in (py)spark 2. However, it is fixed in Spark 3.1, where all variables matching the redaction regex are correctly redacted.
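To check which events still leak on a given Spark version, one option is to scan the event log yourself. The following is only a rough sketch, not part of Spark: it assumes the event log is uncompressed, newline-delimited JSON with an "Event" field on each line, and that redacted values appear as the placeholder "*********(redacted)". The helper names iter_pairs and count_unredacted are made up for this example.

```python
import json
import re
from collections import Counter

# Assumption: this is the placeholder Spark writes in place of a redacted value.
REDACTED = "*********(redacted)"


def iter_pairs(node):
    """Yield (key, value) for every string-valued entry nested in a parsed JSON event."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, str):
                yield key, value
            else:
                yield from iter_pairs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_pairs(item)


def count_unredacted(lines, regex):
    """Count, per event type, entries whose key or value matches regex but is not redacted."""
    pattern = re.compile(regex)
    hits = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        for key, value in iter_pairs(event):
            if value == REDACTED:
                continue
            if pattern.search(key) or pattern.search(value):
                hits[event.get("Event", "unknown")] += 1
    return hits


# Hand-written sample lines (not real Spark output), shaped like event log entries.
sample = [
    '{"Event": "SparkListenerEnvironmentUpdate", '
    '"Spark Properties": {"spark.my.password": "*********(redacted)"}}',
    '{"Event": "SparkListenerJobStart", "Job ID": 0, '
    '"Properties": {"spark.my.password": "hunter2"}}',
]
hits = count_unredacted(sample, "(?i)secret|password|token")
print(dict(hits))
```

For a real log, pass an open file object instead of the sample list, together with the same regex you gave to spark.redaction.regex. Any event type that shows up in the result still contains unredacted matches.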