In Spark (2.4 and above), how can I completely redact ALL sensitive information?

Question

In (py)spark 2.4, it is possible to redact some sensitive information from the event logs, for example:

    .config("spark.eventLog.enabled", "true") \
    .config("spark.eventLog.dir", "hdfs:///tmp/spark-events") \
    .config("spark.redaction.regex", os.environ["SPARK_REDACTION_REGEX"]) \

This should remove "all" matching information from the Spark event logs, and it does so at least for the SparkListenerEnvironmentUpdate event.

However, when I check the event file, some sensitive data matching the regex is still not redacted.

For example, in the SparkListenerJobStart event.

How can I redact ALL of the information, for ALL events?

Answer 1

Score: 1

This does not seem to be possible in (py)spark 2. However, it is fixed in Spark 3.1, where all variables matching the redaction regex are correctly redacted.
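To check which events still leak values on a given Spark version, the event file can be scanned with a small script. The following is only a minimal sketch, not part of Spark: it assumes an uncompressed event log with one JSON object per line, assumes that redacted values appear as `*********(redacted)`, and only tests property keys against the regex (Spark may also look at values). The helper name `find_unredacted` and the property `spark.my.password` are made up for illustration.

```python
import json
import re

# Assumption: the marker Spark writes in place of a redacted value.
REDACTED = "*********(redacted)"


def find_unredacted(lines, regex):
    """Yield (event_name, key) for every string value whose key matches
    `regex` but whose value is not the redaction marker."""
    pattern = re.compile(regex)

    def walk(node):
        # Recursively visit nested dicts and lists of the parsed event.
        if isinstance(node, dict):
            for key, value in node.items():
                if isinstance(value, str):
                    if pattern.search(key) and value != REDACTED:
                        yield key
                else:
                    yield from walk(value)
        elif isinstance(node, list):
            for item in node:
                yield from walk(item)

    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        for key in walk(event):
            yield event.get("Event", "<unknown>"), key


# Two hypothetical lines shaped like entries of a Spark event log.
sample = [
    '{"Event": "SparkListenerEnvironmentUpdate", '
    '"Spark Properties": {"spark.my.password": "*********(redacted)"}}',
    '{"Event": "SparkListenerJobStart", "Job ID": 0, '
    '"Properties": {"spark.my.password": "hunter2"}}',
]

leaks = list(find_unredacted(sample, r"(?i)secret|password"))
print(leaks)
```

For a real file, the same function can be fed an open file handle, for example `find_unredacted(open(path), os.environ["SPARK_REDACTION_REGEX"])` with `path` pointing at a local copy of the event log. An empty result on Spark 3.1+ and a non-empty one on 2.4 would match the behaviour described above.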

huangapple
  • Posted on March 9, 2023 at 22:01:17
  • When reposting, please keep the link to this post: https://go.coder-hub.com/75685642.html