Apply Java function URLDecoder.decode to a whole column in Spark 3

Question

I have a DataFrame column containing URL-encoded strings (for example ThisIs%201rstString), and I would like to do something like this:

    someDF.withColumn('newcol', URLDecoder.decode(col("mystring"), "utf-8"))
    someDF.show()

    | mystring            | newcol            |
    -------------------------------------------
    | ThisIs%201rstString | ThisIs 1rstString |
    | This%20is%3Ethisone | This is>thisone   |
    | and%20so%20one      | and so one        |

How should I do this? I guess a map function is the way to go, but I can't figure out how to use it.

Note: this is only a sample. Chaining multiple replace statements is not an option, because there are many other encoded characters and the list may vary. I'd like a simple, reliable method.

Answer 1

Score: 8

You can try the Spark SQL built-in function reflect, which calls a static Java method on each row:

> reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.

    # Assumes an active SparkSession named spark (as in the pyspark shell)
    df = spark.createDataFrame([(e,) for e in ["ThisIs%201rstString", "This%20is%3Ethisone", "and%20so%20one"]], ["mystring"])
    df.selectExpr("*", "reflect('java.net.URLDecoder','decode', mystring, 'utf-8') as newcol").show()

    +-------------------+-----------------+
    |           mystring|           newcol|
    +-------------------+-----------------+
    |ThisIs%201rstString|ThisIs 1rstString|
    |This%20is%3Ethisone|  This is>thisone|
    |     and%20so%20one|       and so one|
    +-------------------+-----------------+

Note: the above is Python code; you should be able to do the same in Scala.
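Since the question asked for a withColumn form, here is a minimal sketch of how the same reflect call could be wrapped. The helper names url_decode_expr and with_decoded_column are hypothetical, not part of Spark; the only Spark calls assumed are pyspark.sql.functions.expr and DataFrame.withColumn.

```python
def url_decode_expr(column: str, charset: str = "utf-8") -> str:
    """Build the Spark SQL expression that calls URLDecoder.decode via reflect.

    Hypothetical helper: it only formats a string, so it can be checked
    without a Spark installation.
    """
    # Backticks quote the column name so names with spaces or dots still parse
    return f"reflect('java.net.URLDecoder', 'decode', `{column}`, '{charset}')"


def with_decoded_column(df, source: str = "mystring", target: str = "newcol"):
    """Return df with a URL-decoded copy of `source` stored in `target`."""
    # Imported lazily so the string builder above works without pyspark
    from pyspark.sql.functions import expr

    return df.withColumn(target, expr(url_decode_expr(source)))
```

With a DataFrame df as built above, with_decoded_column(df).show() should give the same table as the selectExpr version, since both evaluate the same reflect expression.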

Answer 2

Score: 1

Create a UDF that performs the work:

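    import java.net.URLDecoder
    import org.apache.spark.sql.functions.{col, udf}

    // Decode one value; nulls pass through because URLDecoder.decode throws on null input
    def decode(in: String): String = if (in == null) null else URLDecoder.decode(in, "utf-8")
    val decode_udf = udf(decode(_))
    df.withColumn("newcol", decode_udf(col("mystring"))).show()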

This prints the expected result.
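For PySpark users, the same UDF idea might look like the sketch below. It swaps java.net.URLDecoder for Python's urllib.parse.unquote_plus, which likewise turns + into a space; the two are not guaranteed to agree on malformed input, so treat this as an approximation rather than an exact port. The names decode_url and add_decoded_column are hypothetical.

```python
from typing import Optional
from urllib.parse import unquote_plus


def decode_url(value: Optional[str]) -> Optional[str]:
    """Decode a URL-encoded string; None passes through so null cells stay null."""
    if value is None:
        return None
    # unquote_plus decodes %XX escapes and also maps '+' to a space
    return unquote_plus(value, encoding="utf-8")


def add_decoded_column(df, source: str = "mystring", target: str = "newcol"):
    """Return df with a decoded copy of `source` in `target` (hypothetical helper)."""
    # Imported lazily so decode_url can be tried without a Spark installation
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    decode_udf = udf(decode_url, StringType())
    return df.withColumn(target, decode_udf(col(source)))
```

A Python UDF ships each row to a Python worker process, so on large columns it is usually slower than the reflect expression or a Scala UDF, both of which stay inside the JVM.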

huangapple
  • Posted on September 9, 2020 at 23:26:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63814833.html