Apply Java function URLDecoder.decode to a whole column in Spark 3
Question
I have a DataFrame column containing URL-encoded strings such as:
I would like to do something like this:
// Desired behaviour only: URLDecoder.decode takes a String, not a Column, so this does not compile
someDF.withColumn("newcol", URLDecoder.decode(col("mystring"), "utf-8")).show()
| mystring | newcol |
--------------------------------------------------
| ThisIs%201rstString | ThisIs 1rstString |
| This%20is%3Ethisone | This is>thisone |
| and%20so%20one | and so one |
How should I do this? I guess a map function is involved, but I can't figure out how to use it.
Note: this is only a sample. Writing multiple replace statements is not an option, because there are many other encoded characters and the list may vary. I'd like a simple, reliable method.
Answer 1
Score: 8
You can try the built-in Spark SQL function reflect:
> reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.
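# Assumes an active SparkSession named `spark` (e.g. the pyspark shell)
df = spark.createDataFrame([(e,) for e in ["ThisIs%201rstString", "This%20is%3Ethisone", "and%20so%20one"]], ["mystring"])
# reflect() calls the static Java method java.net.URLDecoder.decode(mystring, "utf-8") for each row
df.selectExpr("*", "reflect('java.net.URLDecoder','decode', mystring, 'utf-8') as newcol").show()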
+-------------------+-----------------+
| mystring| newcol|
+-------------------+-----------------+
|ThisIs%201rstString|ThisIs 1rstString|
|This%20is%3Ethisone| This is>thisone|
| and%20so%20one| and so one|
+-------------------+-----------------+
Note: the above is Python code; the same selectExpr call should work from Scala as well.
Answer 2
Score: 1
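Create a UDF that does the decoding:
import java.net.URLDecoder
import org.apache.spark.sql.functions.{col, udf}
// Null-safe wrapper: URLDecoder.decode throws a NullPointerException on null input
def decode(in: String): String =
  Option(in).map(URLDecoder.decode(_, "UTF-8")).orNull
val decode_udf = udf(decode _)
df.withColumn("newcol", decode_udf(col("mystring"))).show()
This prints the expected result.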
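To check which values to expect without starting Spark, the decoding rules URLDecoder applies (application/x-www-form-urlencoded: %XX escapes read as bytes in the given charset, '+' turned into a space) can be approximated in plain Python with urllib.parse.unquote_plus. This is a sketch that swaps in a Python library for the Java class; the function name decode_like_urldecoder is hypothetical, and the malformed-escape check only roughly mirrors where Java throws IllegalArgumentException.

```python
import re
from typing import Optional
from urllib.parse import unquote_plus

# A '%' that is NOT followed by two hex digits; URLDecoder rejects such input.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def decode_like_urldecoder(s: Optional[str], encoding: str = "utf-8") -> Optional[str]:
    """Hypothetical helper approximating java.net.URLDecoder.decode(s, encoding).

    - None is returned unchanged (what a null-safe UDF body should do).
    - '+' becomes a space; %XX escapes are decoded as bytes in `encoding`.
    - A malformed escape raises ValueError, roughly where Java would throw
      IllegalArgumentException.
    """
    if s is None:
        return None
    if _BAD_ESCAPE.search(s):
        raise ValueError(f"malformed percent-escape in {s!r}")
    return unquote_plus(s, encoding=encoding)


if __name__ == "__main__":
    # The sample values from the question
    for raw in ["ThisIs%201rstString", "This%20is%3Ethisone", "and%20so%20one"]:
        print(f"{raw} -> {decode_like_urldecoder(raw)}")
```

The same function could be wrapped with pyspark.sql.functions.udf(decode_like_urldecoder, StringType()) if a pure-Python UDF is acceptable, but a Python UDF ships every row to a Python worker and is usually slower than the JVM-side reflect call, and the ValueError would fail the task on bad input.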