Insert a character at specific character counts in PySpark

Question

I'm looking to insert a special character at specific character counts in a PySpark string column:

"M202876QC0581AADMM01"
to
"M-202876-QC0581-AA-DMM01"

(1-6-6-2-)
Insert a hyphen after the first character, then after the next 6 characters, then after the next 6, then after the next 2.

I tried something like the code below, but no luck yet.

from pyspark.sql import functions as F

df = spark.createDataFrame([('M202876QC0581AADMM01',)], ['str'])
(df.withColumn("str", F.regexp_replace(F.col("str") ,  r"(\d{0})(\d{3})(\d{3})" , "$1-$2-$3"))).collect()

Out[121]: [Row(str='M-202-876QC0581AADMM01')]

Answer 1

Score: 1

You're close. Try this:

from pyspark.sql.functions import regexp_replace

df = spark.createDataFrame([("M202876QC0581AADMM01",)], ["str"])

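# Capture groups of 1, 6, 6, and 2 characters, then the rest of the string.
pat = r"^(.{1})(.{6})(.{6})(.{2})(.+)"
# Rejoin the five groups with hyphens between them.
df = df.withColumn("str", regexp_replace("str", pat, r"$1-$2-$3-$4-$5"))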

Output:

df.show(truncate=False)

+------------------------+
|str                     |
+------------------------+
|M-202876-QC0581-AA-DMM01|
+------------------------+
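
If the group widths change from case to case, the pattern and replacement string above could be generated instead of typed by hand. The following is a minimal sketch under that assumption; `build_insert_pattern` is a hypothetical helper, not part of PySpark. It checks the result locally with Python's `re` module, so no Spark session is needed to try it.

```python
import re


def build_insert_pattern(widths, sep="-"):
    """Hypothetical helper (not a PySpark function).

    Returns (pattern, replacement) that insert `sep` after each run of
    characters whose lengths are given in `widths`. The replacement uses
    `$n` group references, the form used in the answer's regexp_replace call.
    """
    # One capture group per width, e.g. 6 -> "(.{6})", plus a final "(.+)"
    # group for whatever is left of the string.
    groups = "".join(f"(.{{{w}}})" for w in widths)
    pattern = f"^{groups}(.+)"
    replacement = sep.join(f"${i}" for i in range(1, len(widths) + 2))
    return pattern, replacement


pattern, replacement = build_insert_pattern([1, 6, 6, 2])

# Local check with Python's re module. Python does not use `$n` in
# replacement strings, so the groups are joined in a callback instead.
result = re.sub(pattern, lambda m: "-".join(m.groups()), "M202876QC0581AADMM01")
print(pattern)
print(replacement)
print(result)

# With Spark available, the same two strings would be passed on as in the answer:
# df.withColumn("str", regexp_replace("str", pattern, replacement))
```

Note that a string too short to fill every group does not match the pattern at all, so it would be left unchanged rather than partially hyphenated.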

huangapple
  • Posted on April 20, 2023 at 01:32:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76057335.html