Add a character at a specific character count in PySpark
Question
I'm looking to insert a special character at specific character counts in a PySpark string, so that
"M202876QC0581AADMM01"
becomes
"M-202876-QC0581-AA-DMM01"
(1-6-6-2-)
That is, insert a hyphen after the first character, then after the next 6 characters, then after the next 6, then after the next 2.
I tried something like the following, but no luck yet:
from pyspark.sql import functions as F
df = spark.createDataFrame([('M202876QC0581AADMM01',)], ['str'])
(df.withColumn("str", F.regexp_replace(F.col("str") , r"(\d{0})(\d{3})(\d{3})" , "$1-$2-$3"))).collect()
Out[121]: [Row(str='M-202-876QC0581AADMM01')]
Answer 1
Score: 1
You're close. Try this:
from pyspark.sql.functions import regexp_replace
df = spark.createDataFrame([("M202876QC0581AADMM01",)], ["str"])
pat = r"^(.{1})(.{6})(.{6})(.{2})(.+)"
df = df.withColumn("str", regexp_replace("str", pat, r"$1-$2-$3-$4-$5"))
Output:
df.show(truncate=False)
+------------------------+
|str                     |
+------------------------+
|M-202876-QC0581-AA-DMM01|
+------------------------+
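The pattern above hard-codes the 1-6-6-2 layout. If the segment widths change, one option is to generate the pattern and replacement strings from a list of widths. The following is only a minimal sketch under that assumption: `build_insert_pattern` and `apply_locally` are hypothetical helper names, not part of PySpark, and `apply_locally` uses Python's `re` module purely to check the result without a Spark session.

```python
import re

def build_insert_pattern(widths, sep="-"):
    """Return (pattern, replacement) for inserting `sep` after each width in `widths`."""
    # One fixed-width capture group per segment, plus a final group for the remainder.
    groups = "".join(f"(.{{{w}}})" for w in widths)
    pattern = f"^{groups}(.+)"
    # Java-style group references ($1, $2, ...), the form used in the answer above.
    replacement = sep.join(f"${i}" for i in range(1, len(widths) + 2))
    return pattern, replacement

def apply_locally(value, widths, sep="-"):
    """Apply the same pattern with Python's re module (hypothetical local check)."""
    pattern, _ = build_insert_pattern(widths, sep)
    # Python's re uses \g<n> for group references instead of $n.
    py_replacement = sep.join(f"\\g<{i}>" for i in range(1, len(widths) + 2))
    # Values too short to match the pattern are returned unchanged.
    return re.sub(pattern, py_replacement, value, count=1)

pattern, replacement = build_insert_pattern([1, 6, 6, 2])
print(pattern)       # ^(.{1})(.{6})(.{6})(.{2})(.+)
print(replacement)   # $1-$2-$3-$4-$5
print(apply_locally("M202876QC0581AADMM01", [1, 6, 6, 2]))  # M-202876-QC0581-AA-DMM01
```

The generated pair could then be passed to `regexp_replace("str", pattern, replacement)` in place of the hand-written strings.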