PySpark: create a map type column from a string column

Question

Hi, I have a table with a column whose values look like this:

VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13

From each row of data, the output should be a MapType column:

Data:      VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13
MapColumn: {"VER": "some_ver", "DLL": "some_dll", "as": "bcd,2.sc4", "OR": "SCT", "SG": "3", "SLC": "13"}

I tried using explode, but I get one row per key/value pair instead of a single map per input row:

df = df.withColumn("map_col", f.explode(f.split(f.col("data"), " ")))\
       .withColumn("key", f.trim(f.split(f.col("data"), ":").getItem(0)))\
       .withColumn("value", f.trim(f.split(f.col("data"), ":").getItem(1)))\
       .withColumn("map_col", f.create_map(f.col("key"), f.col("value")))
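To make the gap between the attempt and the goal concrete, here is a rough plain-Python analogy (not Spark code). The helper names `explode_like` and `map_per_row` are made up for illustration. The first mimics what explode does to a split array, producing one row per entry; the second keeps one result per input row, which is what the desired MapType column needs.

```python
rows = ["VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13"]


def explode_like(rows):
    """One output row per space-separated entry, similar to explode on a split array."""
    return [entry for row in rows for entry in row.split(" ")]


def map_per_row(rows):
    """One output row per input row, with all key/value pairs held in a single dict."""
    return [dict(entry.split(":", 1) for entry in row.split(" ")) for row in rows]


print(len(explode_like(rows)))  # 6
print(len(map_per_row(rows)))   # 1
```

So the fix is not to flatten the entries into rows, but to transform them in place inside the array and assemble the map from there.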

Answer 1

Score: 1

Note that this approach relies on two assumptions:

  1. ':' separates the key from the value in each pair and does not appear anywhere else.
  2. A single space separates the entries and does not appear anywhere else.

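from pyspark.sql import Column, SparkSession
from pyspark.sql import functions as F

# Reuse the active session or start one (only needed outside a shell/notebook).
spark = SparkSession.builder.getOrCreate()

_data = [
    ('VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13',),
]
df = spark.createDataFrame(_data, ['data'])


def parse_struct(x: Column) -> Column:
    # Turn one "key:value" entry into a (key, value) struct.
    inner_split = F.split(x, pattern=':')
    return F.struct(inner_split.getItem(0), inner_split.getItem(1))


# Split the string on whitespace into an array of "key:value" entries.
split = F.split('data', pattern=r'\s')
# Convert every entry to a struct, then build the map from the array of structs.
map_col = F.map_from_entries(F.transform(split, parse_struct))
df2 = df.withColumn('map_column', map_col)
df2.select('map_column').show(10, False)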

+----------------------------------------------------------------------------------+
|map_column                                                                        |
+----------------------------------------------------------------------------------+
|{VER -> some_ver, DLL -> some_dll, as -> bcd,2.sc4, OR -> SCT, SG -> 3, SLC -> 13}|
+----------------------------------------------------------------------------------+
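The two assumptions matter because each entry is split on every ':' and only the first two pieces are kept. The following is a minimal plain-Python sketch of that behaviour, not Spark code. The function names `parse_pairs` and `parse_pairs_first_colon` and the `tricky` sample string are invented for illustration, and `None` is only a stand-in for a missing value.

```python
def parse_pairs(text: str) -> dict:
    """Plain-Python analogue of the split / getItem(0) / getItem(1) pipeline."""
    result = {}
    for entry in text.split(" "):
        parts = entry.split(":")
        # Only the first two pieces are kept; anything after a second ':' is dropped.
        result[parts[0]] = parts[1] if len(parts) > 1 else None
    return result


def parse_pairs_first_colon(text: str) -> dict:
    """Variant that splits each entry on the first ':' only."""
    result = {}
    for entry in text.split(" "):
        key, _, value = entry.partition(":")
        result[key] = value
    return result


sample = "VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13"
print(parse_pairs(sample))

# Hypothetical input whose value contains ':' and so breaks assumption 1.
tricky = "URL:http://host SG:3"
print(parse_pairs(tricky))              # {'URL': 'http', 'SG': '3'}
print(parse_pairs_first_colon(tricky))  # {'URL': 'http://host', 'SG': '3'}
```

If values in the real data may contain ':', the analogous change in Spark would likely be passing `limit=2` to the inner `F.split` (the `limit` argument exists since Spark 3.0), so that only the first ':' is used as the separator.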

Posted by huangapple on July 20, 2023 at 19:56:33
Source: https://go.coder-hub.com/76729604.html