PySpark: create a map type column from a string column
Question
Hi, I have a table with a column whose values look like this:
VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13
From this row of data, the output should be a MapType column:
| Data | MapColumn |
|---|---|
| VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13 | {"VER": "some_ver", "DLL": "some_dll", "as": "bcd,2.sc4", "OR": "SCT", "SG": "3", "SLC": "13"} |
I tried using explode, but I get just one row per key/value pair:
df = df.withColumn("map_col", f.explode(f.split(f.col("data"), " ")))\
    .withColumn("key", f.trim(f.split(f.col("data"), ":").getItem(0)))\
    .withColumn("value", f.trim(f.split(f.col("data"), ":").getItem(1)))\
    .withColumn("map_col", f.create_map(f.col("key"), f.col("value")))
Answer 1
Score: 1
Note that you have to be certain that:
- ':' separates the key/value pair and doesn't show up anywhere else
- a single space separates the entries and doesn't show up anywhere else
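To see why those two conditions matter, here is a minimal plain-Python sketch of the same split logic (entries on a single space, key/value on ':'). The helper name `parse_pairs` and the two "bad" sample strings are made up for illustration, and it only approximates what Spark does with the same input:

```python
def parse_pairs(text: str) -> dict:
    """Plain-Python mirror of the split logic: entries by ' ', key/value by ':'."""
    result = {}
    for entry in text.split(' '):
        parts = entry.split(':')
        # Like taking item 0 and item 1 of the split:
        # anything after a second ':' is silently dropped.
        key = parts[0]
        value = parts[1] if len(parts) > 1 else None
        result[key] = value
    return result


ok = parse_pairs('VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13')
bad_colon = parse_pairs('URL:http://host SG:3')  # hypothetical value containing ':'
bad_space = parse_pairs('NAME:some name SG:3')   # hypothetical value containing ' '

print(ok)
print(bad_colon)  # the URL value is cut off at the second ':'
print(bad_space)  # 'name' becomes a key of its own with no value
```

If either separator can appear inside a value, the result is silently wrong rather than an error, so it is worth checking the data first.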
from pyspark.sql import functions as F
from pyspark.sql import Column

# Assumes an existing SparkSession named `spark`
_data = [
    ('VER:some_ver DLL:some_dll as:bcd,2.sc4 OR:SCT SG:3 SLC:13',),
]
df = spark.createDataFrame(_data, ['data', ])

def parse_struct(x: Column) -> Column:
    # Turn one 'key:value' entry into a (key, value) struct
    inner_split = F.split(x, pattern=':')
    return F.struct(inner_split.getItem(0), inner_split.getItem(1))

# Split the string into entries on whitespace
split = F.split('data', pattern=r'\s').alias('split')
# Convert every entry to a struct, then build the map from those structs
map_col = F.map_from_entries(F.transform(split, parse_struct))
df2 = df.withColumn('map_column', map_col)
df2.select('map_column').show(10, False)
+----------------------------------------------------------------------------------+
|map_column                                                                        |
+----------------------------------------------------------------------------------+
|{VER -> some_ver, DLL -> some_dll, as -> bcd,2.sc4, OR -> SCT, SG -> 3, SLC -> 13}|
+----------------------------------------------------------------------------------+