How to use multiple columns as maps for nested dictionary to create a new dataframe column?

Question

I am trying to use two PySpark DataFrame columns as keys into a nested dictionary to produce a new PySpark column. I would also like the solution to scale to nested dictionaries with 4-5 levels.

The dictionary is of the form:

dict_prob={"a":{"x1":"y1","x2":"y2"},"b":{"m1":"n1","m2":"n2"}}

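The input columns are:

| index | col1 | col2 |
| -------- | -------- | -------- |
| 0 | a | x1 |
| 1 | a | x2 |
| 2 | b | m2 |

The required output column is:

| col3 |
| -------- |
| y1 |
| y2 |
| n2 |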

I tried the approaches in the links below, but they seem to work only for a single (non-nested) dictionary, not for a nested one.
https://stackoverflow.com/questions/42980704/pyspark-create-new-column-with-mapping-from-a-dict
https://stackoverflow.com/questions/70462865/how-to-use-a-column-value-as-key-to-a-dictionary-in-pyspark

Answer 1

Score: 0

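For the given example, you can use a simple UDF:

from pyspark.sql.functions import udf

# Look up the first-level key (col1), then the second-level key (col2).
# udf() defaults to a StringType return type, which matches the string values in dict_prob.
two_lvls = udf(lambda l1, l2: dict_prob[l1][l2])

df = df.withColumn("col3", two_lvls(df.col1, df.col2))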

Output:

df.show()

+-----+----+----+----+
|index|col1|col2|col3|
+-----+----+----+----+
|    0|   a|  x1|  y1|
|    1|   a|  x2|  y2|
|    2|   b|  m2|  n2|
+-----+----+----+----+
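
The question also asks for something that scales to 4-5 levels. One possible way to generalize the two-level UDF is to walk the dictionary one key at a time. The sketch below is an illustration, not part of the original answer: `nested_lookup` and `make_lookup_udf` are hypothetical helper names, and it assumes that every leaf value is a string and that a missing key should produce null instead of raising an error.

```python
from functools import reduce

dict_prob = {"a": {"x1": "y1", "x2": "y2"}, "b": {"m1": "n1", "m2": "n2"}}


def nested_lookup(mapping, *keys):
    """Follow one key per nesting level; return None if any key is missing."""
    def step(node, key):
        # Stop descending (and yield None) once we leave the dictionary structure.
        return node.get(key) if isinstance(node, dict) else None
    return reduce(step, keys, mapping)


def make_lookup_udf(mapping):
    """Wrap nested_lookup in a PySpark UDF.

    PySpark is imported lazily so that nested_lookup can be used and
    tested without a Spark installation.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    return udf(lambda *keys: nested_lookup(mapping, *keys), StringType())


# Usage with the DataFrame from the question; pass one column per level:
# lookup = make_lookup_udf(dict_prob)
# df = df.withColumn("col3", lookup("col1", "col2"))
```

For a dictionary with more levels, the same UDF would be called with more key columns, one per level, in order from the outermost key to the innermost.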

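If a Python UDF turns out to be too slow, another option is to flatten the nested dictionary into composite keys and use the `create_map` technique that is commonly applied to flat dictionaries. This is only a sketch under assumptions: `flatten` and `add_lookup_column` are hypothetical names, all keys are assumed to be strings, and the separator `|` is assumed not to occur in any key.

```python
from itertools import chain

dict_prob = {"a": {"x1": "y1", "x2": "y2"}, "b": {"m1": "n1", "m2": "n2"}}


def flatten(mapping, sep="|", prefix=()):
    """Turn {"a": {"x1": "y1"}} into {"a|x1": "y1"}, for any nesting depth."""
    flat = {}
    for key, value in mapping.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, sep, path))
        else:
            flat[sep.join(path)] = value
    return flat


def add_lookup_column(df, mapping, key_cols, out_col, sep="|"):
    """Add out_col by looking up the joined key columns in a literal map column.

    PySpark is imported lazily so that flatten can be tested without Spark.
    """
    from pyspark.sql import functions as F
    flat = flatten(mapping, sep)
    # create_map expects alternating key and value columns.
    map_col = F.create_map(*[F.lit(x) for x in chain.from_iterable(flat.items())])
    return df.withColumn(out_col, map_col[F.concat_ws(sep, *key_cols)])


# Usage with the DataFrame from the question:
# df = add_lookup_column(df, dict_prob, ["col1", "col2"], "col3")
```

The design trade-off is that the whole mapping is embedded in the query plan as literals, so this variant is probably better suited to small or medium-sized dictionaries.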
  • Posted on June 5, 2023 at 13:02:59
  • Link to this article: https://go.coder-hub.com/76403596.html