How to use multiple columns as maps for nested dictionary to create a new dataframe column?
Question
I am trying to use two PySpark DataFrame columns as keys into a nested dictionary and get the looked-up value back as a new PySpark column. I would also like the solution to scale to a nested dictionary with 4-5 levels.
The dictionary is of the form:
dict_prob={"a":{"x1":"y1","x2":"y2"},"b":{"m1":"n1","m2":"n2"}}
The input columns are:
| index | col1 | col2 |
| -------- | -------- | -------- |
| 0 | a | x1 |
| 1 | a | x2 |
| 2 | b | m2 |
The required output column is:
| col3 |
| -------- |
| y1 |
| y2 |
| n2 |
I tried the links below, but they seem to work only for a single (flat) dictionary, not for a nested one:
https://stackoverflow.com/questions/42980704/pyspark-create-new-column-with-mapping-from-a-dict
https://stackoverflow.com/questions/70462865/how-to-use-a-column-value-as-key-to-a-dictionary-in-pyspark
Answer 1
Score: 0
For the given example, you can use a simple udf:
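from pyspark.sql.functions import udf
# Index dict_prob by col1, then by col2; udf returns a string column by default.
# A key missing from dict_prob raises KeyError inside the UDF.
two_lvls = udf(lambda l1, l2: dict_prob[l1][l2])
df = df.withColumn("col3", two_lvls(df.col1, df.col2))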
Output:
df.show()
+-----+----+----+----+
|index|col1|col2|col3|
+-----+----+----+----+
|    0|   a|  x1|  y1|
|    1|   a|  x2|  y2|
|    2|   b|  m2|  n2|
+-----+----+----+----+
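The question also asks about 4-5 levels of nesting. One possible way to generalize the two-level lookup is to walk the dictionary one key at a time, as in the sketch below. The helper name nested_lookup is hypothetical, not part of PySpark, and returning None for a missing key is an assumption of this sketch rather than something the answer specifies. The PySpark wiring is shown only as comments because it needs a running Spark session and the df from the question.

```python
from functools import reduce

dict_prob = {"a": {"x1": "y1", "x2": "y2"}, "b": {"m1": "n1", "m2": "n2"}}

def nested_lookup(mapping, *keys):
    """Walk `mapping` one key per level; return None if any key is missing."""
    def step(level, key):
        # Stop descending once a level is missing or is no longer a dict.
        return level.get(key) if isinstance(level, dict) else None
    return reduce(step, keys, mapping)

print(nested_lookup(dict_prob, "a", "x1"))  # y1
print(nested_lookup(dict_prob, "b", "zz"))  # None

# Hypothetical PySpark wiring (assumes the df from the question):
# from pyspark.sql.functions import udf
# n_lvls = udf(lambda *keys: nested_lookup(dict_prob, *keys))
# df = df.withColumn("col3", n_lvls(df.col1, df.col2))
```

With this shape, a deeper dictionary only needs more columns passed to the UDF, one per level, in the same order as the nesting.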