create new spark column based on dictionary values

Question

Say I have a Spark DataFrame in this format:

| currency | value |
| -------- | -------- |
| USD | 1.00 |
| EUR | 2.00 |

And a dictionary with current currency exchange rates (e.g. `{'EUR': 1.00, 'USD': 0.90}`). I want to add another column, say `value_eur`, based on that dict. How would I go about doing that?

I tried the function

```python
raw_df.withColumn("value_eur", raw_df.value * currency_exchanges[raw_df.currency])
```

but it gives me an error:

TypeError: unhashable type: 'Column'
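The lookup fails on the driver before Spark ever runs: `currency_exchanges[raw_df.currency]` tries to use a `pyspark.sql.Column` object as a dict key, and `Column` is unhashable because it overrides `__eq__` to build expressions rather than return booleans. A minimal plain-Python sketch (a toy `Column` class, not the real PySpark one) reproduces the same `TypeError`:

```python
class Column:
    """Toy stand-in for pyspark.sql.Column: overriding __eq__ to build an
    expression (instead of returning a bool) makes the class unhashable in
    Python 3, so instances cannot be used as dict keys."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return f"({self.name} = {other})"  # an expression string, not a bool

currency_exchanges = {'EUR': 1.00, 'USD': 0.90}
try:
    currency_exchanges[Column("currency")]
except TypeError as e:
    print(e)  # unhashable type: 'Column'
```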
# Answer 1

**Score**: 0

See the below implementation -
**Input DF**-

```python
from pyspark.sql.types import *

schema = StructType([
    StructField("currency", StringType(), True),
    StructField("value", DoubleType(), True)
])
data = [("USD", 1.00), ("EUR", 2.00)]
raw_df = spark.createDataFrame(data, schema)
```

Required Output-

```python
from pyspark.sql.functions import *

currency_exchanges = {'EUR': 1.00, 'USD': 0.90}

# Build a lookup DataFrame from the dict and broadcast-join it onto raw_df
currency_df = spark.createDataFrame(list(currency_exchanges.items()), ['currency', 'exchange_rate'])

result_df = raw_df.join(broadcast(currency_df), 'currency', 'left') \
    .withColumn('value_eur', when(col('currency') == 'EUR', col('value')).otherwise(col('value') * col('exchange_rate'))) \
    .drop('exchange_rate')

result_df.show()
```

```
+--------+-----+---------+
|currency|value|value_eur|
+--------+-----+---------+
|     USD|  1.0|      0.9|
|     EUR|  2.0|      2.0|
+--------+-----+---------+
```

Here I've created another `currency_df` from the dictionary and joined `raw_df` with `currency_df` for the required use case.
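For readers following along without a Spark session, the broadcast join above computes the same result as this plain-Python sketch over the sample rows (the `to_eur` helper is purely illustrative, not part of the answer's code):

```python
currency_exchanges = {'EUR': 1.00, 'USD': 0.90}
rows = [("USD", 1.00), ("EUR", 2.00)]  # the sample (currency, value) data

def to_eur(currency, value):
    # EUR rows keep their value; other rows are multiplied by their rate,
    # mirroring the when(...).otherwise(...) expression in the join
    return value if currency == 'EUR' else value * currency_exchanges[currency]

result = [(c, v, to_eur(c, v)) for c, v in rows]
print(result)  # [('USD', 1.0, 0.9), ('EUR', 2.0, 2.0)]
```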


huangapple
  • Posted on 2023-04-06 22:03:25
  • Please retain this link when reposting: https://go.coder-hub.com/75950424.html