Create a new Spark column based on dictionary values


Question


Say I have a Spark DataFrame in this format:

| currency | value |
| -------- | ----- |
| USD      | 1.00  |
| EUR      | 2.00  |

and a dictionary with the current currency exchange rates (e.g. `{EUR: 1.00, USD: 0.90}`).

I want to add another column, say `value_eur`, based on that dictionary. How would I go about doing that?

I tried the following:

```python
raw_df.withColumn("value_eur", raw_df.value * currency_exchanges[raw_df.currency])
```

but it gives an error:

```
TypeError: unhashable type: 'Column'
```



# Answer 1
**Score**: 0


**Input DF** -
```python
from pyspark.sql.types import *

schema = StructType([
    StructField("currency", StringType(), True),
    StructField("value", DoubleType(), True)
])

data = [("USD", 1.00), ("EUR", 2.00)]
raw_df = spark.createDataFrame(data, schema)
```

**Required Output** -
```python
from pyspark.sql.functions import *

currency_exchanges = {'EUR': 1.00, 'USD': 0.90}

# Build a small lookup DataFrame from the exchange-rate dictionary
currency_df = spark.createDataFrame(list(currency_exchanges.items()), ['currency', 'exchange_rate'])

# Broadcast-join the lookup table and convert each value to EUR
result_df = raw_df.join(broadcast(currency_df), 'currency', 'left') \
    .withColumn('value_eur', when(col('currency') == 'EUR', col('value')).otherwise(col('value') * col('exchange_rate'))) \
    .drop('exchange_rate')

result_df.show()
```

```
+--------+-----+---------+
|currency|value|value_eur|
+--------+-----+---------+
|     USD|  1.0|      0.9|
|     EUR|  2.0|      2.0|
+--------+-----+---------+
```

Here I've created a `currency_df` from the dictionary and joined `raw_df` with it (with a broadcast hint, since the lookup table is tiny) for the required use case.
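As for why the original attempt fails: `currency_exchanges[raw_df.currency]` tries to use a Spark `Column` expression as a Python dict key on the driver, which raises `TypeError: unhashable type: 'Column'`; the lookup has to happen per row inside Spark. A minimal join-free sketch, assuming the same `raw_df` and `currency_exchanges` as above, is to turn the dictionary into a literal map column with `create_map`:

```python
from itertools import chain
from pyspark.sql.functions import create_map, lit, col

# Flatten the dict into a literal map column: map('EUR', 1.0, 'USD', 0.9)
rate_map = create_map(*[lit(x) for x in chain(*currency_exchanges.items())])

# Look up each row's rate in the map and convert the value
result_df = raw_df.withColumn("value_eur", col("value") * rate_map[col("currency")])
result_df.show()
```

Currencies missing from the dictionary come out as `null` here, just as unmatched rows would after the left join above, so the behaviour of the two approaches is comparable.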

