create new spark column based on dictionary values

Question

Say I have a Spark DataFrame in this format:

| currency | value |
| -------- | -------- |
| USD | 1.00 |
| EUR | 2.00 |

And a dictionary with current currency exchange rates (e.g. `{'EUR': 1.00, 'USD': 0.90}`). I want to add another column, say `value_eur`, based on that dict. How would I go about doing that?

I tried the function

```python
raw_df.withColumn("value_eur", raw_df.value * currency_exchanges[raw_df.currency])
```

but it gives me an error:

TypeError: unhashable type: 'Column'
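The lookup fails on the driver before Spark ever runs: `currency_exchanges[raw_df.currency]` tries to use a `pyspark.sql.Column` object as a dict key, and `Column` is unhashable because it overrides `__eq__` to build expressions rather than return booleans. A minimal plain-Python sketch (a toy `Column` class, not the real PySpark one) reproduces the same `TypeError`:

```python
class Column:
    """Toy stand-in for pyspark.sql.Column: overriding __eq__ to build an
    expression (instead of returning a bool) makes the class unhashable in
    Python 3, so instances cannot be used as dict keys."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return f"({self.name} = {other})"  # an expression string, not a bool

currency_exchanges = {'EUR': 1.00, 'USD': 0.90}
try:
    currency_exchanges[Column("currency")]
except TypeError as e:
    print(e)  # unhashable type: 'Column'
```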
# Answer 1

**Score**: 0

See the below implementation -
**Input DF**-

```python
from pyspark.sql.types import *

schema = StructType([
    StructField("currency", StringType(), True),
    StructField("value", DoubleType(), True)
])
data = [("USD", 1.00), ("EUR", 2.00)]
raw_df = spark.createDataFrame(data, schema)
```

Required Output-

```python
from pyspark.sql.functions import *

currency_exchanges = {'EUR': 1.00, 'USD': 0.90}

# Build a lookup DataFrame from the dict and broadcast-join it onto raw_df
currency_df = spark.createDataFrame(list(currency_exchanges.items()), ['currency', 'exchange_rate'])

result_df = raw_df.join(broadcast(currency_df), 'currency', 'left') \
    .withColumn('value_eur', when(col('currency') == 'EUR', col('value')).otherwise(col('value') * col('exchange_rate'))) \
    .drop('exchange_rate')

result_df.show()
```

```
+--------+-----+---------+
|currency|value|value_eur|
+--------+-----+---------+
|     USD|  1.0|      0.9|
|     EUR|  2.0|      2.0|
+--------+-----+---------+
```

Here I've created another `currency_df` from the dictionary and joined `raw_df` with `currency_df` for the required use case.
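For readers following along without a Spark session, the broadcast join above computes the same result as this plain-Python sketch over the sample rows (the `to_eur` helper is purely illustrative, not part of the answer's code):

```python
currency_exchanges = {'EUR': 1.00, 'USD': 0.90}
rows = [("USD", 1.00), ("EUR", 2.00)]  # the sample (currency, value) data

def to_eur(currency, value):
    # EUR rows keep their value; other rows are multiplied by their rate,
    # mirroring the when(...).otherwise(...) expression in the join
    return value if currency == 'EUR' else value * currency_exchanges[currency]

result = [(c, v, to_eur(c, v)) for c, v in rows]
print(result)  # [('USD', 1.0, 0.9), ('EUR', 2.0, 2.0)]
```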


huangapple
  • Posted on 2023-04-06 22:03:25
  • Please retain this link when reposting: https://go.coder-hub.com/75950424.html