Getting values from multiple rows into a single row

Question

I want to get the values of multiple rows of a single column into different columns in a single row, based on the condition of another column.

I want to get the values of the field_value_name in the same row but in different columns, based on the values present in the field_name column, as they belong to the same id.

How can I get this through SQL or PySpark?

I tried using CASE WHEN, but it scans every row and returns an output for every row.

I'd rather have those values in a single row for every id.

Answer 1

Score: 1

You need pivoting logic here, e.g.

<!-- language: sql -->

    SELECT
        parent_id,
        MAX(CASE WHEN properties_field_name = 'Status'
                 THEN properties_field_value_name END) AS REQ_STATUS,
        MAX(CASE WHEN properties_field_name = 'Type'
                 THEN properties_field_value_name END) AS REQ_TYPE,
        MAX(CASE WHEN properties_field_name = 'Description'
                 THEN properties_field_value_name END) AS REQ_DESC
    FROM yourTable
    GROUP BY parent_id
    ORDER BY parent_id;
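
Against the three sample rows used in answer 3 below, this query should return a single pivoted row per parent_id, roughly:

    parent_id | REQ_STATUS | REQ_TYPE    | REQ_DESC
    ----------+------------+-------------+----------
    7024549   | Approved   | Jama Design | null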

Answer 2

Score: 0

You can do it with MAX like Tim shows, or you can do it with joins like this:

    SELECT
        base.parent_id,
        status.properties_field_value_name as status,
        type.properties_field_value_name as type,
        descr.properties_field_value_name as description
    FROM (
        SELECT distinct parent_id
        FROM thetableyoudidnotname
    ) as base
    -- one self-join per attribute; DESC is a reserved word in SQL,
    -- so that alias is spelled descr
    LEFT JOIN thetableyoudidnotname as status on base.parent_id = status.parent_id and status.properties_field_name = 'Status'
    LEFT JOIN thetableyoudidnotname as type on base.parent_id = type.parent_id and type.properties_field_name = 'Type'
    LEFT JOIN thetableyoudidnotname as descr on base.parent_id = descr.parent_id and descr.properties_field_name = 'Description'
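
One trade-off worth noting: this version needs an extra self-join for every attribute you pull out, while the conditional aggregation in answer 1 handles any number of attributes in a single pass over the table. Both approaches leave NULL in a column when an id has no row for that field_name.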

Answer 3

Score: 0

Simple pivot problem I think.

    # Imports and session setup (assumed; the original answer omits them)
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as f

    spark = SparkSession.builder.getOrCreate()

    # Sample data
    data = [
        [7024549, 'Status', 'Approved'],
        [7024549, 'Type', 'Jama Design'],
        [7024549, 'Description', 'null']
    ]

    # Create the Spark DataFrame
    df = spark.createDataFrame(data, ['parent_id', 'properties_field_name', 'properties_field_value_name'])

    # Rewrite the field names as upper-case 'REQ_...' labels, then pivot
    # them into columns so each parent_id ends up on a single row
    df.withColumn('properties_field_name',
                  f.concat(f.lit('REQ_'), f.upper(f.col('properties_field_name')))) \
        .groupBy('parent_id') \
        .pivot('properties_field_name') \
        .agg(f.first('properties_field_value_name')) \
        .show()

    +---------+---------------+----------+-----------+
    |parent_id|REQ_DESCRIPTION|REQ_STATUS|   REQ_TYPE|
    +---------+---------------+----------+-----------+
    |  7024549|           null|  Approved|Jama Design|
    +---------+---------------+----------+-----------+
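
A small PySpark note: pivot can also be given the list of expected values up front, e.g. .pivot('properties_field_name', ['REQ_DESCRIPTION', 'REQ_STATUS', 'REQ_TYPE']), which saves Spark the extra pass it otherwise makes to collect the distinct field names.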
