从多行获取数值到单行

huangapple go评论65阅读模式
英文:

Getting values from multiple rows into a single row

问题

我想要根据另一列的条件,将单列的多行值获取到单行的不同列中。

我想要根据field_name列中的值获取同一行但不同列中的field_value_name的值,因为它们属于相同的id。

如何通过SQL或Pyspark实现这一目标?

我尝试使用CASE WHEN,但它会扫描每一行并返回每一行的输出。

我更希望这些值对于每个id都在单行中。

英文:

I want to get the values of multiple rows of a single column to different column in a single row based on the condition of another column.

从多行获取数值到单行

I want to get the values of the field_value_name in same row but different columns based on values present in the field_name column as they belong to the same id.

How to get this through sql or pyspark?

I tried using CASE WHEN but it will scan for every row and return the output for every row. [从多行获取数值到单行

I'd rather want those values to be in a single row for every id.

答案1

得分: 1

你需要在这里使用数据透视逻辑,例如:

<!-- language: sql -->
SELECT
    parent_id,
    MAX(CASE WHEN properties_field_name = 'Status'
             THEN properties_field_value_name END) AS REQ_STATUS,
    MAX(CASE WHEN properties_field_name = 'Type'
             THEN properties_field_value_name END) AS REQ_TYPE,
    MAX(CASE WHEN properties_field_name = 'Description'
             THEN properties_field_value_name END) AS REQ_DESC
FROM yourTable
GROUP BY parent_id
ORDER BY parent_id;
英文:

You need pivoting logic here, e.g.

<!-- language: sql -->

SELECT
    parent_id,
    MAX(CASE WHEN properties_field_name = &#39;Status&#39;
             THEN properties_field_value_name END) AS REQ_STATUS,
    MAX(CASE WHEN properties_field_name = &#39;Type&#39;
             THEN properties_field_value_name END) AS REQ_TYPE,
    MAX(CASE WHEN properties_field_name = &#39;Description&#39;
             THEN properties_field_value_name END) AS REQ_DESC
FROM yourTable
GROUP BY parent_id
ORDER BY parent_id;

答案2

得分: 0

你可以像Tim演示的那样使用MAX来完成,也可以像这样使用连接来完成:

SELECT
  parent_id,
  status.properties_field_value_name as status,
  type.properties_field_value_name as type,
  desc.properties_field_value_name as desc
FROM (
  SELECT distinct partent_id 
  FROM 你没有提到的表名
) as base
LEFT JOIN 你没有提到的表名 as status on base.parent_id = status.parent_id and status.properties_field_name = 'Status'
LEFT JOIN 你没有提到的表名 as type on base.parent_id = type.parent_id and type.properties_field_name = 'Type'
LEFT JOIN 你没有提到的表名 as desc on base.parent_id = desc.parent_id and desc.properties_field_name = 'Description'
英文:

You can do it with MAX like Tim shows or you can do it with joins like this:

SELECT
  parent_id,
  status.properties_field_value_name as status,
  type.properties_field_value_name as type,
  desc.properties_field_value_name as desc
FROM (
  SELECT distinct partent_id 
  FROM thetableyoudidnotname
) as base
LEFT JOIN thetableyoudidnotname as status on base.parent_id = status.parent_id and status.properties_field_name = &#39;Status&#39;
LEFT JOIN thetableyoudidnotname as type on base.parent_id = type.parent_id and type.properties_field_name = &#39;Type&#39;
LEFT JOIN thetableyoudidnotname as desc on base.parent_id = desc.parent_id and desc.properties_field_name = &#39;Description&#39;

答案3

得分: 0

这是您提供的代码的翻译:

# 创建数据
data = [
    [7024549, 'Status', 'Approved'],
    [7024549, 'Type', 'Jama Design'],
    [7024549, 'Description', 'null']
]

# 创建Spark DataFrame
df = spark.createDataFrame(data, ['parent_id', 'properties_field_name', 'properties_field_value_name'])

# 添加新列'id',并将'properties_field_name'列值转换为以'REQ_'开头的大写格式
df.withColumn('id', f.expr('uuid()')) \
  .withColumn('properties_field_name', f.concat(f.lit('REQ_'), f.upper(f.col('properties_field_name')))) \
  .groupBy('id', 'parent_id') \
  .pivot('properties_field_name') \
  .agg(f.first('properties_field_value_name')) \
  .drop('id') \
  .show()

这是代码的翻译版本,没有其他内容。

英文:

Simple pivot problem I think.

data = [
    [7024549, &#39;Status&#39;, &#39;Approved&#39;],
    [7024549, &#39;Type&#39;, &#39;Jama Design&#39;],
    [7024549, &#39;Description&#39;, &#39;null&#39;]
]

df = spark.createDataFrame(data, [&#39;parent_id&#39;, &#39;properties_field_name&#39;, &#39;properties_field_value_name&#39;])

df.withColumn(&#39;id&#39;, f.expr(&#39;uuid()&#39;)) \
  .withColumn(&#39;properties_field_name&#39;, f.concat(f.lit(&#39;REQ_&#39;), f.upper(f.col(&#39;properties_field_name&#39;)))) \
  .groupBy(&#39;id&#39;, &#39;parent_id&#39;) \
  .pivot(&#39;properties_field_name&#39;) \
  .agg(f.first(&#39;properties_field_value_name&#39;)) \
  .drop(&#39;id&#39;) \
  .show()

+---------+---------------+----------+-----------+
|parent_id|REQ_DESCRIPTION|REQ_STATUS|   REQ_TYPE|
+---------+---------------+----------+-----------+
|  7024549|           null|  Approved|       null|
|  7024549|           null|      null|Jama Design|
|  7024549|           null|      null|       null|
+---------+---------------+----------+-----------+

huangapple
  • 本文由 发表于 2023年2月19日 12:42:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/75498046.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定