Getting values from multiple rows into a single row

Question

I want to get the values of multiple rows of a single column into different columns in a single row, based on the condition of another column.

I want to get the values of the field_value_name in the same row but in different columns, based on the values present in the field_name column, as they belong to the same id.

How can I get this through SQL or PySpark?

I tried using CASE WHEN, but it scans every row and returns an output for every row.

I'd rather have those values in a single row for every id.

Answer 1

Score: 1

You need pivoting logic here, e.g.

<!-- language: sql -->

    SELECT
        parent_id,
        MAX(CASE WHEN properties_field_name = 'Status'
                 THEN properties_field_value_name END) AS REQ_STATUS,
        MAX(CASE WHEN properties_field_name = 'Type'
                 THEN properties_field_value_name END) AS REQ_TYPE,
        MAX(CASE WHEN properties_field_name = 'Description'
                 THEN properties_field_value_name END) AS REQ_DESC
    FROM yourTable
    GROUP BY parent_id
    ORDER BY parent_id;
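
Against the three sample rows used in answer 3 below, this query should return a single pivoted row per parent_id, roughly:

    parent_id | REQ_STATUS | REQ_TYPE    | REQ_DESC
    ----------+------------+-------------+----------
    7024549   | Approved   | Jama Design | null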

Answer 2

Score: 0

You can do it with MAX like Tim shows, or you can do it with joins like this:

    SELECT
        base.parent_id,
        status.properties_field_value_name as status,
        type.properties_field_value_name as type,
        descr.properties_field_value_name as description
    FROM (
        SELECT distinct parent_id
        FROM thetableyoudidnotname
    ) as base
    -- one self-join per attribute; DESC is a reserved word in SQL,
    -- so that alias is spelled descr
    LEFT JOIN thetableyoudidnotname as status on base.parent_id = status.parent_id and status.properties_field_name = 'Status'
    LEFT JOIN thetableyoudidnotname as type on base.parent_id = type.parent_id and type.properties_field_name = 'Type'
    LEFT JOIN thetableyoudidnotname as descr on base.parent_id = descr.parent_id and descr.properties_field_name = 'Description'
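
One trade-off worth noting: this version needs an extra self-join for every attribute you pull out, while the conditional aggregation in answer 1 handles any number of attributes in a single pass over the table. Both approaches leave NULL in a column when an id has no row for that field_name.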

Answer 3

Score: 0

Simple pivot problem I think.

    # Imports and session setup (assumed; the original answer omits them)
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as f

    spark = SparkSession.builder.getOrCreate()

    # Sample data
    data = [
        [7024549, 'Status', 'Approved'],
        [7024549, 'Type', 'Jama Design'],
        [7024549, 'Description', 'null']
    ]

    # Create the Spark DataFrame
    df = spark.createDataFrame(data, ['parent_id', 'properties_field_name', 'properties_field_value_name'])

    # Rewrite the field names as upper-case 'REQ_...' labels, then pivot
    # them into columns so each parent_id ends up on a single row
    df.withColumn('properties_field_name',
                  f.concat(f.lit('REQ_'), f.upper(f.col('properties_field_name')))) \
        .groupBy('parent_id') \
        .pivot('properties_field_name') \
        .agg(f.first('properties_field_value_name')) \
        .show()

    +---------+---------------+----------+-----------+
    |parent_id|REQ_DESCRIPTION|REQ_STATUS|   REQ_TYPE|
    +---------+---------------+----------+-----------+
    |  7024549|           null|  Approved|Jama Design|
    +---------+---------------+----------+-----------+
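
A small PySpark note: pivot can also be given the list of expected values up front, e.g. .pivot('properties_field_name', ['REQ_DESCRIPTION', 'REQ_STATUS', 'REQ_TYPE']), which saves Spark the extra pass it otherwise makes to collect the distinct field names.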
