英文:
Getting values from multiple rows into a single row
问题
我想要根据另一列的条件,将单列的多行值获取到单行的不同列中。
我想要根据field_name列中的值获取同一行但不同列中的field_value_name的值,因为它们属于相同的id。
如何通过SQL或Pyspark实现这一目标?
我尝试使用CASE WHEN,但它会扫描每一行并返回每一行的输出。
我更希望这些值对于每个id都在单行中。
英文:
I want to get the values of multiple rows of a single column to different column in a single row based on the condition of another column.
I want to get the values of the field_value_name in same row but different columns based on values present in the field_name column as they belong to the same id.
How to get this through sql or pyspark?
I tried using CASE WHEN but it will scan for every row and return the output for every row. [
I'd rather want those values to be in a single row for every id.
答案1
得分: 1
你需要在这里使用数据透视逻辑,例如:
<!-- language: sql -->
SELECT
parent_id,
MAX(CASE WHEN properties_field_name = 'Status'
THEN properties_field_value_name END) AS REQ_STATUS,
MAX(CASE WHEN properties_field_name = 'Type'
THEN properties_field_value_name END) AS REQ_TYPE,
MAX(CASE WHEN properties_field_name = 'Description'
THEN properties_field_value_name END) AS REQ_DESC
FROM yourTable
GROUP BY parent_id
ORDER BY parent_id;
英文:
You need pivoting logic here, e.g.
<!-- language: sql -->
SELECT
parent_id,
MAX(CASE WHEN properties_field_name = 'Status'
THEN properties_field_value_name END) AS REQ_STATUS,
MAX(CASE WHEN properties_field_name = 'Type'
THEN properties_field_value_name END) AS REQ_TYPE,
MAX(CASE WHEN properties_field_name = 'Description'
THEN properties_field_value_name END) AS REQ_DESC
FROM yourTable
GROUP BY parent_id
ORDER BY parent_id;
答案2
得分: 0
你可以像Tim演示的那样使用MAX来完成,也可以像这样使用连接来完成:
SELECT
parent_id,
status.properties_field_value_name as status,
type.properties_field_value_name as type,
desc.properties_field_value_name as desc
FROM (
SELECT distinct partent_id
FROM 你没有提到的表名
) as base
LEFT JOIN 你没有提到的表名 as status on base.parent_id = status.parent_id and status.properties_field_name = 'Status'
LEFT JOIN 你没有提到的表名 as type on base.parent_id = type.parent_id and type.properties_field_name = 'Type'
LEFT JOIN 你没有提到的表名 as desc on base.parent_id = desc.parent_id and desc.properties_field_name = 'Description'
英文:
You can do it with MAX like Tim shows or you can do it with joins like this:
SELECT
parent_id,
status.properties_field_value_name as status,
type.properties_field_value_name as type,
desc.properties_field_value_name as desc
FROM (
SELECT distinct partent_id
FROM thetableyoudidnotname
) as base
LEFT JOIN thetableyoudidnotname as status on base.parent_id = status.parent_id and status.properties_field_name = 'Status'
LEFT JOIN thetableyoudidnotname as type on base.parent_id = type.parent_id and type.properties_field_name = 'Type'
LEFT JOIN thetableyoudidnotname as desc on base.parent_id = desc.parent_id and desc.properties_field_name = 'Description'
答案3
得分: 0
这是您提供的代码的翻译:
# 创建数据
data = [
[7024549, 'Status', 'Approved'],
[7024549, 'Type', 'Jama Design'],
[7024549, 'Description', 'null']
]
# 创建Spark DataFrame
df = spark.createDataFrame(data, ['parent_id', 'properties_field_name', 'properties_field_value_name'])
# 添加新列'id',并将'properties_field_name'列值转换为以'REQ_'开头的大写格式
df.withColumn('id', f.expr('uuid()')) \
.withColumn('properties_field_name', f.concat(f.lit('REQ_'), f.upper(f.col('properties_field_name')))) \
.groupBy('id', 'parent_id') \
.pivot('properties_field_name') \
.agg(f.first('properties_field_value_name')) \
.drop('id') \
.show()
这是代码的翻译版本,没有其他内容。
英文:
Simple pivot problem I think.
data = [
[7024549, 'Status', 'Approved'],
[7024549, 'Type', 'Jama Design'],
[7024549, 'Description', 'null']
]
df = spark.createDataFrame(data, ['parent_id', 'properties_field_name', 'properties_field_value_name'])
df.withColumn('id', f.expr('uuid()')) \
.withColumn('properties_field_name', f.concat(f.lit('REQ_'), f.upper(f.col('properties_field_name')))) \
.groupBy('id', 'parent_id') \
.pivot('properties_field_name') \
.agg(f.first('properties_field_value_name')) \
.drop('id') \
.show()
+---------+---------------+----------+-----------+
|parent_id|REQ_DESCRIPTION|REQ_STATUS| REQ_TYPE|
+---------+---------------+----------+-----------+
| 7024549| null| Approved| null|
| 7024549| null| null|Jama Design|
| 7024549| null| null| null|
+---------+---------------+----------+-----------+
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论