Generating grouping column for records based on row order
Question
I have Postgres data imported from a spreadsheet, where section headers have been imported along with the data. Is there a way to move the section headers to their own column? I have looked at window functions and I have tried a self join.
I am certain that the category names will always have a null "responsibility". The solution would have to rely on the order in which the rows were inserted, because there is no sort column. Is that reasonable?
I need to transform this:
need | responsibility |
---|---|
Grounds Keeping | |
Tree/shrub maintenance | Parks and Recreation |
Tree Removal | Public Works |
Mowing | Parks and Recreation |
Fire Station Mowing | Fire |
Irrigation Systems | Parks and Recreation |
Retaining Walls/Slope Maint. | Capital Projects |
Building Maintenance | |
Cleaning/Resupply | Building Occupant |
Electrical | Facility Maintenance |
Carpet Maintenance | Facility Operations |
Generators | Facility Maintenance |
Appliances | Facility Maintenance |
into:
category | need | responsibility |
---|---|---|
Grounds Keeping | Tree/shrub maintenance | Parks and Recreation |
Grounds Keeping | Tree Removal | Public Works |
Grounds Keeping | Mowing | Parks and Recreation |
Grounds Keeping | Fire Station Mowing | Fire |
Grounds Keeping | Irrigation Systems | Parks and Recreation |
Grounds Keeping | Retaining Walls/Slope Maint. | Capital Projects |
Building Maintenance | Cleaning/Resupply | Building Occupant |
Building Maintenance | Electrical | Facility Maintenance |
Building Maintenance | Carpet Maintenance | Facility Operations |
Building Maintenance | Generators | Facility Maintenance |
Building Maintenance | Appliances | Facility Maintenance |
Answer 1
Score: 3
As @eshirvana already points out, there is no reliable way to do this. If you really cannot add an incremental column that uniquely identifies the order of your rows, you can exploit the "ctid" system column. It records the physical location of each row version in the table, which tends to follow insertion order for a freshly loaded table that has not been updated since. PostgreSQL does not guarantee this, so the approach remains unreliable, particularly if you inserted the rows in bulk.
Here is how to solve the problem:
- Regenerate your partitions with a running sum over `responsibility IS NULL`, ordered by "ctid".
- Generate your category field as the max "need" value where "responsibility" is null, within each partition.
- Remove the header records by selecting only the records where "responsibility" is not null.
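    -- Running count of header rows (responsibility IS NULL) in ctid order:
    -- each row gets the number of the section it belongs to.
    WITH cte AS (
        SELECT tab.*,
               SUM((responsibility IS NULL)::INT) OVER (ORDER BY ctid) AS parts
        FROM tab
    ), cte2 AS (
        -- The only row with a null responsibility in each section is its header.
        SELECT MAX(need) FILTER (WHERE responsibility IS NULL) OVER (PARTITION BY parts) AS category,
               cte.*
        FROM cte
    )
    -- Drop the header rows themselves.
    SELECT * FROM cte2 WHERE responsibility IS NOT NULL;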
Check the demo here.
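The three steps above can be tried end to end without a PostgreSQL server. The following is a minimal sketch, not the answerer's demo: it runs an equivalent query in SQLite through Python's sqlite3 module, substituting SQLite's rowid for PostgreSQL's ctid and a CASE expression for the FILTER clause. The table name tab comes from the answer; the sample rows are a subset of the question's data, and the function name group_by_header is made up for illustration. It assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# A subset of the question's data; None marks the section-header rows.
ROWS = [
    ("Grounds Keeping", None),
    ("Tree/shrub maintenance", "Parks and Recreation"),
    ("Tree Removal", "Public Works"),
    ("Building Maintenance", None),
    ("Cleaning/Resupply", "Building Occupant"),
    ("Electrical", "Facility Maintenance"),
]

# Assumption: rowid stands in for ctid. Like ctid, it only reflects
# insertion order as long as rows were inserted once and never rewritten.
QUERY = """
WITH cte AS (
    SELECT rowid AS rid, need, responsibility,
           -- running count of header rows = section number
           SUM(responsibility IS NULL) OVER (ORDER BY rowid) AS parts
    FROM tab
), cte2 AS (
    SELECT MAX(CASE WHEN responsibility IS NULL THEN need END)
               OVER (PARTITION BY parts) AS category,
           rid, need, responsibility
    FROM cte
)
SELECT category, need, responsibility
FROM cte2
WHERE responsibility IS NOT NULL
ORDER BY rid
"""


def group_by_header(rows):
    """Load rows into an in-memory table and return (category, need, responsibility)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (need TEXT, responsibility TEXT)")
    conn.executemany("INSERT INTO tab VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in group_by_header(ROWS):
        print(row)
```

Because the rows are inserted one by one into a fresh table here, rowid matches insertion order. That is the same assumption the ctid version makes, and it is the part that cannot be guaranteed for an existing table.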
Answer 2
Score: 1
I'm putting this here since it won't fit in the comments section.
As the PostgreSQL documentation puts it:
> If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on.
To add a new column for the insert date, you have to alter the table, add the column, and fill it with proper values, either at insert time or manually when you update existing rows:

    ALTER TABLE [table_name] ADD COLUMN InsertDate timestamp;

See the documentation quoted above for more information.
Keep in mind that this is only one approach, not necessarily the best one.
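If the spreadsheet is still available, one way to get a trustworthy sort column is to number the rows while reading the file, before they ever reach the database. The sketch below is only an illustration under assumptions that are not in the question: the sheet has been exported as CSV with a header line naming the columns need and responsibility, and number_rows and row_no are hypothetical names.

```python
import csv
import io


def number_rows(csv_text):
    """Return the data rows of a CSV export with an explicit 1-based row_no."""
    reader = csv.DictReader(io.StringIO(csv_text))
    numbered = []
    for row_no, row in enumerate(reader, start=1):
        numbered.append({
            "row_no": row_no,  # explicit sort key, in file order
            "need": row["need"],
            # Empty cells (the section headers) become None, i.e. SQL NULL.
            "responsibility": row["responsibility"] or None,
        })
    return numbered


SAMPLE = "need,responsibility\nGrounds Keeping,\nTree Removal,Public Works\n"

if __name__ == "__main__":
    for record in number_rows(SAMPLE):
        print(record)
```

Loading row_no alongside the data means later queries can use ORDER BY row_no instead of depending on the physical row order.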
Answer 3
Score: 1
This looks like an instance of the "last non-null" puzzle. The strategy is to derive a row_number() column, use it to group the rows that a value should be filled down into, and from that build a derived last-non-null column for your category.
    SELECT *
    FROM (
        WITH cte AS (
            SELECT *,
                   -- grp = row number of the most recent header row
                   MAX(CASE WHEN category IS NOT NULL THEN rn END)
                       OVER (ORDER BY rn ROWS UNBOUNDED PRECEDING) AS grp
            FROM (
                -- ORDER BY NULL numbers the rows in whatever order the scan returns them,
                -- which is not guaranteed to be the insertion order
                SELECT row_number() OVER (ORDER BY NULL) AS rn,
                       responsibility,
                       CASE WHEN responsibility IS NULL THEN NULL ELSE need END AS _need,
                       CASE WHEN responsibility IS NULL THEN need END AS category
                FROM test
            ) AS t2
        )
        -- Fill the header's category down to every row of its group
        SELECT MAX(category) OVER (PARTITION BY grp ORDER BY rn ROWS UNBOUNDED PRECEDING) AS _category,
               _need,
               responsibility
        FROM cte
    ) AS t3
    WHERE responsibility IS NOT NULL;
the demo is here
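For readers who find the window-function version hard to follow, the fill-down logic it implements can be written as a plain loop. This is only an illustrative sketch in Python, assuming the rows already arrive in the intended order (which is exactly the guarantee the table does not give); fill_down is a hypothetical helper name, and the sample rows are taken from the question.

```python
def fill_down(rows):
    """Carry the last header's name down to the data rows that follow it.

    rows is an ordered iterable of (need, responsibility) pairs; a pair with
    responsibility None is a section header and is dropped from the output.
    """
    result = []
    category = None  # last non-null category seen so far
    for need, responsibility in rows:
        if responsibility is None:
            category = need  # header row: remember it, do not emit it
            continue
        result.append((category, need, responsibility))
    return result


if __name__ == "__main__":
    sample = [
        ("Grounds Keeping", None),
        ("Mowing", "Parks and Recreation"),
        ("Building Maintenance", None),
        ("Generators", "Facility Maintenance"),
    ]
    for row in fill_down(sample):
        print(row)
```

The loop keeps one piece of state, the most recent header, which is what the grp column and the second MAX(...) OVER step reconstruct in SQL.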