Generating grouping column for records based on row order

Question


I have Postgres data imported from a spreadsheet, where section headers were imported along with the data. Is there a way to move the section headers into their own column? I have looked at window functions and tried a self join.

I am certain that the category names will always have a null "responsibility". The solution would have to rely on the order in which the rows were inserted, since there is no sort column. Is that reasonable?

I need to transform this:

need                          responsibility
Grounds Keeping
Tree/shrub maintenance        Parks and Recreation
Tree Removal                  Public Works
Mowing                        Parks and Recreation
Fire Station Mowing           Fire
Irrigation Systems            Parks and Recreation
Retaining Walls/Slope Maint.  Capital Projects
Building Maintenance
Cleaning/Resupply             Building Occupant
Electrical                    Facility Maintenance
Carpet Maintenance            Facility Operations
Generators                    Facility Maintenance
Appliances                    Facility Maintenance

into:

category              need                          responsibility
Grounds Keeping       Tree/shrub maintenance        Parks and Recreation
Grounds Keeping       Tree Removal                  Public Works
Grounds Keeping       Mowing                        Parks and Recreation
Grounds Keeping       Fire Station Mowing           Fire
Grounds Keeping       Irrigation Systems            Parks and Recreation
Grounds Keeping       Retaining Walls/Slope Maint.  Capital Projects
Building Maintenance  Cleaning/Resupply             Building Occupant
Building Maintenance  Electrical                    Facility Maintenance
Building Maintenance  Carpet Maintenance            Facility Operations
Building Maintenance  Generators                    Facility Maintenance
Building Maintenance  Appliances                    Facility Maintenance

Answer 1

Score: 3


As @eshirvana already points out, there's no reliable way to do this task. If you really have no way of adding an incremental value that uniquely identifies the order of your rows, you can exploit the "ctid" system column, which holds the physical location of each row version inside the table. On a freshly loaded table that has not been updated or vacuumed this usually follows insertion order, but it is not guaranteed to (and if you inserted in bulk, this approach is still unreliable).

Here is how you can solve the problem:

  • regenerate your partitions using a running sum over "responsibility IS NULL", ordered by "ctid"
  • generate your category field as the max "need" value among the rows where "responsibility" is null
  • remove the header records by selecting only the records where "responsibility" is not null
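WITH cte AS (
    SELECT tab.*,
           -- running count of header rows (responsibility IS NULL), in ctid order
           SUM((responsibility IS NULL)::INT) OVER (ORDER BY ctid) AS parts
    FROM tab
), cte2 AS (
    SELECT -- the header row's "need" becomes the category of its partition
           MAX(need) FILTER (WHERE responsibility IS NULL) OVER (PARTITION BY parts) AS category,
           cte.*
    FROM cte
)
-- drop the header rows themselves
SELECT * FROM cte2 WHERE responsibility IS NOT NULL;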

Check the demo here.
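
For readers who want to try the running-sum idea without a Postgres instance, here is a minimal sketch. It makes several assumptions that are not in the answer: SQLite (via Python's sqlite3 module, version 3.25 or later for window functions) stands in for Postgres, a hypothetical explicit `id` column replaces the unreliable "ctid", and CASE expressions replace the Postgres-specific cast and FILTER clause. The sample rows are a shortened version of the question's data.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres, and a hypothetical explicit
# `id` column provides the row order instead of ctid.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id INTEGER PRIMARY KEY, need TEXT, responsibility TEXT)"
)
rows = [
    ("Grounds Keeping", None),            # header row
    ("Tree Removal", "Public Works"),
    ("Mowing", "Parks and Recreation"),
    ("Building Maintenance", None),       # header row
    ("Electrical", "Facility Maintenance"),
]
conn.executemany("INSERT INTO tab (need, responsibility) VALUES (?, ?)", rows)

query = """
WITH cte AS (
    SELECT tab.*,
           -- running count of header rows seen so far, in id order
           SUM(CASE WHEN responsibility IS NULL THEN 1 ELSE 0 END)
               OVER (ORDER BY id) AS parts
    FROM tab
), cte2 AS (
    SELECT -- the header's "need" is the only non-null candidate per partition
           MAX(CASE WHEN responsibility IS NULL THEN need END)
               OVER (PARTITION BY parts) AS category,
           cte.*
    FROM cte
)
SELECT category, need, responsibility
FROM cte2
WHERE responsibility IS NOT NULL
ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each header row bumps the running sum, so every data row shares a `parts` value with the header above it, and the header's "need" can then be spread across that partition.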

Answer 2

Score: 1


Posting this as an answer since it won't fit in the comments section.

As mentioned in the PostgreSQL documentation:
> If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on.

To add a new column for the insert date, you have to alter the table to add the column, then fill it with proper values, either at insert time or manually when you update existing rows:

ALTER TABLE [table_name] ADD COLUMN InsertDate timestamp;

see documentation above for more information.

Keep in mind this is only one approach, not necessarily the best one.
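
If the spreadsheet can be imported again, a related option is to record each row's position at import time, so later queries have an explicit column to sort by. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module rather than Postgres, and the `sort_order` column and `sheet_rows` list are hypothetical names. A plain counter is used instead of a timestamp because in Postgres `now()` returns the transaction start time, so rows loaded in a single transaction would all get the same value.

```python
import sqlite3

# Hypothetical spreadsheet rows, listed in their original sheet order.
sheet_rows = [
    ("Grounds Keeping", None),
    ("Mowing", "Parks and Recreation"),
    ("Building Maintenance", None),
    ("Generators", "Facility Maintenance"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (sort_order INTEGER NOT NULL, need TEXT, responsibility TEXT)"
)
# Store the 1-based sheet position alongside each row.
conn.executemany(
    "INSERT INTO tab (sort_order, need, responsibility) VALUES (?, ?, ?)",
    [(pos, need, resp) for pos, (need, resp) in enumerate(sheet_rows, start=1)],
)

# Any later query can now ask for the original order explicitly.
ordered = conn.execute(
    "SELECT sort_order, need FROM tab ORDER BY sort_order"
).fetchall()
print(ordered)
```

With such a column in place, window functions can order by it instead of depending on physical storage order.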

Answer 3

Score: 1


This looks like an instance of the "last non-null" puzzle. The strategy is to derive a row_number() column and use it to group the values you want to fill down, which gives you a derived last-non-null column for your category.

SELECT *
FROM (
    WITH cte AS (
        SELECT *,
               -- rn of the most recent header row: identifies the group
               MAX(CASE WHEN category IS NOT NULL THEN rn END)
                   OVER (ORDER BY rn ROWS UNBOUNDED PRECEDING) AS grp
        FROM (
            -- ORDER BY NULL numbers rows in whatever order they are scanned,
            -- which is not guaranteed to match insertion order
            SELECT row_number() OVER (ORDER BY NULL) AS rn,
                   responsibility,
                   CASE WHEN responsibility IS NULL THEN NULL ELSE need END AS _need,
                   CASE WHEN responsibility IS NULL THEN need END AS category
            FROM test
        ) AS t2
    )
    SELECT MAX(category) OVER (
               PARTITION BY grp ORDER BY rn ROWS UNBOUNDED PRECEDING
           ) AS _category,
           _need,
           responsibility
    FROM cte
) AS t3
WHERE responsibility IS NOT NULL;

the demo is here
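
The "last non-null" idea may be easier to see outside SQL. The following is a small procedural sketch, not the answer's query: it assumes the rows are already in a trustworthy order and walks through them once, carrying the most recent header down to the rows below it. The `fill_categories` function and the sample rows are hypothetical.

```python
from typing import Optional

Row = tuple[str, Optional[str]]  # (need, responsibility)


def fill_categories(rows: list[Row]) -> list[tuple[Optional[str], str, str]]:
    """Carry the most recent header row down to the data rows that follow it."""
    filled = []
    category = None  # last non-null header seen so far
    for need, responsibility in rows:
        if responsibility is None:
            category = need  # header row: remember it, emit nothing
        else:
            filled.append((category, need, responsibility))
    return filled


rows = [
    ("Grounds Keeping", None),
    ("Irrigation Systems", "Parks and Recreation"),
    ("Building Maintenance", None),
    ("Carpet Maintenance", "Facility Operations"),
    ("Appliances", "Facility Maintenance"),
]
filled = fill_categories(rows)
for row in filled:
    print(row)
```

The SQL version has to express the same carry-forward declaratively, which is why it first computes a group key (the row number of the latest header) and then takes the category within each group.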

huangapple
  • Posted on June 6, 2023 at 02:26:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76409096.html