英文:
Group by running total resetting every time the counter is incremented
问题
我有以下表格(我的输入只包括NIF、FILE和FILESIZE列)。
我想要根据以下规则获取PARTITION和SUBPARTITION列:
- PARTITION应该从1开始,每当SUBPARTITION达到8时递增1。
- 一个NIF不能出现在两个不同的PARTITION中(实际上这是最严格的条件)。
- 每当FILESIZE的累计总和达到10时,SUBPARTITION必须递增1。当SUBPARTITION递增1时,该累计总和必须重置。
数据:
NIF FILE FILESIZE PARTITION SUBPARTITION
----------------------------------------------
A C1 1 1 1
A C2 1 1 1
A C3 2 1 1
A C4 1 1 1
B C5 5 1 2
B C6 1 1 2
C C7 2 1 2
C C8 1 1 2
D C9 4 1 3
D C10 5 1 3
D C11 1 1 3
D C12 2 1 4
D C13 3 1 4
D C14 4 1 4
D C15 5 1 5
E C16 3 1 6
E C17 2 1 6
E C18 3 1 6
E C19 4 1 7
F C20 6 2 1
F C20 2 2 1
不需要一条单一的查询,使用PL/SQL游标也可以。
注意:这是创建输入表格的代码。
DROP TABLE my_table;
-- 创建表格
CREATE TABLE my_table (
NIF VARCHAR2(10),
FILE_ VARCHAR2(10),
FILESIZE NUMBER
);
-- 插入输入数据
INSERT INTO my_table (NIF, FILE_, FILESIZE)
SELECT 'A', 'C1', 1 FROM DUAL UNION ALL
SELECT 'A', 'C2', 1 FROM DUAL UNION ALL
SELECT 'A', 'C3', 2 FROM DUAL UNION ALL
SELECT 'A', 'C4', 1 FROM DUAL UNION ALL
SELECT 'B', 'C5', 5 FROM DUAL UNION ALL
SELECT 'B', 'C6', 1 FROM DUAL UNION ALL
SELECT 'C', 'C7', 2 FROM DUAL UNION ALL
SELECT 'C', 'C8', 1 FROM DUAL UNION ALL
SELECT 'D', 'C9', 4 FROM DUAL UNION ALL
SELECT 'D', 'C10', 5 FROM DUAL UNION ALL
SELECT 'D', 'C11', 1 FROM DUAL UNION ALL
SELECT 'D', 'C12', 2 FROM DUAL UNION ALL
SELECT 'D', 'C13', 3 FROM DUAL UNION ALL
SELECT 'D', 'C14', 4 FROM DUAL UNION ALL
SELECT 'D', 'C15', 5 FROM DUAL UNION ALL
SELECT 'E', 'C16', 3 FROM DUAL UNION ALL
SELECT 'E', 'C17', 2 FROM DUAL UNION ALL
SELECT 'E', 'C18', 3 FROM DUAL UNION ALL
SELECT 'E', 'C19', 4 FROM DUAL UNION ALL
SELECT 'F', 'C20', 6 FROM DUAL UNION ALL
SELECT 'F', 'C21', 2 FROM DUAL;
COMMIT;
英文:
I have the below table (my input would be just columns NIF, FILE and FILESIZE).
I would like to obtain columns PARTITION and SUBPARTITION with the following rules:
- PARTITION should start with 1 and be incremented by 1 every tune SUBPARTITION reaches 8.
- A single NIF cannot be in two different PARTITIONs. (This is the most restrictive condition actually).
- SUBPARTITION must be incremented by 1 every time the running total in FILESIZE reaches 10. That running total must be reset when SUBPARTITION is incremented by 1.
Data:
NIF FILE FILESIZE PARTITION SUBPARTITION
----------------------------------------------
A C1 1 1 1
A C2 1 1 1
A C3 2 1 1
A C4 1 1 1
B C5 5 1 2
B C6 1 1 2
C C7 2 1 2
C C8 1 1 2
D C9 4 1 3
D C10 5 1 3
D C11 1 1 3
D C12 2 1 4
D C13 3 1 4
D C14 4 1 4
D C15 5 1 5
E C16 3 1 6
E C17 2 1 6
E C18 3 1 6
E C19 4 1 7
F C20 6 2 1
F C20 2 2 1
It does not need to be a single query, doing a PSLQL cursor would also be fine.
NOTE: this is the code to create the input table.
DROP TABLE my_table;
-- Create the table
CREATE TABLE my_table (
NIF VARCHAR2(10),
FILE_ VARCHAR2(10),
FILESIZE NUMBER
);
-- Insert the input data
INSERT INTO my_table (NIF, FILE_, FILESIZE)
SELECT 'A', 'C1', 1 FROM DUAL UNION ALL
SELECT 'A', 'C2', 1 FROM DUAL UNION ALL
SELECT 'A', 'C3', 2 FROM DUAL UNION ALL
SELECT 'A', 'C4', 1 FROM DUAL UNION ALL
SELECT 'B', 'C5', 5 FROM DUAL UNION ALL
SELECT 'B', 'C6', 1 FROM DUAL UNION ALL
SELECT 'C', 'C7', 2 FROM DUAL UNION ALL
SELECT 'C', 'C8', 1 FROM DUAL UNION ALL
SELECT 'D', 'C9', 4 FROM DUAL UNION ALL
SELECT 'D', 'C10', 5 FROM DUAL UNION ALL
SELECT 'D', 'C11', 1 FROM DUAL UNION ALL
SELECT 'D', 'C12', 2 FROM DUAL UNION ALL
SELECT 'D', 'C13', 3 FROM DUAL UNION ALL
SELECT 'D', 'C14', 4 FROM DUAL UNION ALL
SELECT 'D', 'C15', 5 FROM DUAL UNION ALL
SELECT 'E', 'C16', 3 FROM DUAL UNION ALL
SELECT 'E', 'C17', 2 FROM DUAL UNION ALL
SELECT 'E', 'C18', 3 FROM DUAL UNION ALL
SELECT 'E', 'C19', 4 FROM DUAL UNION ALL
SELECT 'F', 'C20', 6 FROM DUAL UNION ALL
SELECT 'F', 'C21', 2 FROM DUAL;
COMMIT;
答案1
得分: 1
从Oracle 12开始,您可以使用分析函数和MATCH_RECOGNIZE
的组合来执行逐行处理:
SELECT cto,
nif,
file_,
partition,
mn - first_mn + 1 AS subpartition,
total_size
FROM (
SELECT t.*,
ROW_NUMBER() OVER (
PARTITION BY cto
ORDER BY SUBSTR(file_, 1, 1), TO_NUMBER(SUBSTR(file_, 2))
) AS rn,
SUM(filesize) OVER (PARTITION BY nif) AS nif_size
FROM my_table t
)
MATCH_RECOGNIZE(
PARTITION BY cto
ORDER BY rn
MEASURES
MATCH_NUMBER() AS mn,
SUM(filesize) AS total_size
ALL ROWS PER MATCH
PATTERN ( (same_nif* nif)* same_nif* end_nif )
DEFINE
same_nif AS nif = NEXT(nif),
nif AS (NEXT(nif) IS NULL OR nif != NEXT(nif))
AND SUM(filesize) + NEXT(nif_size) <= 10,
end_nif AS SUM(filesize) <= 10
)
MATCH_RECOGNIZE(
PARTITION BY cto
ORDER BY rn
MEASURES
MATCH_NUMBER() AS partition,
FIRST(mn) AS first_mn
ALL ROWS PER MATCH
PATTERN ( (same_nif* nif)+ )
DEFINE
same_nif AS nif = NEXT(nif),
nif AS (NEXT(nif) IS NULL OR nif != NEXT(nif))
AND mn - FIRST(mn) < 7
);
在第一个匹配中,对于子分区,模式如下:
same_nif* nif
将匹配从当前行到当前NIF
分区的末尾,并确保下一个NIF
分区也完全包含在匹配中,通过检查当前大小加上下一个NIF
分区的总大小是否在限制范围内来实现;- 将其包装在
()*
中允许匹配零次或更多次(尽可能多); - 最后的
same_nif* end_nif
允许匹配部分匹配具有相同的nif
值的一组行(例如,文件C9、C10和C11与文件C12、C13和C14具有相同的nif
,但会分为不同的组,因为它们的文件大小总和超过了限制。没有不同的结束条件,您可能会将C1至C5分组,或者无法将D
nif
值拆分开。
在第二个匹配中,对于分区,需要将nif
值保持在同一匹配中(而对于子分区,nif
值可以跨多个匹配),除此之外,唯一需要检查的是一个组中没有超过7个子分区。
对于样本数据(添加了CTO
列和一个太大无法与前一个nif
组F
合并到同一分区的附加nif
组G
):
CREATE TABLE my_table (
CTO NUMBER,
NIF VARCHAR2(10),
FILE_ VARCHAR2(10),
FILESIZE NUMBER
);
INSERT INTO my_table (CTO, NIF, FILE_, FILESIZE)
SELECT 1, 'A', 'C1', 1 FROM DUAL UNION ALL
SELECT 1, 'A', 'C2', 1 FROM DUAL UNION ALL
SELECT 1, 'A', 'C3', 2 FROM DUAL UNION ALL
SELECT 1, 'A', 'C4', 1 FROM DUAL UNION ALL
SELECT 1, 'B', 'C5', 5 FROM DUAL UNION ALL
SELECT 1, 'B', 'C6', 1 FROM DUAL UNION ALL
SELECT 1, 'C', 'C7', 2 FROM DUAL UNION ALL
SELECT 1, 'C', 'C8', 1 FROM DUAL UNION ALL
SELECT 1, 'D', 'C9', 4 FROM DUAL UNION ALL
SELECT 1, 'D', 'C10', 5 FROM DUAL UNION ALL
SELECT 1, 'D', 'C11', 1 FROM DUAL UNION ALL
SELECT 1, 'D', 'C12', 2 FROM DUAL UNION ALL
SELECT 1, 'D', 'C13', 3 FROM DUAL UNION ALL
SELECT 1, 'D', 'C14', 4 FROM DUAL UNION ALL
SELECT 1, 'D', 'C15', 5 FROM DUAL UNION ALL
SELECT 1, 'E', 'C16', 3 FROM DUAL UNION ALL
SELECT 1, 'E', 'C17', 2 FROM DUAL UNION ALL
SELECT 1, 'E', 'C18', 3 FROM DUAL UNION ALL
SELECT 1, 'E', 'C19', 4 FROM DUAL UNION ALL
SELECT 1, 'F', 'C20', 6 FROM DUAL UNION ALL
SELECT 1, 'F', 'C21', 2 FROM DUAL UNION ALL
SELECT 1, 'G', 'C' || (LEVEL + 21), 10 FROM DUAL CONNECT BY LEVEL <= 7;
输出:
CTO | NIF | FILE_ | PARTITION | SUBPARTITION | TOTAL_SIZE |
---|---|---|---|---|---|
1 | A | C1 | 1 | 1 | 1 |
1 | A | C2 | 1 | 1 | 2 |
1 | A | C3 | 1 | 1 | 4 |
1 | A | C4 | 1 | 1 | 5 |
1 | B | C5 | 1 | 2 | 5 |
1 | B | C6 | 1 | 2 | 6 |
1 | C | C7 | 1 | 2 | 8 |
1 | C | C8 | 1 | 2 | 9 |
1 | D | C9 | 1 | 3 | 4 |
1 | D | C10 | 1 | 3 | 9 |
1 | D | C11 | 1 | 3 | 10 |
英文:
From Oracle 12, you can use a combination of analytic functions and MATCH_RECOGNIZE
to perform row-by-row processing:
SELECT cto,
nif,
file_,
partition,
mn - first_mn + 1 AS subpartition,
total_size
FROM (
SELECT t.*,
ROW_NUMBER() OVER (
PARTITION BY cto
ORDER BY SUBSTR(file_, 1, 1), TO_NUMBER(SUBSTR(file_, 2))
) AS rn,
SUM(filesize) OVER (PARTITION BY nif) AS nif_size
FROM my_table t
)
MATCH_RECOGNIZE(
PARTITION BY cto
ORDER BY rn
MEASURES
MATCH_NUMBER() AS mn,
SUM(filesize) AS total_size
ALL ROWS PER MATCH
PATTERN ( (same_nif* nif)* same_nif* end_nif )
DEFINE
same_nif AS nif = NEXT(nif),
nif AS (NEXT(nif) IS NULL OR nif != NEXT(nif))
AND SUM(filesize) + NEXT(nif_size) <= 10,
end_nif AS SUM(filesize) <= 10
)
MATCH_RECOGNIZE(
PARTITION BY cto
ORDER BY rn
MEASURES
MATCH_NUMBER() AS partition,
FIRST(mn) AS first_mn
ALL ROWS PER MATCH
PATTERN ( (same_nif* nif)+ )
DEFINE
same_nif AS nif = NEXT(nif),
nif AS (NEXT(nif) IS NULL OR nif != NEXT(nif))
AND mn - FIRST(mn) < 7
);
In first match, for the sub-partitions, the pattern:
same_nif* nif
will match from the current row to the end of the currentNIF
partition and ensure that the nextNIF
partition will also entirely be contained in the match by checking the current size plus the total size of the nextNIF
partition is within the limits;- Wrapping it in
()*
allows that to be matched zero-or-more times (as many as is possible); - The final
same_nif* end_nif
allows the match to partially match a set of rows with the samenif
values (such as the files C9, C10 and C11 which has the samenif
as C12, C13 and C14) but would be in a different group as they file size totals more than the limit. Without a different end condition you could end up either grouping C1 through C5 or not being able to split theD
nif
values.
In the second match, for the partitions, the nif
values need to be kept together in the same match (whereas for sub-partitions nif
values can span multiple matches) and, apart from that, the only check needed is that there are not more than 7 sub-partitions in a group.
Which, for the sample data (which has an added CTO
column and an additional nif
group G
that is too large to be combined in the same partition as the previous nif
group F
):
CREATE TABLE my_table (
CTO NUMBER,
NIF VARCHAR2(10),
FILE_ VARCHAR2(10),
FILESIZE NUMBER
);
INSERT INTO my_table (CTO, NIF, FILE_, FILESIZE)
SELECT 1, 'A', 'C1', 1 FROM DUAL UNION ALL
SELECT 1, 'A', 'C2', 1 FROM DUAL UNION ALL
SELECT 1, 'A', 'C3', 2 FROM DUAL UNION ALL
SELECT 1, 'A', 'C4', 1 FROM DUAL UNION ALL
SELECT 1, 'B', 'C5', 5 FROM DUAL UNION ALL
SELECT 1, 'B', 'C6', 1 FROM DUAL UNION ALL
SELECT 1, 'C', 'C7', 2 FROM DUAL UNION ALL
SELECT 1, 'C', 'C8', 1 FROM DUAL UNION ALL
SELECT 1, 'D', 'C9', 4 FROM DUAL UNION ALL
SELECT 1, 'D', 'C10', 5 FROM DUAL UNION ALL
SELECT 1, 'D', 'C11', 1 FROM DUAL UNION ALL
SELECT 1, 'D', 'C12', 2 FROM DUAL UNION ALL
SELECT 1, 'D', 'C13', 3 FROM DUAL UNION ALL
SELECT 1, 'D', 'C14', 4 FROM DUAL UNION ALL
SELECT 1, 'D', 'C15', 5 FROM DUAL UNION ALL
SELECT 1, 'E', 'C16', 3 FROM DUAL UNION ALL
SELECT 1, 'E', 'C17', 2 FROM DUAL UNION ALL
SELECT 1, 'E', 'C18', 3 FROM DUAL UNION ALL
SELECT 1, 'E', 'C19', 4 FROM DUAL UNION ALL
SELECT 1, 'F', 'C20', 6 FROM DUAL UNION ALL
SELECT 1, 'F', 'C21', 2 FROM DUAL UNION ALL
SELECT 1, 'G', 'C' || (LEVEL + 21), 10 FROM DUAL CONNECT BY LEVEL <= 7;
Outputs:
CTO | NIF | FILE_ | PARTITION | SUBPARTITION | TOTAL_SIZE |
---|---|---|---|---|---|
1 | A | C1 | 1 | 1 | 1 |
1 | A | C2 | 1 | 1 | 2 |
1 | A | C3 | 1 | 1 | 4 |
1 | A | C4 | 1 | 1 | 5 |
1 | B | C5 | 1 | 2 | 5 |
1 | B | C6 | 1 | 2 | 6 |
1 | C | C7 | 1 | 2 | 8 |
1 | C | C8 | 1 | 2 | 9 |
1 | D | C9 | 1 | 3 | 4 |
1 | D | C10 | 1 | 3 | 9 |
1 | D | C11 | 1 | 3 | 10 |
1 | D | C12 | 1 | 4 | 2 |
1 | D | C13 | 1 | 4 | 5 |
1 | D | C14 | 1 | 4 | 9 |
1 | D | C15 | 1 | 5 | 5 |
1 | E | C16 | 1 | 6 | 3 |
1 | E | C17 | 1 | 6 | 5 |
1 | E | C18 | 1 | 6 | 8 |
1 | E | C19 | 1 | 7 | 4 |
1 | F | C20 | 2 | 1 | 6 |
1 | F | C21 | 2 | 1 | 8 |
1 | G | C22 | 3 | 1 | 10 |
1 | G | C23 | 3 | 2 | 10 |
1 | G | C24 | 3 | 3 | 10 |
1 | G | C25 | 3 | 4 | 10 |
1 | G | C26 | 3 | 5 | 10 |
1 | G | C27 | 3 | 6 | 10 |
1 | G | C28 | 3 | 7 | 10 |
答案2
得分: 0
你可以创建一个临时表或一个真实的表,其中包含字段(NIF、FILE、FILESIZE、PARTITION、SUBPARTITION)。你也可以为它添加一个id,但是为了这个回答的目的,我只会描述绝对最低要求。
你可以将PARTITION和SUBPARTITION设置为可为空,并通过如下命令插入行:
insert into yourtable(NIF, FILE, FILESIZE)
values('A', 'C1', 1);
并实现一个在插入之前触发器,该触发器将查找max(PARTITION)
以找出当前分区,以及该分区中的max(SUBPARTITION)
和该分区中的sum(FILESIZE)
。类似于以下内容(这是一个通用的查询,你需要将字段into
保存到本地变量中):
select PARTITION, SUBPARTITION, sum(FILESIZE) storage_use
from yourtable
group by PARTITION, SUBPARTITION
order by PARTITION desc, SUBPARTITION desc
limit 1
从这里开始,在你的触发器内部,你可以查看storage_use
+要插入的记录的实际FILESIZE
是否大于10。如果不是,那么你将PARTITION
和SUBPARTITION
设置为查询中找到的值。如果不是,你可以递增SUBPARTITION
,只要它小于或等于8。如果SUBPARTITION
为8,并且storage_use
不允许你将新记录放入其中,那么将PARTITION + 1
和SUBPARTITION + 1
分别存储为PARTITION
和SUBPARTITION
。
英文:
You could create a temporary table or a real yable with the fields of (NIF, FILE, FILESIZE, PARTITION, SUBPARTITION). You could add an id to it as well, but for the purpose of this answer I will describe only the absolute minimum.
You would set PARTITION and SUBPARTITION to be nullable and insert lines via commands like
insert into yourtable(NIF, FILE, FILESIZE)
values('A', 'C1', 1);
and implement a before insert trigger that will find max(PARTITION)
to find out what the current partition is as well as the max(SUBPARTITION)
of that partition along with the sum(FILESIZE)
in that partition. Something like (this is an agnostic query, you will need to save the fields into
local variables)
select PARTITION, SUBPARTITION, sum(FILESIZE) storage_use
from yourtable
group by PARTITION, SUBPARTITION
order by PARTITION desc, SUBPARTITION desc
limit 1
From here on, inside your trigger you can see whether storage_use
+ the actual FILESIZE
of the record you are to insert is greater than 10. If not, then you set PARTITION
and SUBPARTITION
of your new record to those that you found in the query. If not, then you can increment SUBPARTITION
as long as it's lower than or equal with 8. If SUBPARTITION
is 8 and storage_use
does not allow you to fit the new record into it, then store PARTITION + 1
and SUBPARTITION + 1
as PARTITION
and SUBPARTITION
, respectively.
答案3
得分: 0
以下是代码的翻译部分:
您还可以使用MODEL子句:
select nif, file_, filesize, partition, subpartition
from (
select
row_number() over(order by nif, to_number(substr(file_,2))) as rn,
nif, file_, to_number(substr(file_,2)) as file_order,
filesize
from my_table
)
model
dimension by (rn)
measures( nif as nif, file_ as file_, file_order as file_order,
filesize as filesize,
0 as partition, 0 as incsubpartition, 1 as subpartition, 0 as running_total )
rules (
running_total[any] =
case when nvl(running_total[cv()-1],0) + filesize[cv()] >= 10
then filesize[cv()]
else nvl(running_total[cv()-1],0) + filesize[cv()]
end,
incsubpartition[any] =
case when nvl(running_total[cv()-1],0) + filesize[cv()] >= 10
then 1
else 0
end,
-- rule 3
subpartition[any] = nvl(subpartition[cv()-1],1) + incsubpartition[cv()],
partition[any] =
case when subpartition[cv()] = 8 and subpartition[cv()-1] <> 8
then partition[cv()-1] + 1
else nvl(partition[cv()-1],1)
end,
-- rule 1
subpartition[any] =
case when subpartition[cv()] = 8
then 1
else subpartition[cv()]
end,
-- rule 2
partition[any] =
case when nif[cv()] = nif[cv()-1]
then partition[cv()-1]
else partition[cv()]
end
)
order by nif, file_order
;
以下是数据的翻译部分:
NIF FILE_ FILESIZE PARTITION SUBPARTITION
A C1 1 1 1
A C2 1 1 1
A C3 2 1 1
A C4 1 1 1
B C5 5 1 2
B C6 1 1 2
C C7 2 1 2
C C8 1 1 2
D C9 4 1 3
D C10 5 1 3
D C11 1 1 4
D C12 2 1 4
D C13 3 1 4
D C14 4 1 5
D C15 5 1 5
E C16 3 1 6
E C17 2 1 6
E C18 3 1 6
E C19 4 1 7
F C20 6 2 1
F C21 2 2 1
英文:
You can also use MODEL clause:
select nif, file_, filesize, partition, subpartition
from (
select
row_number() over(order by nif, to_number(substr(file_,2))) as rn,
nif, file_, to_number(substr(file_,2)) as file_order,
filesize
from my_table
)
model
dimension by (rn)
measures( nif as nif, file_ as file_, file_order as file_order,
filesize as filesize,
0 as partition, 0 as incsubpartition, 1 as subpartition, 0 as running_total )
rules (
running_total[any] =
case when nvl(running_total[cv()-1],0) + filesize[cv()] >= 10
then filesize[cv()]
else nvl(running_total[cv()-1],0) + filesize[cv()]
end,
incsubpartition[any] =
case when nvl(running_total[cv()-1],0) + filesize[cv()] >= 10
then 1
else 0
end,
-- rule 3
subpartition[any] = nvl(subpartition[cv()-1],1) + incsubpartition[cv()],
partition[any] =
case when subpartition[cv()] = 8 and subpartition[cv()-1] <> 8
then partition[cv()-1] + 1
else nvl(partition[cv()-1],1)
end,
-- rule 1
subpartition[any] =
case when subpartition[cv()] = 8
then 1
else subpartition[cv()]
end,
-- rule 2
partition[any] =
case when nif[cv()] = nif[cv()-1]
then partition[cv()-1]
else partition[cv()]
end
)
order by nif, file_order
;
NIF FILE_ FILESIZE PARTITION SUBPARTITION
A C1 1 1 1
A C2 1 1 1
A C3 2 1 1
A C4 1 1 1
B C5 5 1 2
B C6 1 1 2
C C7 2 1 2
C C8 1 1 2
D C9 4 1 3
D C10 5 1 3
D C11 1 1 4
D C12 2 1 4
D C13 3 1 4
D C14 4 1 5
D C15 5 1 5
E C16 3 1 6
E C17 2 1 6
E C18 3 1 6
E C19 4 1 7
F C20 6 2 1
F C21 2 2 1
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论