Create a sorted table in Redshift

Question

I would like to create a table that is automatically sorted by one column, with null values last.

I thought I could do this with a sortkey, as shown below, but when I query the table the rows still come back in the order in which they were inserted:

create temp table temp (
    id integer,
    date date sortkey ,
    month integer,
    cost integer);

insert into temp values
    (1, '2022-09-01', 1, 5),
    (3, '2023-01-01', 2, 3),
    (1, '2022-09-01', 2, 20),
    (2, '2022-11-01', 3, 26),
    (1, null, 3, 33),
    (1, null, 4, 39),
    (1, '2022-09-01', 6, 60),
    (1, null, 7, 72),
    (2, '2022-11-01', 1, 11),
    (2, null, 2, 14),
    (2, '2022-11-01', 4, 55),
    (2, '2022-11-01', 5, 99),
    (3, null, 1, 1),
    (1, '2022-09-01', 5, 44),
    (3, '2023-01-01', 3, 4);

Answer 1

Score: 1

First off, Redshift is a cluster, and portions of the table are stored on each “slice” of the cluster, so the slices are not sorted with respect to each other. You don’t specify the size of your cluster, so it isn’t clear how many slices are in play. When the data comes back, it arrives in the order in which the slices return their data.
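Because the slices return rows in no particular order, a query only has a defined row order when it asks for one. A minimal sketch against the table from the question, assuming the goal is ascending dates with nulls at the end (this is an illustration, not part of the original answer):

```sql
-- Sort explicitly at query time; the sortkey alone does not order query results.
-- NULLS LAST is spelled out for clarity; Redshift already ranks nulls last
-- for ascending sorts by default.
select id, date, month, cost
from temp
order by date asc nulls last;
```

The sortkey still matters for how blocks are laid out and scanned on disk, but the order of the returned rows comes from the ORDER BY clause.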

Next, the data stored on Redshift’s disks is only re-sorted by a VACUUM command. Vacuum rewrites the table’s disk blocks with blocks that are in sort order. So if you haven’t vacuumed the table, nothing will have changed.

Lastly, Redshift skips the sort phase of vacuum when the table is already sorted to at least the threshold percentage, 95% by default. So you need to vacuum “TO 100 PERCENT” to be sure that the entire table is sorted.
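The vacuum step described above might look like the following sketch. It reuses the table name temp from the question; the check against SVV_TABLE_INFO is an assumption that the table is visible in that system view, which may not hold for every table type.

```sql
-- Re-sort all rows, overriding the default 95 percent threshold.
-- VACUUM cannot run inside a transaction block.
vacuum full temp to 100 percent;

-- Assumption: the table appears in SVV_TABLE_INFO.
-- The "unsorted" column reports the percentage of unsorted rows.
select "table", unsorted
from svv_table_info
where "table" = 'temp';
```

Even with a fully sorted table, a query without ORDER BY can still return rows in slice order rather than sort order.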

Answer 2

Score: 0

I am a Redshift specialist.

If you're just starting with Redshift, this PDF may be of interest:

https://www.redshiftresearchproject.org/white_papers/downloads/introduction_to_the_fundamentals_of_amazon_redshift.pdf

Posted on March 21, 2023, 00:27:36