Reduce the size of a table with a multi-column index
Question
I have a table with only 4 columns and over 1 billion rows. Suppose the columns are named a, b, c, and d. My application needs to filter data on (a, b, c) or on (a, b), so I created an index on the columns (a, b, c), in that order. However, this doubled the size of the table. My guess is that the a, b, and c columns are now stored in both the data and the index. Can anyone suggest a way to reduce the size of this table?
My table schema:
CREATE TABLE Message (
userId bigint NOT NULL,
campaignId int NOT NULL,
notificationId int NOT NULL,
isOpened bit NOT NULL
);
I need to filter data by (userId, campaignId, notificationId) and by (userId, campaignId).
Answer 1
Score: 3
Assuming you don't already have a clustered index on the table, create the index as clustered so that the index leaf nodes are the data rows. This avoids storing the key and included columns redundantly. Also specify UNIQUE if the values are unique, to improve execution plan quality.
CREATE CLUSTERED INDEX cdx_YourTable ON dbo.YourTable(a, b, c);
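Applied to the Message table from the question, this might look like the sketch below. The name of the existing nonclustered index is not given in the question, so IX_Message_userId_campaignId_notificationId is a hypothetical placeholder, and cdx_Message is likewise just an illustrative name. UNIQUE is only valid if each (userId, campaignId, notificationId) combination appears at most once, which the question does not state.

```sql
-- Hypothetical name for the existing nonclustered index; substitute the real one.
-- Dropping it first avoids rebuilding it when the heap becomes a clustered index.
DROP INDEX IX_Message_userId_campaignId_notificationId ON dbo.Message;

-- Assumption: each (userId, campaignId, notificationId) combination is unique.
-- If it is not, remove the UNIQUE keyword.
CREATE UNIQUE CLUSTERED INDEX cdx_Message
    ON dbo.Message (userId, campaignId, notificationId);
```

Because (userId, campaignId) is a leading prefix of the key, the same clustered index can serve both filters from the question.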
You can reduce the size further with PAGE or ROW compression, as shown below. Although compression incurs additional CPU overhead, that cost is often more than offset by reduced IO and better buffer cache efficiency for the same data.
CREATE CLUSTERED INDEX cdx_YourTable ON dbo.YourTable(a, b, c)
WITH(DATA_COMPRESSION=PAGE);
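Before rebuilding a table of this size, it may be worth estimating what compression would save. A minimal sketch, assuming the table is dbo.Message as in the question: sp_estimate_data_compression_savings samples the data and reports the current and estimated sizes per index, and sp_spaceused reports the current data and index sizes for comparison.

```sql
-- Estimate the effect of PAGE compression on every index (or the heap) of the table.
-- Change @data_compression to N'ROW' to compare against row compression.
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'Message',   -- assumed table name, taken from the question
    @index_id         = NULL,         -- NULL = all indexes
    @partition_number = NULL,         -- NULL = all partitions
    @data_compression = N'PAGE';

-- Current row count, data size, and index size of the table.
EXEC sys.sp_spaceused @objname = N'dbo.Message';
```

The estimate is based on a sample, so the actual savings after the rebuild can differ.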