英文:
Database Design: Best approach for handling hierarchical data insertion in SQL database
问题
I have translated the requested content:
我有一个包含Campaign(广告系列)、AdGroup(广告组)和AdGroupAd(广告组广告)等层次表的SQL数据库,每个表通过外键关系引用前一个表。将数据插入这些表变得复杂,因为我需要确保插入的正确顺序以维护数据完整性。我正在考虑两种方法,并寻求关于最佳方法的建议:
-
按正确顺序插入记录:这涉及手动按层次顺序将数据插入每个表,确保维护所需的关系。例如,我首先会插入一个广告系列,然后插入引用广告系列的广告组,最后插入引用广告组的广告。
-
使用带有触发器的入口表:这种方法涉及创建一个名为LoadingZone的额外表,充当数据的入口点。LoadingZone表上的触发器将根据提供的数据自动按正确顺序将记录插入到相应的表中。
我已经提供了上述两种情况的小型SQL示例。我将感激有关每种方法的优缺点以及根据数据完整性、性能和维护等因素,关于在SQL数据库中处理层次数据插入的最佳方法的指导。哪种方法更适合处理SQL数据库中的层次数据插入?
请注意,示例是实际数据模型的简化版本。
数据库:
CREATE TABLE Campaign (
id INT PRIMARY KEY,
campaign_name VARCHAR(50)
);
CREATE TABLE AdGroup (
id INT PRIMARY KEY,
ad_group_name VARCHAR(50),
campaign_id INT,
FOREIGN KEY (campaign_id) REFERENCES Campaign(id)
);
CREATE TABLE AdGroupAd (
id INT PRIMARY KEY,
ad_name VARCHAR(50),
ad_group_id INT,
FOREIGN KEY (ad_group_id) REFERENCES AdGroup(id)
);
情景1:按正确顺序插入记录:
-- 插入广告系列
INSERT INTO Campaign (id, campaign_name) VALUES (1, 'International Stuff');
-- 为广告系列插入广告组
INSERT INTO AdGroup (id, ad_group_name, campaign_id) VALUES (1, 'Regional 1', 1);
-- 为广告组插入广告
INSERT INTO AdGroupAd (id, ad_name, ad_group_id) VALUES (1, 'Blue', 1);
情景2:使用带有触发器的入口表
-- 创建加载表
CREATE TABLE LoadingZone (
id INT PRIMARY KEY,
campaign_name VARCHAR(50),
ad_group_name VARCHAR(50),
ad_name VARCHAR(50)
);
-- 创建触发器函数
CREATE OR REPLACE FUNCTION insert_into_tables() RETURNS TRIGGER AS $$
BEGIN
-- 插入到Campaign表
INSERT INTO Campaign (id, campaign_name)
SELECT NEW.id, NEW.campaign_name
ON CONFLICT DO NOTHING;
-- 插入到AdGroup表
INSERT INTO AdGroup (id, ad_group_name, campaign_id)
SELECT NEW.id, NEW.ad_group_name, c.id
FROM Campaign c
WHERE c.campaign_name = NEW.campaign_name
ON CONFLICT DO NOTHING;
-- 插入到AdGroupAd表
INSERT INTO AdGroupAd (id, ad_name, ad_group_id)
SELECT NEW.id, NEW.ad_name, ag.id
FROM AdGroup ag
WHERE ag.ad_group_name = NEW.ad_group_name
ON CONFLICT DO NOTHING;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-- 创建触发器
CREATE TRIGGER insert_trigger
AFTER INSERT ON LoadingZone
FOR EACH ROW
EXECUTE FUNCTION insert_into_tables();
插入数据到LoadingZone表:
INSERT INTO LoadingZone (id, campaign_name, ad_group_name, ad_name)
VALUES (1, 'International Stuff', 'Regional 1', 'Blue');
以下是我对触发器的优点的看法:
- 自我记录:基于触发器的方法自我记录了正确的插入顺序,使理解表之间的关系更容易。
- 安全性强:如果由于子表中的错误而导致插入失败,数据库应该回滚整个插入。否则,这种情况可能更难以恢复。
- 更强的数据完整性:触发器强制执行正确的插入顺序,降低了数据不一致性的风险,并确保更强的数据完整性。
感谢在这个话题上的任何指导。
英文:
I have a SQL database with hierarchical tables, including Campaign, AdGroup, and AdGroupAd, where each table references the previous one through foreign key relationships. Inserting data into these tables becomes complex as I need to ensure the correct order of insertion to maintain data integrity. I'm considering two approaches and seeking advice on the best approach:
-
Insert records in the correct order: This involves manually inserting data into each table in the hierarchical order, ensuring that the required relationships are maintained. For example, I would first insert a campaign, then an ad group referencing the campaign, and finally an ad referencing the ad group.
-
Use an entry table with a trigger: This approach involves creating an additional table called LoadingZone, acting as an entry point for the data. A trigger on the LoadingZone table would handle the insertion process by automatically inserting records into the respective tables in the correct order, based on the data provided.
I have provided small SQL examples for both scenarios above. I would appreciate insights into the pros and cons of each approach and guidance on the best approach considering factors such as data integrity, performance, and maintenance. Which approach is more suitable for handling hierarchical data insertion in a SQL database?
Note that the examples are simplified versions of the real data model.
Database:
CREATE TABLE Campaign (
id INT PRIMARY KEY,
campaign_name VARCHAR(50)
);
CREATE TABLE AdGroup (
id INT PRIMARY KEY,
ad_group_name VARCHAR(50),
campaign_id INT,
FOREIGN KEY (campaign_id) REFERENCES Campaign(id)
);
CREATE TABLE AdGroupAd (
id INT PRIMARY KEY,
ad_name VARCHAR(50),
ad_group_id INT,
FOREIGN KEY (ad_group_id) REFERENCES AdGroup(id)
);
Scenario 1: Insert records in the correct order:
-- Insert a campaign
INSERT INTO Campaign (id, campaign_name) VALUES (1, 'International Stuff');
-- Insert an ad group for the campaign
INSERT INTO AdGroup (id, ad_group_name, campaign_id) VALUES (1, 'Regional 1', 1);
-- Insert an ad for the ad group
INSERT INTO AdGroupAd (id, ad_name, ad_group_id) VALUES (1, 'Blue', 1);
Scenario 2: Use an entry table with a trigger
-- Create the loading table
CREATE TABLE LoadingZone (
id INT PRIMARY KEY,
campaign_name VARCHAR(50),
ad_group_name VARCHAR(50),
ad_name VARCHAR(50)
);
-- Create the trigger function
CREATE OR REPLACE FUNCTION insert_into_tables() RETURNS TRIGGER AS $$
BEGIN
-- Insert into Campaign table
INSERT INTO Campaign (id, campaign_name)
SELECT NEW.id, NEW.campaign_name
ON CONFLICT DO NOTHING;
-- Insert into AdGroup table
INSERT INTO AdGroup (id, ad_group_name, campaign_id)
SELECT NEW.id, NEW.ad_group_name, c.id
FROM Campaign c
WHERE c.campaign_name = NEW.campaign_name
ON CONFLICT DO NOTHING;
-- Insert into AdGroupAd table
INSERT INTO AdGroupAd (id, ad_name, ad_group_id)
SELECT NEW.id, NEW.ad_name, ag.id
FROM AdGroup ag
WHERE ag.ad_group_name = NEW.ad_group_name
ON CONFLICT DO NOTHING;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-- Create the trigger
CREATE TRIGGER insert_trigger
AFTER INSERT ON LoadingZone
FOR EACH ROW
EXECUTE FUNCTION insert_into_tables();
-- Insert data into the LoadingZone table
INSERT INTO LoadingZone (id, campaign_name, ad_group_name, ad_name)
VALUES (1, 'International Stuff', 'Regional 1', 'Blue');
These are my thoughts in favor of the trigger:
- Self-documentation: The trigger-based approach self-documents the correct insertion order, making it easier to understand the relationships between the tables.
- Fail-safe: If an insertion fails due to an error thrown in one of the child tables, the database should roll back the entire insert. Otherwise, this scenario could be more difficult to recover from.
- Stronger data integrity: The trigger enforces the correct order of insertion, reducing the risk of data inconsistencies and ensuring stronger data integrity.
Thanks for any guidance on this topic.
答案1
得分: 1
触发器很有用,但它们也是不可见的、复杂的,并且可能会妨碍执行触发器作者未设想的操作。仅为触发器添加一个表会增加更多的复杂性。
相反,依赖约束条件,如foreign key
和not null
,以确保引用完整性; 使插入无效行变得不可能。
-
您的外键将防止引用不存在的项目。这将捕获如果您尝试按错误顺序插入的情况。
-
声明所有外键为
not null
。这将防止插入没有父项的行。 -
使用自增主键以确保您始终获得唯一的ID。这也使得您明显地知道您正在按错误顺序插入;您需要按正确的顺序插入以获取外键。它还避免了与其他同时插入的连接发生竞态条件。
-
使用
last_insert_id
(仅适用于MySQL)来获取上一个自增插入的ID。 -
如果有任何失败,使用事务回滚所有插入。
关于您的模式的其他说明。
-
对于主键,请使用bigints。ID在每次尝试插入时都会增加。尝试插入20亿次会很快。
-
不要随意限制文本字段的大小。
varchar
是可变字符大小;您不会节省任何空间,只会限制您的用户。
示例代码如下:
CREATE TABLE Campaign (
id bigint PRIMARY KEY AUTO_INCREMENT,
campaign_name VARCHAR(255) not null
);
CREATE TABLE AdGroup (
id bigint PRIMARY KEY AUTO_INCREMENT,
ad_group_name VARCHAR(255) not null,
campaign_id bigint not null,
FOREIGN KEY (campaign_id) REFERENCES Campaign(id)
);
CREATE TABLE AdGroupAd (
id bigint PRIMARY KEY AUTO_INCREMENT,
ad_name VARCHAR(255) not null,
ad_group_id bigint not null,
FOREIGN KEY (ad_group_id) REFERENCES AdGroup(id)
);
begin;
INSERT INTO Campaign (campaign_name) VALUES ('International Stuff');
-- 使用 `select last_insert_id()` 获取Campaign的ID。
INSERT INTO AdGroup (ad_group_name, campaign_id) VALUES ('Regional 1', (select last_insert_id()));
-- 使用 `select last_insert_id()` 获取AdGroupAd的ID。
INSERT INTO AdGroupAd (ad_name, ad_group_id) VALUES ('Blue', (select last_insert_id()));
commit;
BEGIN;
INSERT INTO Campaign (campaign_name) VALUES ('Buy our stuff');
INSERT INTO AdGroup (ad_group_name, campaign_id) VALUES ('Stuff sellers', (select last_insert_id()));
-- 错误的插入,这将引发错误。
INSERT INTO AdGroupAd (ad_name, ad_group_id) VALUES ('Stuff', 1234567);
-- 回滚所有插入。
ROLLBACK;
演示。
注意:MySQL的事务处理不一致。"InnoDB有时仅回滚失败的语句,其他时候会回滚整个事务。" 🤦 这就是为什么需要显式的rollback
。其他数据库将在错误发生时自动回滚整个事务。MySQL非常不标准,会教会您一些不良习惯,我建议您切换到PostgreSQL,特别是在学习过程中。
您可以编写一个小程序来执行插入,或者如SalmanA建议的,使用存储过程,但这不是必需的。
英文:
Triggers are useful, but they are also invisible, complicated, and can get in the way of doing things not envisioned by the trigger author. Adding a table just for the trigger adds even more complexity.
Instead, rely on constraints, such as foreign key
and not null
, to ensure referential integrity; make it impossible to insert invalid rows.
-
Your foreign keys will prevent referencing an item which does not exist. This will catch if you try to insert in the wrong order.
-
Declare all the foreign keys
not null
. This prevents inserting a row with no parent. -
Use auto-increment for primary keys to ensure you always get a unique id. This also makes it obvious you're inserting in the wrong order; you need to insert in the right order to get the foreign keys. It also avoids race conditions with other connections inserting at the same time.
-
Use
last_insert_id
(MySQL-specific) to get the ID of the previous auto-incremented insert. -
Use transactions to rollback all the inserts if any fail.
Other notes on your schema.
-
Use bigints for primary keys. The ID is incremented every time an insert is attempted. 2 billion attempted inserts comes up fast.
-
Don't arbitrarily limit the size of text fields.
varchar
is VARiable character size; you're not saving any space and just limiting your users.
CREATE TABLE Campaign (
id bigint PRIMARY KEY AUTO_INCREMENT,
campaign_name VARCHAR(255) not null
);
CREATE TABLE AdGroup (
id bigint PRIMARY KEY AUTO_INCREMENT,
ad_group_name VARCHAR(255) not null,
campaign_id bigint not null,
FOREIGN KEY (campaign_id) REFERENCES Campaign(id)
);
CREATE TABLE AdGroupAd (
id bigint PRIMARY KEY AUTO_INCREMENT,
ad_name VARCHAR(255) not null,
ad_group_id bigint not null,
FOREIGN KEY (ad_group_id) REFERENCES AdGroup(id)
);
begin;
INSERT INTO Campaign (campaign_name) VALUES ('International Stuff');
-- Use `select last_insert_id()` to get the ID of the Campaign.
INSERT INTO AdGroup (ad_group_name, campaign_id) VALUES ('Regional 1', (select last_insert_id()));
-- Use `select last_insert_id()` to get the ID of the AdGroupAd.
INSERT INTO AdGroupAd (ad_name, ad_group_id) VALUES ('Blue', (select last_insert_id()));
commit;
BEGIN;
INSERT INTO Campaign (campaign_name) VALUES ('Buy our stuff');
INSERT INTO AdGroup (ad_group_name, campaign_id) VALUES ('Stuff sellers', (select last_insert_id()));
-- Bad insert, this will raise an error.
INSERT INTO AdGroupAd (ad_name, ad_group_id) VALUES ('Stuff', 1234567);
-- Roll back all the inserts.
ROLLBACK;
Note: MySQL's transaction handling is inconsistent. "InnoDB sometimes rolls back only the statement that failed, other times it rolls back the entire transaction." 🤦 That's why the explicit rollback
is necessary. Other databases will rollback the whole transaction on error automatically. MySQL is quite non-standard and will teach you bad habits, I advocate you switch to PostgreSQL, especially while learning.
You can write a small program to do the inserts, or as SalmanA suggested, a stored procedure, but it shouldn't be necessary.
答案2
得分: 1
你还可以使用视图和INSTEAD OF TRIGGER,就像这样:
CREATE TABLE Campaign (
id INT IDENTITY PRIMARY KEY,
campaign_name VARCHAR(50) NOT NULL
);
CREATE TABLE AdGroup (
id INT IDENTITY PRIMARY KEY,
ad_group_name VARCHAR(50) NOT NULL,
campaign_id INT NOT NULL,
FOREIGN KEY (campaign_id) REFERENCES Campaign(id)
);
CREATE TABLE AdGroupAd (
id INT IDENTITY PRIMARY KEY,
ad_name VARCHAR(50) NOT NULL,
ad_group_id INT NOT NULL,
FOREIGN KEY (ad_group_id) REFERENCES AdGroup(id)
);
GO
CREATE VIEW V_campaign_group
AS
SELECT C.id AS campaigne_id, C.campaign_name,
AG.id AS AdGroup_id, AG.ad_group_name,
AGA.id AS AdGroupAd_id, AGA.ad_name
FROM Campaign AS C
LEFT OUTER JOIN AdGroup AS AG
ON AG.campaign_id = C.id
LEFT OUTER JOIN AdGroupAd AS AGA
ON AGA.ad_group_id = AG.id;
GO
CREATE TRIGGER EVT_INSERT_V_campaign_group
ON V_campaign_group
INSTEAD OF INSERT
AS
INSERT INTO Campaign
SELECT campaign_name
FROM inserted;
INSERT INTO AdGroup
SELECT ad_group_name, SCOPE_IDENTITY()
FROM inserted
INSERT INTO AdGroupAd
SELECT ad_name, SCOPE_IDENTITY()
FROM inserted
GO
此代码特定于MS SQL Server,但您可以在其他关系数据库管理系统中找到等效的代码。
英文:
You can also use a view an a INSTEAD OF TRIGGER, like this one :
CREATE TABLE Campaign (
id INT IDENTITY PRIMARY KEY,
campaign_name VARCHAR(50) NOT NULL
);
CREATE TABLE AdGroup (
id INT IDENTITY PRIMARY KEY,
ad_group_name VARCHAR(50) NOT NULL,
campaign_id INT NOT NULL,
FOREIGN KEY (campaign_id) REFERENCES Campaign(id)
);
CREATE TABLE AdGroupAd (
id INT IDENTITY PRIMARY KEY,
ad_name VARCHAR(50) NOT NULL,
ad_group_id INT NOT NULL,
FOREIGN KEY (ad_group_id) REFERENCES AdGroup(id)
);
GO
CREATE VIEW V_campaign_group
AS
SELECT C.id AS campaigne_id, C.campaign_name,
AG.id AS AdGroup_id, AG.ad_group_name,
AGA.id AS AdGroupAd_id, AGA.ad_name
FROM Campaign AS C
LEFT OUTER JOIN AdGroup AS AG
ON AG.campaign_id = C.id
LEFT OUTER JOIN AdGroupAd AS AGA
ON AGA.ad_group_id = AG.id;
GO
CREATE TRIGGER EVT_INSERT_V_campaign_group
ON V_campaign_group
INSTEAD OF INSERT
AS
INSERT INTO Campaign
SELECT campaign_name
FROM inserted;
INSERT INTO AdGroup
SELECT ad_group_name, SCOPE_IDENTITY()
FROM inserted
INSERT INTO AdGroupAd
SELECT ad_name, SCOPE_IDENTITY()
FROM inserted
GO
This code is specific to MS SQL Server but you will find equivalent in other RDBMS
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论