将SQLite列数据转换为相关表中的行

huangapple go评论65阅读模式
英文:

Convert SQLite column data into rows in related tables

问题

我有一个SQLite数据库目前只有一个表 - 它是从一个csv文件中导入的两列是分号分隔的类别或标签导入为TEXT字段典型的一行可能是这样的

```1 | 文章标题 | 摄影;我的作品 | 小贴士;镜头;装备 | 在这篇文章中我将讨论...```

我该如何提取类别和标签列将它们唯一地插入到它们各自的表中然后创建一个关系表将它们全部联系在一起因此最终结果将类似于

内容表
1 | 文章标题 | 摄影;我的作品 | 小贴士;镜头;装备 | 在这篇文章中我将讨论...

类别表
1 | 摄影
2 | 我的作品

内容类别关联表
1 | 1 | 1
2 | 2 | 1


这将有效地将我的单表数据库转换为一个真正的关系数据库。
我希望这可以高效快速地完成,因为将会在非常庞大的行数上使用这个解决方案。
这个解决方案需要与SQLite 3.36或更高版本兼容。
英文:

I have a SQLite database that is currently just one table - it has been imported from a csv file. Two of the columns are semicolon separated lists of either categories or tags imported as TEXT fields. A typical row might look like this:

1 | Article Title | photography;my work | tips;lenses;gear | In this article I'll talk about...

How can I extract the category and tags columns, uniquely insert them into their own respective tables, and then create a relational table to tie them all together? So the end result would be something like:

Content
1 | Article Title | photography;my work | tips;lenses;gear | In this article I will talk about...    

Categories   
1 | photography
2 | my work    

ContentCategories    
1 | 1 | 1    
2 | 2 | 1

This would effectively convert my one table database into a truly relational database.
I'm hoping this can be done both efficiently and quickly as there is a very large number of rows this solution would be used on.
This solution needs to be compatible with SQLite version 3.36 or later.

答案1

得分: 1

我相信以下内容展示了如何实现这一点。但这是一个两阶段的过程,仅适用于类别。类似的两阶段流程可以用于其他列。

表/列名可能不同。

/* 创建演示环境 */
DROP TABLE IF EXISTS contentcategories;
DROP TABLE IF EXISTS content;
DROP TABLE IF EXISTS category;
CREATE TABLE IF NOT EXISTS content (content_id INTEGER PRIMARY KEY, title TEXT, categories TEXT);
INSERT INTO content (title, categories) VALUES
('Article1', '摄影;我的作品;一些东西;另一个;无聊'),
('Article2', '摄影;他们的作品;不是某物;不是另一个;不是无聊'),
('Article3', 'A;B;C;D;E;F;G;;');
CREATE TABLE IF NOT EXISTS category (category_id INTEGER PRIMARY KEY, category_name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS contentcategories (content_id_map, category_id_map, PRIMARY KEY (content_id_map, category_id_map));

/* 阶段 1 填充类别表 /
WITH
sep AS (SELECT ';'), /
值分隔符 /
justincase AS (SELECT 100), /
迭代次数限制 /
splt(value, rest) AS
(
SELECT
substr(categories, 1, instr(categories, (SELECT * FROM sep)) - 1),
substr(categories, instr(categories, (SELECT * FROM sep)) + 1) || (SELECT * FROM sep)
FROM content
UNION ALL SELECT
substr(rest, 1, instr(rest, (SELECT * FROM sep)) - 1),
substr(rest, instr(rest, (SELECT * FROM sep)) + 1)
FROM splt
WHERE length(rest) > 0
LIMIT (SELECT * FROM justincase) /
限制迭代次数,以防万一 /
)
INSERT OR IGNORE INTO category (category_name) SELECT value FROM splt WHERE length(value) > 0;
/
显示结果类别表 */
SELECT * FROM category;

/* 阶段 2 填充内容类别映射表 /
WITH
sep AS (SELECT ';'), /
值分隔符 /
justincase AS (SELECT 100), /
迭代次数限制 /
splt(value, rest, contentid, categoryid) AS
(
SELECT
substr(categories, 1, instr(categories, (SELECT * FROM sep)) - 1),
substr(categories, instr(categories, (SELECT * FROM sep)) + 1) || (SELECT * FROM sep),
content_id,
(SELECT category_id FROM category WHERE category_name = substr(categories, 1, instr(categories, (SELECT * FROM sep)) - 1))
FROM content
UNION ALL SELECT
substr(rest, 1, instr(rest, (SELECT * FROM sep)) - 1),
substr(rest, instr(rest, (SELECT * FROM sep)) + 1),
contentid,
(SELECT category_id FROM category WHERE category_name = substr(rest, 1, instr(rest, (SELECT * FROM sep)) - 1))
FROM splt
WHERE length(rest) > 0
LIMIT (SELECT * FROM justincase) /
限制迭代次数,以防万一 /
)
INSERT OR IGNORE INTO contentcategories SELECT contentid, categoryid FROM splt WHERE length(value) > 0;
/
显示经过映射表与类别表连接的内容结果 /
SELECT content.
, category.*
FROM content
JOIN contentcategories ON content_id = content_id_map JOIN category ON category_id_map = category_id
;

/* 清理演示环境 */
DROP TABLE IF EXISTS contentcategories;
DROP TABLE IF EXISTS content;
DROP TABLE IF EXISTS category;

因此,内容表有三行,每行都有不同数量的类别。

第一阶段使用递归来分割值并丢弃分隔符(分隔符被编码为一个公共表达式,所以可以被传递,同样迭代次数限制也可以被传递,因为它是一个公共表达式)。
随后的 CTE(splt)用于 SELECT INSERT 来使用提取/分割后的类别加载新的类别表(使用 OR IGNORE 忽略任何重复项,比如摄影)。

第二阶段再次分割值,这次从新的类别表获取类别的 id,以便加载映射表 contentcategories。

每个阶段之后都使用 SELECT 显示阶段的结果(这些仅供演示)。

因此,当上述内容被运行时,

第一个结果(在加载类别表后)为:第一个结果

第二个结果为:第二个结果

即一切都像预期的那样通过连接提取了(尽管没有彻底检查)。

  • 注意,错误的 ;;,即分隔符之间没有值的部分被 WHERE length(value) > 0 丢弃。
英文:

I believe that the following demonstrates how this can be done. However it is a 2 stage process and just for the categories. Similar two stage processes could be used for other columns.

Table/column names may differ.

/* Create Demo Environment */
DROP TABLE IF EXISTS contentcategories;
DROP TABLE IF EXISTS content;
DROP TABLE IF EXISTS category;
CREATE TABLE IF NOT EXISTS content (content_id INTEGER PRIMARY KEY,title TEXT, categories TEXT);
INSERT INTO content (title,categories) VALUES
	('Article1','photography;my work;something;another;blah'),
	('Article2','photography;thier work;not something;not another;not blah'),
	('Article3','A;B;C;D;E;F;G;;');
CREATE TABLE IF NOT EXISTS category (category_id INTEGER PRIMARY KEY,category_name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS contentcategories (content_id_map,category_id_map, PRIMARY KEY (content_id_map,category_id_map));

/* Stage 1 populate the category table */
WITH
	sep AS (SELECT ';'), /* The value separator */
	justincase AS (SELECT 100), /* limiter for the number of iterations */
	splt(value,rest) AS 
		(
			SELECT 
				substr(categories,1,instr(categories,(SELECT * FROM sep))-1),
				substr(categories,instr(categories,(SELECT * FROM sep))+1)||(SELECT * FROM sep) 
			FROM content
			UNION ALL SELECT 
				substr(rest,1,instr(rest,(SELECT * FROM sep))-1),
				substr(rest,instr(rest,(SELECT * FROM sep))+1) 
			FROM splt 
			WHERE length(rest) > 0 
			LIMIT (SELECT * FROM justincase) /* just in case limit iterations*/
	) 
INSERT OR IGNORE INTO category (category_name) SELECT value FROM splt WHERE length(value) > 0;
/* Show the resulktant Category table */
SELECT * FROM category;

/* Stage 2 populate the contentcategories mapping table */
WITH 
	sep AS (SELECT ';'), /* The value separator */
	justincase AS (SELECT 100), /* limiter for the number of iterations */
	splt(value,rest,contentid,categoryid) AS 
		(
			SELECT 
				substr(categories,1,instr(categories,(SELECT * FROM sep))-1),
				substr(categories,instr(categories,(SELECT * FROM sep))+1)||(SELECT * FROM sep),
				content_id,
				(SELECT category_id FROM category WHERE category_name = substr(categories,1,instr(categories,(SELECT * FROM sep))-1))
			FROM content
			UNION ALL SELECT 
				substr(rest,1,instr(rest,(SELECT * FROM sep))-1),
				substr(rest,instr(rest,(SELECT * FROM sep))+1),
				contentid,
				(SELECT category_id FROM category WHERE category_name = substr(rest,1,instr(rest,(SELECT * FROM sep))-1))
			FROM splt 
			WHERE length(rest) > 0 
			LIMIT (SELECT * FROM justincase) /* just in case limit iterations */
	)
INSERT OR IGNORE INTO contentcategories SELECT contentid,categoryid FROM splt WHERE length(value) > 0;
/* Show the result of content joined via the mapping table with the category table */
SELECT content.*,category.* 
	FROM content 
	JOIN contentcategories ON  content_id = content_id_map JOIN category ON category_id_map = category_id
;
/* Cleanup Demo Environment */
DROP TABLE IF EXISTS contentcategories;
DROP TABLE IF EXISTS content;
DROP TABLE IF EXISTS category;

So the content table has three rows each with a varying number of categories.

The first Stage uses recursion to split the values dropping the separators (the separator is coded as a CTE just the once so could be passed, like wise a value to limit the number of recursions can also be passed as it is a CTE).
The resulting CTE (splt) is then used for a SELECT INSERT to load the new category table with the extracted/split categories (OR IGNORE used to ignore any duplicates such as photography).

The second stage then splits the values again this time getting the id of the category from the new category table so that the mapping table contentcategories can be loaded.

After each stage a SELECT is used to show the result of the stage (these are included just to demonstrate).

So when the above is run then,

The **first result&& (after loading the category table) is:-

将SQLite列数据转换为相关表中的行

The second result is :-

将SQLite列数据转换为相关表中的行

i.e. everything is extracted via the joins as expected (not thoroughly checked though).

  • note that the erroneous ;; i.e. no value between the separators is discarded by WHERE length(value) > 0

huangapple
  • 本文由 发表于 2023年2月18日 09:21:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/75490590.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定