2023年2月18日 09:21:23go评论65阅读模式

英文:

Convert SQLite column data into rows in related tables

问题

我有一个SQLite数据库，目前只有一个表 - 它是从一个csv文件中导入的。两列是分号分隔的类别或标签，导入为TEXT字段。典型的一行可能是这样的：

```1 | 文章标题 | 摄影;我的作品 | 小贴士;镜头;装备 | 在这篇文章中我将讨论...```

我该如何提取类别和标签列，将它们唯一地插入到它们各自的表中，然后创建一个关系表将它们全部联系在一起？因此最终结果将类似于：

内容表
1 | 文章标题 | 摄影;我的作品 | 小贴士;镜头;装备 | 在这篇文章中我将讨论...

类别表
1 | 摄影
2 | 我的作品

内容类别关联表
1 | 1 | 1
2 | 2 | 1


这将有效地将我的单表数据库转换为一个真正的关系数据库。
我希望这可以高效快速地完成，因为将会在非常庞大的行数上使用这个解决方案。
这个解决方案需要与SQLite 3.36或更高版本兼容。

英文:

I have a SQLite database that is currently just one table - it has been imported from a csv file. Two of the columns are semicolon separated lists of either categories or tags imported as TEXT fields. A typical row might look like this:

1 | Article Title | photography;my work | tips;lenses;gear | In this article I'll talk about...

How can I extract the category and tags columns, uniquely insert them into their own respective tables, and then create a relational table to tie them all together? So the end result would be something like:

Content
1 | Article Title | photography;my work | tips;lenses;gear | In this article I will talk about...    

Categories   
1 | photography
2 | my work    

ContentCategories    
1 | 1 | 1    
2 | 2 | 1

This would effectively convert my one table database into a truly relational database.
I'm hoping this can be done both efficiently and quickly as there is a very large number of rows this solution would be used on.
This solution needs to be compatible with SQLite version 3.36 or later.

答案1

得分: 1

我相信以下内容展示了如何实现这一点。但这是一个两阶段的过程，仅适用于类别。类似的两阶段流程可以用于其他列。

表/列名可能不同。

/* 创建演示环境 */
DROP TABLE IF EXISTS contentcategories;
DROP TABLE IF EXISTS content;
DROP TABLE IF EXISTS category;
CREATE TABLE IF NOT EXISTS content (content_id INTEGER PRIMARY KEY, title TEXT, categories TEXT);
INSERT INTO content (title, categories) VALUES
('Article1', '摄影;我的作品;一些东西;另一个;无聊'),
('Article2', '摄影;他们的作品;不是某物;不是另一个;不是无聊'),
('Article3', 'A;B;C;D;E;F;G;;');
CREATE TABLE IF NOT EXISTS category (category_id INTEGER PRIMARY KEY, category_name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS contentcategories (content_id_map, category_id_map, PRIMARY KEY (content_id_map, category_id_map));

/* 阶段 1 填充类别表 /
WITH
sep AS (SELECT ';'), / 值分隔符 /
justincase AS (SELECT 100), / 迭代次数限制 /
splt(value, rest) AS
(
SELECT
substr(categories, 1, instr(categories, (SELECT * FROM sep)) - 1),
substr(categories, instr(categories, (SELECT * FROM sep)) + 1) || (SELECT * FROM sep)
FROM content
UNION ALL SELECT
substr(rest, 1, instr(rest, (SELECT * FROM sep)) - 1),
substr(rest, instr(rest, (SELECT * FROM sep)) + 1)
FROM splt
WHERE length(rest) > 0
LIMIT (SELECT * FROM justincase) / 限制迭代次数，以防万一 /
)
INSERT OR IGNORE INTO category (category_name) SELECT value FROM splt WHERE length(value) > 0;
/ 显示结果类别表 */
SELECT * FROM category;

/* 阶段 2 填充内容类别映射表 /
WITH
sep AS (SELECT ';'), / 值分隔符 /
justincase AS (SELECT 100), / 迭代次数限制 /
splt(value, rest, contentid, categoryid) AS
(
SELECT
substr(categories, 1, instr(categories, (SELECT * FROM sep)) - 1),
substr(categories, instr(categories, (SELECT * FROM sep)) + 1) || (SELECT * FROM sep),
content_id,
(SELECT category_id FROM category WHERE category_name = substr(categories, 1, instr(categories, (SELECT * FROM sep)) - 1))
FROM content
UNION ALL SELECT
substr(rest, 1, instr(rest, (SELECT * FROM sep)) - 1),
substr(rest, instr(rest, (SELECT * FROM sep)) + 1),
contentid,
(SELECT category_id FROM category WHERE category_name = substr(rest, 1, instr(rest, (SELECT * FROM sep)) - 1))
FROM splt
WHERE length(rest) > 0
LIMIT (SELECT * FROM justincase) / 限制迭代次数，以防万一 /
)
INSERT OR IGNORE INTO contentcategories SELECT contentid, categoryid FROM splt WHERE length(value) > 0;
/ 显示经过映射表与类别表连接的内容结果 /
SELECT content., category.*
FROM content
JOIN contentcategories ON content_id = content_id_map JOIN category ON category_id_map = category_id
;

/* 清理演示环境 */
DROP TABLE IF EXISTS contentcategories;
DROP TABLE IF EXISTS content;
DROP TABLE IF EXISTS category;

因此，内容表有三行，每行都有不同数量的类别。

第一阶段使用递归来分割值并丢弃分隔符（分隔符被编码为一个公共表达式，所以可以被传递，同样迭代次数限制也可以被传递，因为它是一个公共表达式）。
随后的 CTE（splt）用于 SELECT INSERT 来使用提取/分割后的类别加载新的类别表（使用 OR IGNORE 忽略任何重复项，比如摄影）。

第二阶段再次分割值，这次从新的类别表获取类别的 id，以便加载映射表 contentcategories。

每个阶段之后都使用 SELECT 显示阶段的结果（这些仅供演示）。

因此，当上述内容被运行时，

第一个结果（在加载类别表后）为：第一个结果

第二个结果为：第二个结果

即一切都像预期的那样通过连接提取了（尽管没有彻底检查）。

注意，错误的 ;;，即分隔符之间没有值的部分被 WHERE length(value) > 0 丢弃。

英文:

I believe that the following demonstrates how this can be done. However it is a 2 stage process and just for the categories. Similar two stage processes could be used for other columns.

Table/column names may differ.

/* Create Demo Environment */
DROP TABLE IF EXISTS contentcategories;
DROP TABLE IF EXISTS content;
DROP TABLE IF EXISTS category;
CREATE TABLE IF NOT EXISTS content (content_id INTEGER PRIMARY KEY,title TEXT, categories TEXT);
INSERT INTO content (title,categories) VALUES
	(&#39;Article1&#39;,&#39;photography;my work;something;another;blah&#39;),
	(&#39;Article2&#39;,&#39;photography;thier work;not something;not another;not blah&#39;),
	(&#39;Article3&#39;,&#39;A;B;C;D;E;F;G;;&#39;);
CREATE TABLE IF NOT EXISTS category (category_id INTEGER PRIMARY KEY,category_name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS contentcategories (content_id_map,category_id_map, PRIMARY KEY (content_id_map,category_id_map));

/* Stage 1 populate the category table */
WITH
	sep AS (SELECT &#39;;&#39;), /* The value separator */
	justincase AS (SELECT 100), /* limiter for the number of iterations */
	splt(value,rest) AS 
		(
			SELECT 
				substr(categories,1,instr(categories,(SELECT * FROM sep))-1),
				substr(categories,instr(categories,(SELECT * FROM sep))+1)||(SELECT * FROM sep) 
			FROM content
			UNION ALL SELECT 
				substr(rest,1,instr(rest,(SELECT * FROM sep))-1),
				substr(rest,instr(rest,(SELECT * FROM sep))+1) 
			FROM splt 
			WHERE length(rest) &gt; 0 
			LIMIT (SELECT * FROM justincase) /* just in case limit iterations*/
	) 
INSERT OR IGNORE INTO category (category_name) SELECT value FROM splt WHERE length(value) &gt; 0;
/* Show the resulktant Category table */
SELECT * FROM category;

/* Stage 2 populate the contentcategories mapping table */
WITH 
	sep AS (SELECT &#39;;&#39;), /* The value separator */
	justincase AS (SELECT 100), /* limiter for the number of iterations */
	splt(value,rest,contentid,categoryid) AS 
		(
			SELECT 
				substr(categories,1,instr(categories,(SELECT * FROM sep))-1),
				substr(categories,instr(categories,(SELECT * FROM sep))+1)||(SELECT * FROM sep),
				content_id,
				(SELECT category_id FROM category WHERE category_name = substr(categories,1,instr(categories,(SELECT * FROM sep))-1))
			FROM content
			UNION ALL SELECT 
				substr(rest,1,instr(rest,(SELECT * FROM sep))-1),
				substr(rest,instr(rest,(SELECT * FROM sep))+1),
				contentid,
				(SELECT category_id FROM category WHERE category_name = substr(rest,1,instr(rest,(SELECT * FROM sep))-1))
			FROM splt 
			WHERE length(rest) &gt; 0 
			LIMIT (SELECT * FROM justincase) /* just in case limit iterations */
	)
INSERT OR IGNORE INTO contentcategories SELECT contentid,categoryid FROM splt WHERE length(value) &gt; 0;
/* Show the result of content joined via the mapping table with the category table */
SELECT content.*,category.* 
	FROM content 
	JOIN contentcategories ON  content_id = content_id_map JOIN category ON category_id_map = category_id
;
/* Cleanup Demo Environment */
DROP TABLE IF EXISTS contentcategories;
DROP TABLE IF EXISTS content;
DROP TABLE IF EXISTS category;

So the content table has three rows each with a varying number of categories.

The first Stage uses recursion to split the values dropping the separators (the separator is coded as a CTE just the once so could be passed, like wise a value to limit the number of recursions can also be passed as it is a CTE).
The resulting CTE (splt) is then used for a SELECT INSERT to load the new category table with the extracted/split categories (OR IGNORE used to ignore any duplicates such as photography).

The second stage then splits the values again this time getting the id of the category from the new category table so that the mapping table contentcategories can be loaded.

After each stage a SELECT is used to show the result of the stage (these are included just to demonstrate).

So when the above is run then,

The **first result&& (after loading the category table) is:-

The second result is :-

i.e. everything is extracted via the joins as expected (not thoroughly checked though).

note that the erroneous ;; i.e. no value between the separators is discarded by WHERE length(value) > 0

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

将SQLite列数据转换为相关表中的行

问题

答案1

将SQL数据按单行分组打印，避免重复行。

数据库在Jetpack Compose桌面上

Prisma – 将属性的类型设置为枚举的数组

应用程序在从SQLite数据库访问数据后崩溃，安卓。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论