Options to model JSON data in a PostgreSQL database?
Question
I have a JSON file with data about employees and their skills. I need to model the data somehow in a PostgreSQL database (and the reason is related to the application we are developing).
The JSON file has a lot of data that I don't really need for my application (at least for now). I only need a few columns: Employee ID, Name, Qualifications. But the rest of the data should be stored in the table (only temporarily, as this is still a POC).
Data
{
"employee": {
"ID": 654534543,,
"Name": "Max Mustermann",
"Email": "max.mustermann@firma.de",
"skills": [
{"name": python, "level": 3},
{"name": c, "level": 2},
{"name": openCV, "level": 3}
],
},
"employee":{
"ID": 3213213,,
"Name": "Alex Mustermann",
"Email": "alex.mustermann@firma.de",
"skills":[
{"name": Jira, "level": 3},
{"name": Git, "level": 2},
{"name": Tensorflow, "level": 3}
],
}
};
I thought of creating a table with the columns: Employee ID as primary key, CHAR for the name, array for the skills and JSONB for the rest of the information about the employee.
TABLE
CREATE TABLE employee(
id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
position VARCHAR(255) NOT NULL,
description VARCHAR (255),
skills TEXT [],
join_date DATE
);
Some factors to keep in mind: the data will be updated periodically (let's say once a month), and the application should query the database for one or more employee IDs that cover a required skill set (including skill levels). So far we are not sure whether we will query the JSON fields, but it could become necessary in the near future.
Also, the data is complicated and dense (what I attached above is merely a simplified sample), so I guess querying directly from a JSONB column would not be convenient (as mentioned in other similar questions).
My questions now are:
1- Would the proposed data model meet the requirements, given that we have a huge JSON data file (fast search for employee skills, scalability, and easy/fast querying and retrieval of employee data, e.g. by employee ID)?
2- What should be considered when developing a relational database schema?
3- Would there be advantages to splitting the data into multiple tables? e.g. one table for employee personal data with employee ID as primary key, one table for skills with employee ID as foreign key and a text field for skills, one JSON table for the rest of the data.
I am using PostgreSQL 15.1 on Windows 10. I am also still getting familiar with PostgreSQL databases.
Many thanks.
Answer 1
Score: 2
Here is what I would do:
create table employee (
  id bigint not null primary key,
  name text not null,
  email text not null
);
create table skill (
  id bigint generated always as identity primary key,
  skill_name text not null unique
);
create table employee_skill (
  id bigint generated always as identity primary key,
  employee_id bigint not null references employee(id),
  skill_id bigint not null references skill(id),
  skill_level int not null,
  unique (employee_id, skill_id)
);
Then, to populate the schema (after correcting the errors in the JSON and turning it into an array of employee objects):
with indata as (
  select '[
    {
      "ID": 654534543,
      "Name": "Max Mustermann",
      "Email": "max.mustermann@firma.de",
      "skills": [
        {"name": "python", "level": 3},
        {"name": "c", "level": 2},
        {"name": "openCV", "level": 3}
      ]
    },
    {
      "ID": 3213213,
      "Name": "Alex Mustermann",
      "Email": "alex.mustermann@firma.de",
      "skills": [
        {"name": "Jira", "level": 3},
        {"name": "Git", "level": 2},
        {"name": "Tensorflow", "level": 3}
      ]
    }
  ]'::jsonb as j
), expand as (
  -- one row per (employee, skill) pair
  select emp, skill
  from indata
  cross join lateral jsonb_array_elements(j) as el(emp)
  cross join lateral jsonb_array_elements(emp->'skills') as sk(skill)
), insemp as (
  -- upsert the employees
  insert into employee (id, name, email)
  select distinct (emp->>'ID')::bigint, emp->>'Name', emp->>'Email'
  from expand
  on conflict (id) do update
  set name = excluded.name, email = excluded.email
  returning *
), insskill as (
  -- insert only skills that do not exist yet
  insert into skill (skill_name)
  select distinct skill->>'name'
  from expand
  on conflict (skill_name) do nothing
  returning *
), allemp as (
  select * from insemp union select * from employee
), allskill as (
  -- newly inserted skills plus the ones that already existed
  select * from insskill union select * from skill
), insempskill as (
  insert into employee_skill (employee_id, skill_id, skill_level)
  select e.id as employee_id, s.id as skill_id,
         (i.skill->>'level')::int as skill_level
  from expand i
  join allemp e on e.id = (i.emp->>'ID')::bigint
  join allskill s on s.skill_name = i.skill->>'name'
  on conflict (employee_id, skill_id) do update
  set skill_level = excluded.skill_level
  returning *
)
delete from employee_skill
where (employee_id, skill_id) not in
  (select employee_id, skill_id from insempskill
   union
   select employee_id, skill_id from employee_skill)
;
See [working fiddle][1]
[1]: https://dbfiddle.uk/39WOyBPa
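As a rough sketch of how the application could then look up employees who cover a required skill set, the query below matches each requirement against `employee_skill` and keeps only employees who satisfy all of them. The `required` list, its column names, and the index name `employee_skill_skill_idx` are assumptions made up for illustration; they are not part of the schema above.

```sql
-- Optional (assumption): an index led by skill_id to support skill lookups.
-- The unique constraint already provides an index led by employee_id.
create index if not exists employee_skill_skill_idx
    on employee_skill (skill_id, skill_level);

-- Hypothetical requirement: python at level >= 3 and c at level >= 2.
with required (skill_name, min_level) as (
    values ('python', 3), ('c', 2)
)
select e.id, e.name
from employee e
join employee_skill es on es.employee_id = e.id
join skill s on s.id = es.skill_id
join required r on r.skill_name = s.skill_name
               and es.skill_level >= r.min_level
group by e.id, e.name
-- keep only employees who matched every required skill
having count(*) = (select count(*) from required);
```

Because `skill_name` and `(employee_id, skill_id)` are both unique, each requirement can match at most once per employee, so comparing `count(*)` with the number of requirements is enough. With the sample data above, this should return only Max Mustermann.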