February 14, 2023, 22:06:53

Options to model JSON data in a PostgreSQL database?

Question

I have a JSON file with data about employees and their skills. I need to model the data somehow in a PostgreSQL database (and the reason is related to the application we are developing).

The JSON file has a lot of data that I don't really need for my application (at least for now). I only need a few columns: Employee ID, Name, Qualifications. But the rest of the data should be stored in the table (only temporarily, as this is still a POC).

Data



    {
      "employee": {
      "ID": 654534543,,
      "Name": "Max Mustermann",
      "Email": "max.mustermann@firma.de",
      "skills": [
        {"name": python, "level": 3},
        {"name": c, "level": 2},
        {"name": openCV, "level": 3}
        ],
      },
    "employee":{
      "ID": 3213213,,
      "Name": "Alex Mustermann",
      "Email": "alex.mustermann@firma.de",
      "skills":[
        {"name": Jira, "level": 3},
        {"name": Git, "level": 2},
        {"name": Tensorflow, "level": 3}
        ],
      }
    };

I thought of creating a table with the columns: Employee ID as primary key, CHAR for the name, array for the skills and JSONB for the rest of the information about the employee.

TABLE

    CREATE TABLE employee(
        id INT PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        position VARCHAR(255) NOT NULL,
        description VARCHAR(255),
        skills TEXT[],
        join_date DATE
    );
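
With a `TEXT[]` column like the one in the table above, a skill search could be sketched as follows. The table `employee_flat` and the index name are hypothetical, chosen only to keep the example self-contained.

```sql
-- Hypothetical cut-down version of the proposed table.
create temporary table employee_flat (
    id     bigint primary key,
    name   text not null,
    skills text[] not null default '{}'
);

-- GIN index so that array containment (@>) can avoid a full table scan.
create index employee_flat_skills_idx on employee_flat using gin (skills);

insert into employee_flat (id, name, skills) values
  (654534543, 'Max Mustermann',  array['python', 'c', 'openCV']),
  (3213213,   'Alex Mustermann', array['Jira', 'Git', 'Tensorflow']);

-- Employees that have all of the listed skills.
-- The skill level is not available in this layout.
select id, name
  from employee_flat
 where skills @> array['python', 'c'];
```

A plain text array has no place for the level of each skill, so a query by minimum level would need a different layout.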

Some factors to keep in mind: the data should be updated periodically (let's say once a month), and the application should query the database for one or more employee IDs that cover a required skill set (including skill levels). So far we are not sure whether we will query the JSON fields, but it could become necessary in the near future.

Also, the data is complicated and dense (the sample above is heavily simplified), so I guess querying directly from a JSONB column would not be convenient (as mentioned in other similar questions).
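
For comparison, here is a minimal sketch of what querying skills straight out of a JSONB column could look like on PostgreSQL 12 or later. The table `employee_raw` and its `doc` column are hypothetical names and not part of the schema proposed in this question.

```sql
-- Hypothetical staging table: one JSONB document per employee.
create temporary table employee_raw (
    doc jsonb not null
);

insert into employee_raw (doc) values
  ('{"ID": 654534543, "Name": "Max Mustermann",
     "skills": [{"name": "python", "level": 3}, {"name": "c", "level": 2}]}'),
  ('{"ID": 3213213, "Name": "Alex Mustermann",
     "skills": [{"name": "Git", "level": 2}, {"name": "Jira", "level": 3}]}');

-- Employees who have python at level 2 or higher.
select (doc->>'ID')::bigint as employee_id,
       doc->>'Name'         as name
  from employee_raw
 where jsonb_path_exists(
         doc,
         '$.skills[*] ? (@.name == "python" && @.level >= 2)'
       );
```

Every additional required skill adds another path predicate. As far as I know, a GIN index on `doc` helps with equality-style matches through the `@>` and `@?` operators but not with range comparisons such as `>=`, which is one reason the relational layout tends to be easier to work with here.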

My questions now are:

1- Would the proposed data model meet the requirements, given that we have a huge JSON data file (fast search for employee skills, scalability, easy and fast querying and retrieval of employee data, e.g. by employee ID)?

2- What should be considered when developing a relational database schema?

3- Would there be advantages to splitting the data into multiple tables? e.g. one table for employee personal data with employee ID as primary key, one table for skills with employee ID as foreign key and a text field for skills, one JSON table for the rest of the data.

I am using PostgreSQL 15.1 on Windows 10. I am also still getting familiar with PostgreSQL databases.

Many thanks

Answer 1

Score: 2

<details>
<summary>English:</summary>

Here is what I would do:

    create table employee (
      id bigint not null primary key,
      name text not null,
      email text not null
    );

    create table skill (
      id bigint generated always as identity primary key,
      skill_name text not null unique
    );

    create table employee_skill (
      id bigint generated always as identity primary key,
      employee_id bigint not null references employee(id),
      skill_id bigint not null references skill(id),
      skill_level int not null,
      unique (employee_id, skill_id)
    );
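
The question also asks to keep the rest of each employee record around for now. One possible way to do that, sketched here under the assumption that a single JSONB column is enough, is to add such a column to `employee`. The column name `extra`, the index name, and the `Office` field in the sample document are made up for illustration.

```sql
-- Assumption: everything that is not modelled relationally goes into one JSONB column.
alter table employee
  add column extra jsonb not null default '{}'::jsonb;

-- Optional: only worthwhile once the JSON fields are actually queried
-- with containment-style operators such as @>.
create index employee_extra_gin on employee using gin (extra jsonb_path_ops);

-- The "rest" of a source document can be derived by removing the keys
-- that already have their own columns or tables.
select '{"ID": 1, "Name": "x", "Email": "x@example.com",
         "skills": [], "Office": "Berlin"}'::jsonb
       - array['ID', 'Name', 'Email', 'skills'] as extra;
```

The same `jsonb - text[]` expression could be applied to each employee object while loading, so that `extra` holds only the fields that were not extracted.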

Then, to populate the schema (after correcting the errors with the JSON):

    with indata as (
      -- Input document: a JSON array with one object per employee
      select '[
      {
      "ID": 654534543,
      "Name": "Max Mustermann",
      "Email": "max.mustermann@firma.de",
      "skills": [
        {"name": "python", "level": 3},
        {"name": "c", "level": 2},
        {"name": "openCV", "level": 3}
        ]
      },
      {
      "ID": 3213213,
      "Name": "Alex Mustermann",
      "Email": "alex.mustermann@firma.de",
      "skills":[
        {"name": "Jira", "level": 3},
        {"name": "Git", "level": 2},
        {"name": "Tensorflow", "level": 3}
        ]
      }
    ]'::jsonb as j
    ), expand as (
      -- One row per (employee, skill) pair
      select emp, skill
        from indata
             cross join lateral jsonb_array_elements(j) as el(emp)
             cross join lateral jsonb_array_elements(emp->'skills') as sk(skill)
    ), insemp as (
      -- Upsert employees
      insert into employee (id, name, email)
      select distinct (emp->>'ID')::bigint, emp->>'Name', emp->>'Email'
        from expand
      on conflict (id) do update
        set name = excluded.name, email = excluded.email
      returning *
    ), insskill as (
      -- Insert skills that were not known before
      insert into skill (skill_name)
      select distinct skill->>'name'
        from expand
      on conflict (skill_name) do nothing
      returning *
    ), allemp as (
      -- ids only, so an employee whose name or email changed is not listed twice
      select id from insemp union select id from employee
    ), allskill as (
      -- newly inserted skills plus the ones that existed before this statement
      select * from insskill union select * from skill
    ), insempskill as (
      -- Upsert the skill level for each (employee, skill) pair
      insert into employee_skill (employee_id, skill_id, skill_level)
      select e.id as employee_id, s.id as skill_id,
             (i.skill->>'level')::int as skill_level
        from expand i
             join allemp e on e.id = (i.emp->>'ID')::bigint
             join allskill s on s.skill_name = i.skill->>'name'
      on conflict (employee_id, skill_id) do update
        set skill_level = excluded.skill_level
      returning *
    )
    -- Remove pairs that are missing from the input.
    -- This assumes every load is a full snapshot of all employees.
    delete from employee_skill
     where (employee_id, skill_id) not in
      (select employee_id, skill_id from insempskill)
    ;

See [working fiddle][1]


  [1]: https://dbfiddle.uk/39WOyBPa

</details>
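
To connect the schema back to the stated requirement (find employees who cover a required set of skills at given minimum levels), a query along these lines might work. It is a sketch and not part of the original answer. The `required` list and the index name are illustrative, and it assumes each skill name appears only once in that list.

```sql
-- Optional supporting index for lookups that start from the skill side.
create index if not exists employee_skill_skill_level_idx
    on employee_skill (skill_id, skill_level);

-- Required skills with minimum levels (illustrative values).
with required (skill_name, min_level) as (
  values ('python', 3), ('c', 2)
)
select e.id, e.name
  from required r
       join skill s           on s.skill_name = r.skill_name
       join employee_skill es on es.skill_id = s.id
                             and es.skill_level >= r.min_level
       join employee e        on e.id = es.employee_id
 group by e.id, e.name
-- Keep only employees who satisfied every required skill.
having count(*) = (select count(*) from required);
```

Because `employee_skill` is unique on `(employee_id, skill_id)`, each satisfied requirement contributes exactly one row per employee, so comparing the count with the size of `required` selects the employees who meet all of them. Skill names are compared case-sensitively here, so `openCV` and `OpenCV` would count as different skills.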
