从MongoDB迁移到PostgreSQL:为PostgreSQL编写聚合管道查询

huangapple go评论71阅读模式
英文:

Migrating from MongoDB to PostgreSQL: Writing aggregation pipeline query for PostgreSQL

问题

We have a requirement of migrating from MongoDB to PostgreSQL in our Node.js application. Our express APIs heavily use mongodb "aggregation" framework, which we have to re-write in PostgreSQL. For Example, we have below form of data:

[{
    "product": "PD1",
    "application": "A",
    "project": "PR1",
    "version": "1"
},{
    "product": "PD1",
    "application": "A",
    "project": "PR2",
    "version": "3"
},{
    "product": "PD1",
    "application": "B",
    "project": "PR3",
    "version": "2"
},{
    "product": "PD2",
    "application": "C",
    "project": "PR4",
    "version": "5"
},{
    "product": "PD2",
    "application": "C",
    "project": "PR5",
    "version": "2"
}]

It is guaranteed that an application belongs to only one product. Similarly, a project belongs to only one application. I have written below MongoDB aggregate query to summarize this data:

let data = await CompanyModel.aggregate([{
    $group: {
        "_id": "$application",
        "projects":{
            $push: {
                "project": "$project",
                "version": "$version"
            }
        },
        "product": {$first: "$product"}
    }
},{
    $group: {
        "_id": "$product",
        "applications":{
            $push: {
                "application": "$_id",
                "projects": "$projects"
            }
        },
    }
}])

The above query outputs below result:

[{
    "_id": "PD2",
    "applications": [{
        "application": "C",
        "projects": [{ "project": "PR4", "version": "5" }, { "project": "PR5", "version": "2" }]
    }]
},
{
    "_id": "PD1",
    "applications": [{
        "application": "B",
        "projects": [{ "project": "PR3", "version": "2" }]
    }, { "application": "A", "projects": [{ "project": "PR1", "version": "1" }, { "project": "PR2", "version": "3" }] }]
}]

There is some hard requirement to structure data in this way only.

Now comes the main part, how do we write a query in PostgreSQL that acts like a pipeline just like above query, where output of one stage is input to other stage. I have some experience with SQL but have never worked on PostgreSQL. But I know that PostgreSQL supports JSON data. My guess is that in SQL, I would do something like below:

SELECT * from (SELECT * from table_name group by field_name1) group by field_name2;

Basically I am trying to check if any such pipeline feature exist in PostgreSQL but couldn't find any.

I need some guidance on this. Thanks in advance!

英文:

We have a requirement of migrating from MongoDB to PostgreSQL in our Node.js application. Our express APIs heavily use mongodb aggregation framework, which we have to re-write in PostgreSQL. For Example, we have below form of data:

[{
    "product": "PD1",
    "application": "A",
    "project": "PR1",
    "version": "1"
},{
    "product": "PD1",
    "application": "A",
    "project": "PR2",
    "version": "3"
},{
    "product": "PD1",
    "application": "B",
    "project": "PR3",
    "version": "2"
},{
    "product": "PD2",
    "application": "C",
    "project": "PR4",
    "version": "5"
},{
    "product": "PD2",
    "application": "C",
    "project": "PR5",
    "version": "2"
}]

It is guaranteed that an application belongs to only one product. Similarly, a project belongs to only one application. I have written below MongoDB aggregate query to summarize this data:

let data = await CompanyModel.aggregate([{
    $group: {
        "_id": "$application",
        "projects":{
            $push: {
                "project": "$project",
                "version": "$version"
            }
        },
        "product": {$first: "$product"}
    }
},{
    $group: {
        "_id": "$product",
        "applications":{
            $push: {
                "application": "$_id",
                "projects": "$projects"
            }
        },
    }
}])

The above query outputs below result:

[{
    "_id": "PD2",
    "applications": [{
        "application": "C",
        "projects": [{ "project": "PR4", "version": "5" }, { "project": "PR5", "version": "2" }]
    }]
},
{
    "_id": "PD1",
    "applications": [{
        "application": "B",
        "projects": [{ "project": "PR3", "version": "2" }]
    }, { "application": "A", "projects": [{ "project": "PR1", "version": "1" }, { "project": "PR2", "version": "3" }] }]
}]

There is some hard requirement to structure data in this way only.

Now comes the main part, how do we write a query in PostgreSQL that acts like a pipeline just like above query, where output of one stage is input to other stage. I have some experience with SQL but have never worked on PostgreSQL. But I know that PostgreSQL supports JSON data. My guess is that in SQl, I would do something like below:

SELECT * from (SELECT * from table_name group by field_name1) group by field_name2;

Basically I am trying to check if any such pipeline feature exist in PostgreSQL but couldn't find any.

I need some guidance on this.
Thanks in advance!

答案1

得分: 0

I'll provide a translation of the code portion you provided:

WITH app_projects AS (
  SELECT 
    application,
    product,
    jsonb_agg(
      jsonb_build_object(
        'project', project, 
        'version', version
      )
    ) AS projects
  FROM 
    CompanyTable
  GROUP BY 
    application, 
    product
),
prod_apps AS (
  SELECT 
    product,
    jsonb_agg(
      jsonb_build_object(
        'application', application, 
        'projects', projects
      )
    ) AS applications
  FROM 
    app_projects
  GROUP BY 
    product
)
SELECT 
  product AS _id,
  applications
FROM 
  prod_apps;

这是你提供的SQL代码的翻译部分。如果你需要进一步的解释或有其他问题,请随时提问。

英文:
WITH app_projects AS (
  SELECT 
    application,
    product,
    jsonb_agg(
      jsonb_build_object(
        'project', project, 
        'version', version
      )
    ) AS projects
  FROM 
    CompanyTable
  GROUP BY 
    application, 
    product
),
prod_apps AS (
  SELECT 
    product,
    jsonb_agg(
      jsonb_build_object(
        'application', application, 
        'projects', projects
      )
    ) AS applications
  FROM 
    app_projects
  GROUP BY 
    product
)
SELECT 
  product AS _id,
  applications
FROM 
  prod_apps;

I used here CTE in postgresql. You can read more here, good explanation:
https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-cte/

  1. In app_projects CTE, for each application and product, I group projects and versions into a json array of objects using the jsonb_agg and jsonb_build_object functions.

  2. In prod_apps CTE, I group the results of the previous CTE by product and create a json array of objects for each product containing applications and their projects.

At the end, I select all the results from the prod_apps CTE to get the desired output.

jsonb_build_object - it's a function

jsonb_build_object(
    'project', project, 
    'version', version
)

workes like this:

{'project': project, 'version': version}

jsonb_agg - it's an expression
if you have some_value:

some_value
------
1
2
3

jsonb_agg(some_value) will make:

jsonb_agg
---------
[1, 2, 3]

huangapple
  • 本文由 发表于 2023年6月26日 23:22:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/76558088.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定