从MongoDB迁移到PostgreSQL:为PostgreSQL编写聚合管道查询

huangapple go评论102阅读模式
英文:

Migrating from MongoDB to PostgreSQL: Writing aggregation pipeline query for PostgreSQL

问题

We have a requirement of migrating from MongoDB to PostgreSQL in our Node.js application. Our express APIs heavily use mongodb "aggregation" framework, which we have to re-write in PostgreSQL. For Example, we have below form of data:

  1. [{
  2. "product": "PD1",
  3. "application": "A",
  4. "project": "PR1",
  5. "version": "1"
  6. },{
  7. "product": "PD1",
  8. "application": "A",
  9. "project": "PR2",
  10. "version": "3"
  11. },{
  12. "product": "PD1",
  13. "application": "B",
  14. "project": "PR3",
  15. "version": "2"
  16. },{
  17. "product": "PD2",
  18. "application": "C",
  19. "project": "PR4",
  20. "version": "5"
  21. },{
  22. "product": "PD2",
  23. "application": "C",
  24. "project": "PR5",
  25. "version": "2"
  26. }]

It is guaranteed that an application belongs to only one product. Similarly, a project belongs to only one application. I have written below MongoDB aggregate query to summarize this data:

  1. let data = await CompanyModel.aggregate([{
  2. $group: {
  3. "_id": "$application",
  4. "projects":{
  5. $push: {
  6. "project": "$project",
  7. "version": "$version"
  8. }
  9. },
  10. "product": {$first: "$product"}
  11. }
  12. },{
  13. $group: {
  14. "_id": "$product",
  15. "applications":{
  16. $push: {
  17. "application": "$_id",
  18. "projects": "$projects"
  19. }
  20. },
  21. }
  22. }])

The above query outputs below result:

  1. [{
  2. "_id": "PD2",
  3. "applications": [{
  4. "application": "C",
  5. "projects": [{ "project": "PR4", "version": "5" }, { "project": "PR5", "version": "2" }]
  6. }]
  7. },
  8. {
  9. "_id": "PD1",
  10. "applications": [{
  11. "application": "B",
  12. "projects": [{ "project": "PR3", "version": "2" }]
  13. }, { "application": "A", "projects": [{ "project": "PR1", "version": "1" }, { "project": "PR2", "version": "3" }] }]
  14. }]

There is some hard requirement to structure data in this way only.

Now comes the main part, how do we write a query in PostgreSQL that acts like a pipeline just like above query, where output of one stage is input to other stage. I have some experience with SQL but have never worked on PostgreSQL. But I know that PostgreSQL supports JSON data. My guess is that in SQL, I would do something like below:

  1. SELECT * from (SELECT * from table_name group by field_name1) group by field_name2;

Basically I am trying to check if any such pipeline feature exist in PostgreSQL but couldn't find any.

I need some guidance on this. Thanks in advance!

英文:

We have a requirement of migrating from MongoDB to PostgreSQL in our Node.js application. Our express APIs heavily use mongodb aggregation framework, which we have to re-write in PostgreSQL. For Example, we have below form of data:

  1. [{
  2. "product": "PD1",
  3. "application": "A",
  4. "project": "PR1",
  5. "version": "1"
  6. },{
  7. "product": "PD1",
  8. "application": "A",
  9. "project": "PR2",
  10. "version": "3"
  11. },{
  12. "product": "PD1",
  13. "application": "B",
  14. "project": "PR3",
  15. "version": "2"
  16. },{
  17. "product": "PD2",
  18. "application": "C",
  19. "project": "PR4",
  20. "version": "5"
  21. },{
  22. "product": "PD2",
  23. "application": "C",
  24. "project": "PR5",
  25. "version": "2"
  26. }]

It is guaranteed that an application belongs to only one product. Similarly, a project belongs to only one application. I have written below MongoDB aggregate query to summarize this data:

  1. let data = await CompanyModel.aggregate([{
  2. $group: {
  3. "_id": "$application",
  4. "projects":{
  5. $push: {
  6. "project": "$project",
  7. "version": "$version"
  8. }
  9. },
  10. "product": {$first: "$product"}
  11. }
  12. },{
  13. $group: {
  14. "_id": "$product",
  15. "applications":{
  16. $push: {
  17. "application": "$_id",
  18. "projects": "$projects"
  19. }
  20. },
  21. }
  22. }])

The above query outputs below result:

  1. [{
  2. "_id": "PD2",
  3. "applications": [{
  4. "application": "C",
  5. "projects": [{ "project": "PR4", "version": "5" }, { "project": "PR5", "version": "2" }]
  6. }]
  7. },
  8. {
  9. "_id": "PD1",
  10. "applications": [{
  11. "application": "B",
  12. "projects": [{ "project": "PR3", "version": "2" }]
  13. }, { "application": "A", "projects": [{ "project": "PR1", "version": "1" }, { "project": "PR2", "version": "3" }] }]
  14. }]

There is some hard requirement to structure data in this way only.

Now comes the main part, how do we write a query in PostgreSQL that acts like a pipeline just like above query, where output of one stage is input to other stage. I have some experience with SQL but have never worked on PostgreSQL. But I know that PostgreSQL supports JSON data. My guess is that in SQl, I would do something like below:

  1. SELECT * from (SELECT * from table_name group by field_name1) group by field_name2;

Basically I am trying to check if any such pipeline feature exist in PostgreSQL but couldn't find any.

I need some guidance on this.
Thanks in advance!

答案1

得分: 0

I'll provide a translation of the code portion you provided:

  1. WITH app_projects AS (
  2. SELECT
  3. application,
  4. product,
  5. jsonb_agg(
  6. jsonb_build_object(
  7. 'project', project,
  8. 'version', version
  9. )
  10. ) AS projects
  11. FROM
  12. CompanyTable
  13. GROUP BY
  14. application,
  15. product
  16. ),
  17. prod_apps AS (
  18. SELECT
  19. product,
  20. jsonb_agg(
  21. jsonb_build_object(
  22. 'application', application,
  23. 'projects', projects
  24. )
  25. ) AS applications
  26. FROM
  27. app_projects
  28. GROUP BY
  29. product
  30. )
  31. SELECT
  32. product AS _id,
  33. applications
  34. FROM
  35. prod_apps;

这是你提供的SQL代码的翻译部分。如果你需要进一步的解释或有其他问题,请随时提问。

英文:
  1. WITH app_projects AS (
  2. SELECT
  3. application,
  4. product,
  5. jsonb_agg(
  6. jsonb_build_object(
  7. 'project', project,
  8. 'version', version
  9. )
  10. ) AS projects
  11. FROM
  12. CompanyTable
  13. GROUP BY
  14. application,
  15. product
  16. ),
  17. prod_apps AS (
  18. SELECT
  19. product,
  20. jsonb_agg(
  21. jsonb_build_object(
  22. 'application', application,
  23. 'projects', projects
  24. )
  25. ) AS applications
  26. FROM
  27. app_projects
  28. GROUP BY
  29. product
  30. )
  31. SELECT
  32. product AS _id,
  33. applications
  34. FROM
  35. prod_apps;

I used here CTE in postgresql. You can read more here, good explanation:
https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-cte/

  1. In app_projects CTE, for each application and product, I group projects and versions into a json array of objects using the jsonb_agg and jsonb_build_object functions.

  2. In prod_apps CTE, I group the results of the previous CTE by product and create a json array of objects for each product containing applications and their projects.

At the end, I select all the results from the prod_apps CTE to get the desired output.

jsonb_build_object - it's a function

  1. jsonb_build_object(
  2. 'project', project,
  3. 'version', version
  4. )

workes like this:

  1. {'project': project, 'version': version}

jsonb_agg - it's an expression
if you have some_value:

  1. some_value
  2. ------
  3. 1
  4. 2
  5. 3

jsonb_agg(some_value) will make:

  1. jsonb_agg
  2. ---------
  3. [1, 2, 3]

huangapple
  • 本文由 发表于 2023年6月26日 23:22:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/76558088.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定