How to query entire tables (3 tables with 100k records each) and display their data in a dashboard

Question

I have 3 models backed by a PostgreSQL database:

##user.rb - :full_name
has_many :enquiries

##enquiry.rb - :actual_target_count, :actual_visit_count
##there are different types of enquiry too - business, formal, phone, email etc
belongs_to :user

##sales_projection.rb - :target_month, :desired_target_count, :desired_visit_count
belongs_to :user
belongs_to :enquiry

Problem statement: I have 500+ users who have provided targets for the current month (July 2023) across different types of enquiries (60k entries). How can I get a projection for a particular month that shows every user's desired vs. actual targets?

Let me elaborate below.

Records in the users table (500+ entries):

{id: 1, full_name: "mike"}
{id: 2, full_name: "jason"}
{id: 3, full_name: "remi"}
{id: 4, full_name: "sam"}

Records in the enquiries table (500+ entries). A user can have multiple enquiry entries.

For example, mike has worked on 5 enquiries, another user on 20, and so on.

{id: 1, user_id: 1, actual_target_count: 30, actual_visit_count: 40}
{id: 2, user_id: 1, actual_target_count: 10, actual_visit_count: 5}
{id: 3, user_id: 1, actual_target_count: 20, actual_visit_count: 10}

Records in the sales_projections table (50k+ entries). Each sales_projection record holds a user's projection (the desired counts to compare against the actual ones) for a selected month (here, July 2023).

{id: 1, user_id: 1, desired_target_count: 30, desired_visit_count: 40, target_month: "July 2023"}
{id: 2, user_id: 2, desired_target_count: 10, desired_visit_count: 5, target_month: "June 2023"}
{id: 3, user_id: 3, desired_target_count: 20, desired_visit_count: 10, target_month: "July 2023"}
{id: 4, user_id: 4, desired_target_count: 50, desired_visit_count: 15, target_month: "March 2023"}

Based on the above set of records, how can I show a list of all the users and their desired/actual count for a selected month?

This is what I want to show:

user   |   desired target  | actual target | desired visit | actual visit | target_month

This is not practical with my current ActiveRecord code, because it is very slow and increases the page load time.

Currently this is what I have:

##pseudo code :)
SalesProjection.where(target_month: "July 2023").order(target_month: :asc).group_by { |m| m.user.id }.map do |user_id, projections|
  {
    # get user full name
    # get user desired count using joins
    # get user actual count using joins
  }
end

The above logic takes more than a minute to execute.

What I have tried, and where I need more help:

  • I have been trying a CTE (common table expression), which I have not used before, but so far without success.
  • I am looking for a direct way to write a single query with dynamic values that returns the results in the desired output.

Answer 1

Score: 1

Your query strategy is slow because it instantiates many ActiveRecord objects. The key is to instantiate only the objects and fields of interest, and to do it only once. Here is how you might structure your query:

class User < ApplicationRecord
  has_many :enquiries
  has_many :sales_projections # note: missing from OP

  # Returns one array per joined row, selecting only the columns the report needs.
  def self.report
    columns = <<~SQL
      users.full_name,
      sales_projections.target_month,
      sales_projections.desired_target_count,
      enquiries.actual_target_count,
      sales_projections.desired_visit_count,
      enquiries.actual_visit_count
    SQL

    User
      .select(columns)
      .joins(:enquiries, :sales_projections)
      .where("sales_projections.target_month = ?", "July 2023")
      .where("extract(month from enquiries.created_at)::integer = 7")
      .where("extract(year from enquiries.created_at)::integer = 2023")
      .where("enquiries.is_visited = ? and enquiries.is_qualified = ?", true, true)
      .map do |u|
        [u.full_name,
         u.target_month,
         u.desired_target_count,
         u.actual_target_count,
         u.desired_visit_count,
         u.actual_visit_count]
      end
  end
end

User.report # => [["Olene", "July 2023", 82, 48, 80, 0],
#    ["Kalyn", "July 2023", 10, 2, 26, 0],
#    ["Drew", "July 2023", 7, 36, 53, 26],
#    ["Terry", "July 2023", 99, 26, 44, 37],
#    ["Olene", "July 2023", 82, 36, 80, 2],
#    ["Sherlene", "July 2023", 71, 19, 79, 43],
#    ["Delmer", "July 2023", 5, 48, 43, 20],
#    ["Connie", "July 2023", 86, 1, 42, 34],
#    ["Candis", "July 2023", 86, 17, 13, 9],
#    ["Candis", "July 2023", 86, 11, 13, 21], ...etc.
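Note that the output above contains one row per matching enquiry, so a user such as "Olene" appears more than once. If one row per user is wanted, each table could be aggregated in its own CTE before joining. The following is only a sketch under assumptions: the names REPORT_SQL, desired, actual and the three bind names are hypothetical, summing the counts per user is a guess at the intended aggregation, and enquiries are assumed to belong to a month via created_at. Further filters (for example the is_visited and is_qualified conditions) would need to be added to the actual CTE.

```ruby
# Sketch only: table and column names follow the question; the aggregation
# (SUM per user) and the created_at month filter are assumptions.
REPORT_SQL = <<~SQL
  WITH desired AS (
    SELECT user_id,
           target_month,
           SUM(desired_target_count) AS desired_target_count,
           SUM(desired_visit_count)  AS desired_visit_count
    FROM sales_projections
    WHERE target_month = :target_month
    GROUP BY user_id, target_month
  ),
  actual AS (
    SELECT user_id,
           SUM(actual_target_count) AS actual_target_count,
           SUM(actual_visit_count)  AS actual_visit_count
    FROM enquiries
    WHERE created_at >= :month_start
      AND created_at <  :next_month_start
    GROUP BY user_id
  )
  SELECT users.full_name,
         desired.desired_target_count,
         COALESCE(actual.actual_target_count, 0) AS actual_target_count,
         desired.desired_visit_count,
         COALESCE(actual.actual_visit_count, 0)  AS actual_visit_count,
         desired.target_month
  FROM users
  JOIN desired ON desired.user_id = users.id
  LEFT JOIN actual ON actual.user_id = users.id
  ORDER BY users.full_name
SQL

# In a Rails app the named placeholders could be filled in like this
# (not executed here, since it needs a database connection):
#
#   User.find_by_sql([REPORT_SQL, {
#     target_month: "July 2023",
#     month_start: Date.new(2023, 7, 1),
#     next_month_start: Date.new(2023, 8, 1)
#   }])

puts REPORT_SQL
```

Because both CTEs collapse to one row per user before the join, the result has at most one row per user, and the database does the summing instead of Ruby.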

This should be much faster than your query. With 50 users, 5000 sales_projections, and 1000 enquiries, the query took 50 ms. I didn't use any indexes; you should add them to get even better performance.
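As a rough illustration of the kind of indexes that might help, here are some candidate definitions written as plain PostgreSQL statements. The index names and column choices are assumptions, not something taken from the original schema, and whether the planner actually uses them depends on the data, so it is worth checking with EXPLAIN ANALYZE.

```ruby
# Hypothetical index definitions (names and column choices are assumptions).
# Each string is plain PostgreSQL DDL; in Rails a migration could run it via `execute`.
INDEX_STATEMENTS = [
  # Supports filtering projections by month and then joining on user_id.
  "CREATE INDEX index_sales_projections_on_target_month_and_user_id " \
    "ON sales_projections (target_month, user_id)",
  # Supports joining enquiries to users.
  "CREATE INDEX index_enquiries_on_user_id ON enquiries (user_id)",
  # Supports range filters on created_at; it does not help extract(...) predicates.
  "CREATE INDEX index_enquiries_on_created_at ON enquiries (created_at)"
].freeze

INDEX_STATEMENTS.each { |statement| puts statement }
```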

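Rather than querying enquiries.created_at as between two dates, I suggest querying the month and year values directly, which may improve performance. It is worth verifying this with EXPLAIN ANALYZE on your data, since an extract(...) predicate cannot use a plain index on created_at, whereas a date range can.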
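If the two-date form turns out to be faster on your data (a plain B-tree index on created_at can serve a range predicate but not an extract(...) expression), the bounds of the month could be computed as below. This is a minimal sketch: month_bounds is a hypothetical helper, and it ignores time zones, which matter if created_at is stored as a timestamp.

```ruby
require "date"

# Returns [first_day, first_day_of_next_month] for use in a half-open filter:
#   created_at >= first_day AND created_at < first_day_of_next_month
def month_bounds(year, month)
  first_day = Date.new(year, month, 1)
  [first_day, first_day.next_month]
end

start_on, end_before = month_bounds(2023, 7)
puts "enquiries.created_at >= '#{start_on}' AND enquiries.created_at < '#{end_before}'"
# prints: enquiries.created_at >= '2023-07-01' AND enquiries.created_at < '2023-08-01'
```

In ActiveRecord the same bounds could be passed as an exclusive range, for example where(enquiries: { created_at: start_on...end_before }).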

I'll be curious to know how fast this performs with your data set.

huangapple
  • Posted on July 3, 2023 at 02:42:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76600310.html