Execute an SQL query depending on the parameters of a pandas dataframe

Question

I have a pandas dataframe called final_data that looks like this:

cust_id start_date end_date
10001 2022-01-01 2022-01-30
10002 2022-02-01 2022-02-30
10003 2022-01-01 2022-01-30
10004 2022-03-01 2022-03-30
10005 2022-02-01 2022-02-30

I have another table in my SQL database, called penalties, that looks like this:

cust_id level1_pen level_2_pen date
10001 1 4 2022-01-01
10001 1 1 2022-01-02
10001 0 1 2022-01-03
10002 1 1 2022-01-01
10002 5 0 2022-02-01
10002 4 0 2022-02-04
10003 1 6 2022-01-02

I want final_data to look like this, with the penalties from the penalties table in the SQL database summed for each row based on that row's cust_id, start_date and end_date:

cust_id start_date end_date total_penalties
10001 2022-01-01 2022-01-30 8
10002 2022-02-01 2022-02-30 9
10003 2022-01-01 2022-01-30 7

How do I apply a lambda function to each row so that it aggregates the data from the SQL query based on that row's cust_id, start_date, and end_date?

Answer 1

Score: 1

Suppose both are available as tables in the same database:

df = final_data table

df2 = penalties table

You can then get the final_data frame you want with this query:

SELECT
  df.cust_id,
  df.start_date,
  df.end_date,
  SUM(df2.level1_pen + df2.level_2_pen) as total_penalties
FROM
  df
  LEFT JOIN df2 ON df.cust_id = df2.cust_id
    AND df2.date BETWEEN df.start_date AND df.end_date
GROUP BY
  df.cust_id,
  df.start_date,
  df.end_date;
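
The query above assumes final_data already exists as a table next to penalties. One way to get there from pandas, sketched here under assumptions: an in-memory SQLite database stands in for the real one, and the staging table name final_data_stage is hypothetical. The dataframe is uploaded once with to_sql, and a single set-based query replaces the per-row lambda. Dates are kept as ISO-8601 strings because the sample end date 2022-02-30 is not a real calendar date; string comparison still orders them correctly.

```python
import sqlite3

import pandas as pd

# Sample data copied from the question. Dates stay as ISO-8601 strings
# because '2022-02-30' is not a valid calendar date and would not parse.
final_data = pd.DataFrame({
    "cust_id": [10001, 10002, 10003, 10004, 10005],
    "start_date": ["2022-01-01", "2022-02-01", "2022-01-01",
                   "2022-03-01", "2022-02-01"],
    "end_date": ["2022-01-30", "2022-02-30", "2022-01-30",
                 "2022-03-30", "2022-02-30"],
})
penalties = pd.DataFrame({
    "cust_id": [10001, 10001, 10001, 10002, 10002, 10002, 10003],
    "level1_pen": [1, 1, 0, 1, 5, 4, 1],
    "level_2_pen": [4, 1, 1, 1, 0, 0, 6],
    "date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-01",
             "2022-02-01", "2022-02-04", "2022-01-02"],
})

# Assumption: an in-memory SQLite database stands in for the real one,
# where the penalties table would already exist.
conn = sqlite3.connect(":memory:")
penalties.to_sql("penalties", conn, index=False)

# Upload the dataframe as a staging table (hypothetical name).
final_data.to_sql("final_data_stage", conn, index=False, if_exists="replace")

# Same join as in the answer, written against the two table names used here.
query = """
SELECT
  f.cust_id,
  f.start_date,
  f.end_date,
  SUM(p.level1_pen + p.level_2_pen) AS total_penalties
FROM final_data_stage AS f
LEFT JOIN penalties AS p
  ON f.cust_id = p.cust_id
 AND p.date BETWEEN f.start_date AND f.end_date
GROUP BY f.cust_id, f.start_date, f.end_date
ORDER BY f.cust_id
"""
result = pd.read_sql_query(query, conn)
conn.close()

# The LEFT JOIN keeps customers with no penalties in their window (NULL
# total); drop them to match the three-row output shown in the question.
matched = result.dropna(subset=["total_penalties"])
print(matched)
```

With the sample data, the rows for 10001, 10002 and 10003 carry totals of 8, 9 and 7, as in the question. Customers 10004 and 10005 come back with a NULL total because of the LEFT JOIN; switching to an INNER JOIN would drop them inside the query instead.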

huangapple
  • Posted on 2023-02-16 05:29:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75465614.html