Execute an SQL query depending on the parameters of a pandas dataframe
Question
I have a pandas DataFrame called final_data that looks like this:
cust_id | start_date | end_date |
---|---|---|
10001 | 2022-01-01 | 2022-01-30 |
10002 | 2022-02-01 | 2022-02-30 |
10003 | 2022-01-01 | 2022-01-30 |
10004 | 2022-03-01 | 2022-03-30 |
10005 | 2022-02-01 | 2022-02-30 |
I have another table in my SQL database, called penalties, that looks like this:
cust_id | level1_pen | level_2_pen | date |
---|---|---|---|
10001 | 1 | 4 | 2022-01-01 |
10001 | 1 | 1 | 2022-01-02 |
10001 | 0 | 1 | 2022-01-03 |
10002 | 1 | 1 | 2022-01-01 |
10002 | 5 | 0 | 2022-02-01 |
10002 | 4 | 0 | 2022-02-04 |
10003 | 1 | 6 | 2022-01-02 |
I want final_data to look like this, with the data from the penalties table in the SQL database aggregated for each row's cust_id, start_date, and end_date:
cust_id | start_date | end_date | total_penalties |
---|---|---|---|
10001 | 2022-01-01 | 2022-01-30 | 8 |
10002 | 2022-02-01 | 2022-02-30 | 9 |
10003 | 2022-01-01 | 2022-01-30 | 7 |
How do I write a lambda function that, for each row of the pandas DataFrame, aggregates the data from the SQL query based on that row's cust_id, start_date, and end_date?
Answer 1
Score: 1
Suppose:
df = the final_data table
df2 = the penalties table
You can then get the final_data frame you want with this query:
SELECT
    df.cust_id,
    df.start_date,
    df.end_date,
    SUM(df2.level1_pen + df2.level_2_pen) AS total_penalties
FROM
    df
    LEFT JOIN df2 ON df.cust_id = df2.cust_id
        AND df2.date BETWEEN df.start_date AND df.end_date
GROUP BY
    df.cust_id,
    df.start_date,
    df.end_date;
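The query above only works if final_data is itself visible to the database, so one way to use it from pandas is to upload the DataFrame as a table and run the join once, rather than issuing one query per row through a lambda. The following is a minimal sketch under stated assumptions, not a definitive implementation: an in-memory SQLite database stands in for the real one, and the table name final_data_upload is hypothetical. The sample rows are copied from the question.

```python
import sqlite3

import pandas as pd

# Sample data copied from the question. Dates stay as ISO-8601 strings,
# which compare correctly as text (2022-02-30 is not a real calendar date,
# so parsing it with pd.to_datetime would fail).
final_data = pd.DataFrame(
    {
        "cust_id": [10001, 10002, 10003, 10004, 10005],
        "start_date": ["2022-01-01", "2022-02-01", "2022-01-01",
                       "2022-03-01", "2022-02-01"],
        "end_date": ["2022-01-30", "2022-02-30", "2022-01-30",
                     "2022-03-30", "2022-02-30"],
    }
)
penalties = pd.DataFrame(
    {
        "cust_id": [10001, 10001, 10001, 10002, 10002, 10002, 10003],
        "level1_pen": [1, 1, 0, 1, 5, 4, 1],
        "level_2_pen": [4, 1, 1, 1, 0, 0, 6],
        "date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-01",
                 "2022-02-01", "2022-02-04", "2022-01-02"],
    }
)

# Assumption: an in-memory SQLite database stands in for the real database.
conn = sqlite3.connect(":memory:")
penalties.to_sql("penalties", conn, index=False)

# Upload the DataFrame so the database can join against it.
# "final_data_upload" is a hypothetical table name chosen for this sketch.
final_data.to_sql("final_data_upload", conn, index=False)

query = """
SELECT f.cust_id, f.start_date, f.end_date,
       SUM(p.level1_pen + p.level_2_pen) AS total_penalties
FROM final_data_upload AS f
LEFT JOIN penalties AS p
       ON f.cust_id = p.cust_id
      AND p.date BETWEEN f.start_date AND f.end_date
GROUP BY f.cust_id, f.start_date, f.end_date
ORDER BY f.cust_id
"""
result = pd.read_sql_query(query, conn)
conn.close()

# The LEFT JOIN keeps customers with no matching penalties (their total is
# NULL/NaN). Drop them to match the three-row output shown in the question.
result_matched = (
    result.dropna(subset=["total_penalties"])
          .astype({"total_penalties": int})
)
print(result_matched)
```

With the sample data, result has five rows, because customers 10004 and 10005 have no penalties in range and come back with a missing total. result_matched keeps only 10001, 10002, and 10003, with totals 8, 9, and 7. Switching the LEFT JOIN to an INNER JOIN would give the same three rows directly from SQL.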