Execute an SQL query depending on the parameters of a pandas dataframe
Question
I have a pandas DataFrame called final_data that looks like this:
| cust_id | start_date | end_date |
|---|---|---|
| 10001 | 2022-01-01 | 2022-01-30 |
| 10002 | 2022-02-01 | 2022-02-30 |
| 10003 | 2022-01-01 | 2022-01-30 |
| 10004 | 2022-03-01 | 2022-03-30 |
| 10005 | 2022-02-01 | 2022-02-30 |
I have another table in my SQL database called penalties that looks like this:
| cust_id | level1_pen | level_2_pen | date |
|---|---|---|---|
| 10001 | 1 | 4 | 2022-01-01 |
| 10001 | 1 | 1 | 2022-01-02 |
| 10001 | 0 | 1 | 2022-01-03 |
| 10002 | 1 | 1 | 2022-01-01 |
| 10002 | 5 | 0 | 2022-02-01 |
| 10002 | 4 | 0 | 2022-02-04 |
| 10003 | 1 | 6 | 2022-01-02 |
I want final_data to look like this, with a total_penalties column that sums level1_pen and level_2_pen from the penalties table in the SQL database for each row's cust_id and date range (start_date to end_date):
| cust_id | start_date | end_date | total_penalties |
|---|---|---|---|
| 10001 | 2022-01-01 | 2022-01-30 | 8 |
| 10002 | 2022-02-01 | 2022-02-30 | 9 |
| 10003 | 2022-01-01 | 2022-01-30 | 7 |
How can I apply a lambda function to each row that runs the SQL query and aggregates the data based on that row's cust_id, start_date, and end_date?
Answer 1
Score: 1
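Suppose df is the final_data table and df2 is the penalties table. The join runs inside the database, so final_data must first be available there as a table (for example, uploaded with DataFrame.to_sql). Because the query uses a LEFT JOIN, customers with no penalties in their date range (10004 and 10005 in the sample data) are kept with a NULL total_penalties; switch to an INNER JOIN if you want to drop them, as in the expected output.
You can get the final_data frame that you want using this query: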

    SELECT
        df.cust_id,
        df.start_date,
        df.end_date,
        SUM(df2.level1_pen + df2.level_2_pen) AS total_penalties
    FROM
        df
        LEFT JOIN df2 ON df.cust_id = df2.cust_id
        AND df2.date BETWEEN df.start_date AND df.end_date
    GROUP BY
        df.cust_id,
        df.start_date,
        df.end_date;
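
For reference, here is a minimal sketch of how this query could be driven from pandas, instead of issuing one query per row through a lambda. It makes several assumptions that are not in the original post: an in-memory SQLite database stands in for the real one, the DataFrame is uploaded under a hypothetical staging table name (final_data_stage), the aliases f and p are illustrative, and dates are stored as ISO-8601 text (which SQLite compares as strings). Other databases may need a different connection object and real date types.

```python
import sqlite3

import pandas as pd

# Sample data copied from the question. Dates are kept as ISO-8601 text,
# which SQLite compares as plain strings, so BETWEEN works on them.
final_data = pd.DataFrame(
    {
        "cust_id": [10001, 10002, 10003, 10004, 10005],
        "start_date": ["2022-01-01", "2022-02-01", "2022-01-01",
                       "2022-03-01", "2022-02-01"],
        "end_date": ["2022-01-30", "2022-02-30", "2022-01-30",
                     "2022-03-30", "2022-02-30"],
    }
)
penalties = pd.DataFrame(
    {
        "cust_id": [10001, 10001, 10001, 10002, 10002, 10002, 10003],
        "level1_pen": [1, 1, 0, 1, 5, 4, 1],
        "level_2_pen": [4, 1, 1, 1, 0, 0, 6],
        "date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-01",
                 "2022-02-01", "2022-02-04", "2022-01-02"],
    }
)

# Assumption: an in-memory SQLite database stands in for the real database.
conn = sqlite3.connect(":memory:")
penalties.to_sql("penalties", conn, index=False)
# Hypothetical staging table: upload the DataFrame so SQL can join against it.
final_data.to_sql("final_data_stage", conn, index=False)

query = """
SELECT
    f.cust_id,
    f.start_date,
    f.end_date,
    SUM(p.level1_pen + p.level_2_pen) AS total_penalties
FROM final_data_stage AS f
LEFT JOIN penalties AS p
    ON f.cust_id = p.cust_id
    AND p.date BETWEEN f.start_date AND f.end_date
GROUP BY f.cust_id, f.start_date, f.end_date
ORDER BY f.cust_id
"""
result = pd.read_sql_query(query, conn)
conn.close()

# The LEFT JOIN keeps customers without penalties; their total is missing.
print(result)

# Drop those rows to match the three-row output shown in the question.
matched = result.dropna(subset=["total_penalties"]).astype({"total_penalties": int})
print(matched)
```

With the sample data, result has five rows (10004 and 10005 have no total), while matched keeps only 10001, 10002, and 10003 with totals 8, 9, and 7. Doing the aggregation in a single join generally avoids one database round trip per row, which is what a row-wise lambda would incur.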