February 16, 2023, 05:29:55

Execute an SQL query depending on the parameters of a pandas dataframe

Question

I have a pandas DataFrame called final_data that looks like this:

cust_id	start_date	end_date
10001	2022-01-01	2022-01-30
10002	2022-02-01	2022-02-30
10003	2022-01-01	2022-01-30
10004	2022-03-01	2022-03-30
10005	2022-02-01	2022-02-30

I have another table in my SQL database called penalties that looks like this:

cust_id	level1_pen	level_2_pen	date
10001	1	4	2022-01-01
10001	1	1	2022-01-02
10001	0	1	2022-01-03
10002	1	1	2022-01-01
10002	5	0	2022-02-01
10002	4	0	2022-02-04
10003	1	6	2022-01-02

I want final_data to look like this, aggregating the data from the penalties table in the SQL database based on each row's cust_id, start_date, and end_date:

cust_id	start_date	end_date	total_penalties
10001	2022-01-01	2022-01-30	8
10002	2022-02-01	2022-02-30	9
10003	2022-01-01	2022-01-30	7

How do I apply a lambda function to each row so that it aggregates the data from the SQL query based on that row's cust_id, start_date, and end_date?

Answer 1

Score: 1

Suppose df is the final_data table and df2 is the penalties table.

You can get the final_data frame that you want using this query:

SELECT
  df.cust_id,
  df.start_date,
  df.end_date,
  SUM(df2.level1_pen + df2.level_2_pen) as total_penalties
FROM
  df
  LEFT JOIN df2 ON df.cust_id = df2.cust_id
    AND df2.date BETWEEN df.start_date AND df.end_date
GROUP BY
  df.cust_id,
  df.start_date,
  df.end_date;
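
The query above treats final_data as a table, but in the question it is a pandas DataFrame, so it has to reach the database somehow before the join can run. One possible way, sketched below, is to upload the DataFrame to a staging table with `DataFrame.to_sql` and read the joined result back with `pandas.read_sql_query`. This is only a sketch under assumptions: an in-memory SQLite database stands in for the real one, the staging table name `final_data_stage` is made up for illustration, and the dates are kept as ISO text exactly as they appear in the question.

```python
import sqlite3

import pandas as pd

# Sample data copied from the question. Dates stay as ISO-8601 text, so
# BETWEEN compares them as strings. (2022-02-30 is not a real calendar date
# and would be rejected by pd.to_datetime.)
final_data = pd.DataFrame(
    {
        "cust_id": [10001, 10002, 10003, 10004, 10005],
        "start_date": ["2022-01-01", "2022-02-01", "2022-01-01",
                       "2022-03-01", "2022-02-01"],
        "end_date": ["2022-01-30", "2022-02-30", "2022-01-30",
                     "2022-03-30", "2022-02-30"],
    }
)

penalty_rows = [
    (10001, 1, 4, "2022-01-01"),
    (10001, 1, 1, "2022-01-02"),
    (10001, 0, 1, "2022-01-03"),
    (10002, 1, 1, "2022-01-01"),
    (10002, 5, 0, "2022-02-01"),
    (10002, 4, 0, "2022-02-04"),
    (10003, 1, 6, "2022-01-02"),
]

# Assumption: an in-memory SQLite database stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE penalties "
    "(cust_id INTEGER, level1_pen INTEGER, level_2_pen INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO penalties VALUES (?, ?, ?, ?)", penalty_rows)

# Upload the DataFrame to a (hypothetical) staging table so the database
# can join against it.
final_data.to_sql("final_data_stage", conn, index=False, if_exists="replace")

# Same join and aggregation as the answer's query, with the real table names.
query = """
SELECT
  f.cust_id,
  f.start_date,
  f.end_date,
  SUM(p.level1_pen + p.level_2_pen) AS total_penalties
FROM final_data_stage AS f
LEFT JOIN penalties AS p
  ON f.cust_id = p.cust_id
  AND p.date BETWEEN f.start_date AND f.end_date
GROUP BY f.cust_id, f.start_date, f.end_date
ORDER BY f.cust_id
"""
result = pd.read_sql_query(query, conn)
conn.close()

print(result)
```

Because of the LEFT JOIN, customers 10004 and 10005 come back with a missing total_penalties rather than being dropped. To get only the three rows shown in the question, switching to an INNER JOIN or calling `result.dropna(subset=["total_penalties"])` should work. This runs one query for the whole DataFrame instead of one query per row through a lambda.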
