April 4, 2023 05:39:43

Python or SQL solution: creating an effective date and expiration date table

Question

I have a table that tracks bank account information. Each bank account can have someone added as an interested party and removed at any time (even on the same day). A person can be added once and never removed, or added and removed once or multiple times (even several times a day).

Example:

Date	BankId	AccountID	Type	PersonId
2/9/2022	0001	0004	Addition	0015
2/10/2022	0004	0005	Addition	0038
3/2/2022	0001	0004	Deletion	0015

As seen above, person 0015 was added as an interested party to account 0004, which is held at bank 0001. I want to reshape the data into an effective date / expiration date format.

So instead we would see:

EffectiveDate	ExpirationDate	BankId	AccountID	PersonId
2/9/2022	3/2/2022	0001	0004	0015
2/10/2022	null	0004	0005	0038

My data is over 250 million rows, so I am looking for an efficient way to do this. SQL has been extremely slow, so I am trying Python. The only approach I can think of is to loop through each unique bank/account/person combination and, for each addition, find the closest deletion (if any) dated later than the addition.

Loops are rarely efficient in pandas, so can anyone suggest a clearer, more efficient way to build this data? It could be in SQL too, but the result will never be stored as a new table; it is used only for analytics.

Answer 1

Score: 1

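I'm assuming the data is sorted by Date. Then you can do:

# Number each Addition within its bank/account/person group, so that an
# Addition and the Deletion that follows it share the same counter value.
df["tmp"] = df["Type"].eq("Addition")
df["tmp"] = df.groupby(["BankId", "AccountID", "PersonId"])["tmp"].cumsum()

# Pivot so that each (group, counter) pair becomes one row with an
# Addition date column and a Deletion date column.
out = (
    df.pivot(
        index=["BankId", "AccountID", "PersonId", "tmp"], columns="Type", values="Date"
    )
    .reset_index()
    .rename_axis(None, axis=1)
    .drop(columns="tmp")
)

print(out)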

Prints:

   BankId  AccountID  PersonId   Addition  Deletion
0       1          4        15   2/9/2022  3/2/2022
1       4          5        38  2/10/2022       NaN
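
If the data is not already sorted, or if pivoting is too heavy at this scale, a variation is to sort once and look one row ahead within each group. The sketch below is only an illustration built on the question's sample rows: the fourth row (a re-addition of person 0015) and the `_seq` helper column are hypothetical additions, and it assumes the original row order reflects the true order of events that share a date. It has not been benchmarked on 250 million rows.

```python
import pandas as pd

# Sample rows from the question, plus one hypothetical re-addition of person 0015.
df = pd.DataFrame(
    {
        "Date": ["2/9/2022", "2/10/2022", "3/2/2022", "3/5/2022"],
        "BankId": ["0001", "0004", "0001", "0001"],
        "AccountID": ["0004", "0005", "0004", "0004"],
        "Type": ["Addition", "Addition", "Deletion", "Addition"],
        "PersonId": ["0015", "0038", "0015", "0015"],
    }
)

keys = ["BankId", "AccountID", "PersonId"]

# Parse the dates so that sorting is chronological rather than lexical.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Hypothetical tie-breaker: keep the original row order for same-day events.
df["_seq"] = range(len(df))
df = df.sort_values(keys + ["Date", "_seq"]).reset_index(drop=True)

# Look one row ahead within each bank/account/person group.
grouped = df.groupby(keys, sort=False)
next_type = grouped["Type"].shift(-1)
next_date = grouped["Date"].shift(-1)

# An addition gets an expiration date only if the next event in its group
# is a deletion; otherwise it stays open (NaT).
df["ExpirationDate"] = next_date.where(next_type.eq("Deletion"))

out = (
    df.loc[df["Type"].eq("Addition"), ["Date", "ExpirationDate"] + keys]
    .rename(columns={"Date": "EffectiveDate"})
    .reset_index(drop=True)
)
print(out)
```

With this sample, the first stint of person 0015 is paired with the 3/2/2022 deletion, while the re-addition and person 0038 keep an empty expiration date. An addition that is followed directly by another addition is also left open here, whereas the pivot version would raise an error when a group's counter value maps to more than one row of the same Type, so which behavior is preferable depends on how clean the event log is.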
