2023年7月17日 19:56:02go评论92阅读模式

英文:

Find changed entries in 2 Data frames

问题

上个月的数据表：

| 用户名     | 电子邮件               | 类型   | 许可证数量 |
| ---      | ---                | --- | ---       |
| Joe Blue | joe@codeschool.co.uk | 管理员 | 1         |
| Rachel Green | racehel@codeschool.co.uk | 用户 | 2         |
| Shirly Brown | shirley@codeschool.co.uk | 管理员 | 1         |
| Jack Black | Jack@codeschool.co.uk | 管理员 | 1         |
| Cheryl Stone | Cherylt@codeschool.co.uk | 管理员 | 1         |

本月的数据表：

| 用户名     | 电子邮件               | 类型   | 许可证数量 |
| ---      | ---                | --- | ---       |
| Joe Blue | joe@codeschool.co.uk | 管理员 | 1         |
| Rachel Green | racehel@codeschool.co.uk | 用户 | 2         |
| Richard Red | Richard@codeschool.co.uk | 管理员 | 1         |
| Jack Black | Jack@codeschool.co.uk | 管理员 | 1         |

期望的输出：
DF_Added（已添加的用户）：

| 用户名         | 电子邮件               | 类型   | 许可证数量 |
| ---          | ---                | --- | ---       |
| Richard Red | Richard@codeschool.co.uk | 管理员 | 1         |

DF_Removed（已移除的用户）：

| 用户名            | 电子邮件               | 类型   | 许可证数量 |
| ---             | ---                | --- | ---       |
| Cheryl Stone | Cherylt@codeschool.co.uk | 管理员 | 1         |

您提供的代码中，您尝试使用Python的Pandas库来执行这些任务，但似乎有一些问题。如果您需要关于代码的帮助，可以提出具体问题，我将尽力协助您。

英文:

I have 2 Excel sheets generated monthly containing user data from an app we pay licenses for. I need to find a list of users who were added this month and a list of users removed from last month's spreadsheet. I need to compare the Email column as this is the only unique column in my data

Last months DF

| UserName | Email | Type | Number Of Licenses |
| --- | --- | --- | --- |
| Joe Blue | joe@codeschool.co.uk | Admin | 1 |
| Rachel Green | racehel@codeschool.co.uk | User | 2 |
| Shirly Brown | shirley@codeschool.co.uk | Adimin | 1 |
| Jack Black | Jack@codeschool.co.uk | Adimin | 1 |
| Cheryl Stone | Cherylt@codeschool.co.uk | Adimin | 1 |

This month's DF

| UserName | Email | Type | Number Of Licenses |
| --- | --- | --- | --- |
| Joe Blue | joe@codeschool.co.uk | Admin | 1 |
| Rachel Green | racehel@codeschool.co.uk | User | 2 |
| Richard Red | Richard@codeschool.co.uk | Adimin | 1 |
| Jack Black | Jack@codeschool.co.uk | Adimin | 1 |

Desired Outputs
DF_Added

| UserName | Email | Type | Number Of Licenses |
| --- | --- | --- | --- |
| Richard Red | Richard@codeschool.co.uk | Adimin | 1 |

DF_Removed

| UserName | Email | Type | Number Of Licenses |
| --- | --- | --- | --- |
| Cheryl Stone | Cherylt@codeschool.co.uk | Adimin | 1 |

I tried several ways but can't get the desired output.

My current not working code

```
LastMonth = Path.cwd() / &quot;./tempwork/UserReport_June.xlsx&quot;
ThisMonth = Path.cwd() / &quot;./tempwork/UserReport_July.xlsx&quot;
df_LastMonth = pd.read_excel(LastMonth)
df_ThisMonth = pd.read_excel(ThisMonth)
df_changed  = pd.merge(df_LastMonth, df_ThisMonth, on=&quot;Email&quot;, 
how=&#39;outer&#39;, indicator=&#39;Exist&#39;)
df_changed  = df_changed.loc[df_changed[&#39;Exist&#39;] != &#39;both&#39;]
df_changed.to_excel(&quot;./tempwork/changed.xlsx&quot;)
```

答案1

得分: 1

可以使用.merge并带有indicator=参数。然后使用left_only/right_only指示器来过滤已添加/已删除的行。例如：

x = pd.merge(df_last, df_current, on='Email', how='outer', indicator=True)
# 已添加
added = x.loc[x['_merge'] == 'right_only', ['Email', 'UserName_y', 'Type_y', 'Number Of Licenses_y']]
added.columns = [c.split('_')[0] for c in added.columns]
# 已删除
removed = x.loc[x['_merge'] == 'left_only', ['Email', 'UserName_x', 'Type_x', 'Number Of Licenses_x']]
removed.columns = [c.split('_')[0] for c in removed.columns]
print(added)
print(removed)

打印结果：

                      Email     UserName    Type  Number Of Licenses
5  Richard@codeschool.co.uk  Richard Red  Adimin                 1.0
                      Email      UserName    Type  Number Of Licenses
2  shirley@codeschool.co.uk  Shirly Brown  Adimin                 1.0
4  Cherylt@codeschool.co.uk  Cheryl Stone  Adimin                 1.0

英文:

You can use .merge with indicator= parameter. Then use left_only/right_only indicator to filter rows that were added/removed. E.g.:

x = pd.merge(df_last, df_current, on=&#39;Email&#39;, how=&#39;outer&#39;, indicator=True)
# added
added = x.loc[x[&#39;_merge&#39;] == &#39;right_only&#39;, [&#39;Email&#39;, &#39;UserName_y&#39;, &#39;Type_y&#39;, &#39;Number Of Licenses_y&#39;]]
added.columns = [c.split(&#39;_&#39;)[0] for c in added.columns]
# removed
removed = x.loc[x[&#39;_merge&#39;] == &#39;left_only&#39;, [&#39;Email&#39;, &#39;UserName_x&#39;, &#39;Type_x&#39;, &#39;Number Of Licenses_x&#39;]]
removed.columns = [c.split(&#39;_&#39;)[0] for c in removed.columns]
print(added)
print(removed)

Prints:

                      Email     UserName    Type  Number Of Licenses
5  Richard@codeschool.co.uk  Richard Red  Adimin                 1.0
                      Email      UserName    Type  Number Of Licenses
2  shirley@codeschool.co.uk  Shirly Brown  Adimin                 1.0
4  Cherylt@codeschool.co.uk  Cheryl Stone  Adimin                 1.0

答案2

得分: 0

如果我正确理解了你的问题，你的问题意味着你需要第一个数据框存在但第二个数据框中不存在的行（这些人已取消订阅），以及第二个数据框存在但第一个数据框中不存在的行（这些人是新订阅的），为了执行这个任务，我们可以这样做：

# 查找第一个数据框中不存在于第二个数据框中的行（已取消订阅的用户）
un_subscribed_users = df1.merge(df2, on=list(df1.columns), how='left', indicator=True)
un_subscribed_users = (un_subscribed_users[un_subscribed_users['_merge'] == 'left_only']).drop(columns=["_merge"])
print(un_subscribed_users)
# 查找第二个数据框中不存在于第一个数据框中的行（新订阅的用户）
subscribed_users = df1.merge(df2, on=list(df1.columns), how='right', indicator=True)
subscribed_users = (subscribed_users[subscribed_users['_merge'] == 'right_only']).drop(columns=["_merge"])
print(subscribed_users)

只需微调/更改 on 参数以仅比较你希望用来比较记录的字段。

英文:

If I understood your question correctly, your question means you need the rows that exist in first dataframe but not in the second (those are whom unsubscribed), and also the rows that exist in the second dataframe but not in the first (those are whom newly subscribed), so to do this task we can do it as:

# Find rows in df1 that do not exist in df2
un_subscribed_users = df1.merge(df2, on=list(df1.columns), how=&#39;left&#39;, indicator=True)
un_subscribed_users = (un_subscribed_users[un_subscribed_users[&#39;_merge&#39;] == &#39;left_only&#39;]).drop(columns=[&quot;_merge&quot;])
print(un_subscribed_users)
# Find rows in df2 that do not exist in df1
subscribed_users = df1.merge(df2, on=list(df1.columns), how=&#39;right&#39;, indicator=True)
subscribed_users = (subscribed_users[subscribed_users[&#39;_merge&#39;] == &#39;right_only&#39;]).drop(columns=[&quot;_merge&quot;])
print(subscribed_users)

just tweak / change on parameter to only the field you wish to compare records with.

答案3

得分: 0

根据您的需求，以下是已翻译的代码部分：

one_df = pd.read_csv("one.csv", sep="|")
two_df = pd.read_csv("two.csv", sep="|")
total = pd.merge(one_df, two_df, how="outer", on="Email")
added_df = total[~total["Email"].isin(one_df["Email"])]
removed_df = total[~total["Email"].isin(two_df["Email"])]
print(added_df)
print(removed_df)

根据添加和删除的列表，您可以找出是否有任何更改。

英文:

As I understood you need added or removed susbscribers from the list.

So here I fist create the list with total subscribers from last month and this month and then kind of subtracted from the total to get the new added of removed. Filtered with isin.

one_df = pd.read_csv(&quot;one.csv&quot;, sep=&quot;|&quot;)
two_df = pd.read_csv(&quot;two.csv&quot;, sep=&quot;|&quot;)
total = pd.merge(one_df, two_df, how=&quot;outer&quot;, on=&quot;Email&quot;)
added_df = total[~total[&quot;Email&quot;].isin(one_df[&quot;Email&quot;])]
removed_df = total[~total[&quot;Email&quot;].isin(two_df[&quot;Email&quot;])]
print(added_df)
print(removed_df)

Based on added and removed list you can find out if anything is changed or not.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

寻找两个数据框中的变更条目

问题

答案1

答案2

答案3

根据子数据框更新数据框

通过与另一个DataFrame进行比较来筛选DataFrame

How to extract sentences from a pandas dataframe which are given in a column as word-per-row

Python设计问题 – 带有不同签名的具体类

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。