2023-06-22 00:13:56

Python: How to compare data in a certain column when the values in a different column match?

Question

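I'm currently pulling data from SQL and want to calculate the largest difference between data points that share the same identifier.

import pandas as pd

df1 = pd.read_csv('SQLdata.csv')

Sample data for this DataFrame looks like this:

Year,Case
2015,1A
2016,1A
2017,1A
2015,2F
2018,2F

I'd like to write a script that adds a new column holding the difference between min(Year) and max(Year) for each case. Something like this:

Year,Case,YearDifference
2015,1A,2
2016,1A,2
2017,1A,2
2015,2F,3
2018,2F,3

I tried writing a for loop, but got stuck trying to do the calculation separately for each case.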


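# Answer 1
**Score**: 0
You can use `groupby` and `transform` to subtract each group's minimum year from its maximum year (here `df` is the DataFrame loaded as `df1` in the question):
```python
# Group the Year column by Case, then broadcast each group's max and min back to its rows.
grp = df.groupby('Case')['Year']
df['Diff'] = grp.transform('max').sub(grp.transform('min'))
```

You can also use `groupby.agg` to get the minimum and maximum year for each group, then map the difference back onto `Case`:

```python
# agg yields one row per Case with 'min' and 'max' columns; diff(axis=1)['max'] is max - min.
df['Diff'] = df['Case'].map(
    df.groupby('Case')['Year'].agg(['min', 'max']).diff(axis=1)['max']
)
```

Output:

```
   Year Case  Diff
0  2015   1A     2
1  2016   1A     2
2  2017   1A     2
3  2015   2F     3
4  2018   2F     3
```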

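
To try the `groupby`/`transform` approach end to end, here is a minimal, self-contained sketch. It assumes pandas is installed, inlines the question's sample rows through `io.StringIO` instead of reading `SQLdata.csv`, and names the new column `YearDifference` as the question asked; `csv_text` and `years` are illustrative names, not part of the original answer.

```python
import io

import pandas as pd

# Inline stand-in for SQLdata.csv (assumption: the real file has the same
# two columns as the sample shown in the question).
csv_text = """Year,Case
2015,1A
2016,1A
2017,1A
2015,2F
2018,2F
"""
df = pd.read_csv(io.StringIO(csv_text))

# transform() returns a Series aligned with the original rows, so each row
# receives its own group's max and min Year.
years = df.groupby("Case")["Year"]
df["YearDifference"] = years.transform("max") - years.transform("min")

print(df)
```

String aggregation names such as `'max'` are used here rather than the Python builtins, which newer pandas versions may warn about. Because `transform` keeps the original index, no explicit loop over the cases is needed.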
