Python: how to compare data in one column when the values in a different column match?
Question
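I'm currently pulling data from SQL and want to calculate the largest difference between data points that share the same identifier.
import pandas as pd
df1 = pd.read_csv('SQLdata.csv')
Sample data for this DataFrame looks like this:
Year,Case
2015,1A
2016,1A
2017,1A
2015,2F
2018,2F
I'd like to write a script that adds a new column holding the difference between min(Year) and max(Year) for each case. Something like this:
Year,Case,YearDifference
2015,1A,2
2016,1A,2
2017,1A,2
2015,2F,3
2018,2F,3
I tried writing a for loop, but got stuck when trying to apply it to each case separately.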
# Answer 1
**Score**: 0
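Assuming the data is loaded into a DataFrame named `df` (the question calls it `df1`), you can use `groupby` and `transform` to subtract each group's minimum year from its maximum year:
```python
# Group Year by Case, then broadcast each group's max and min back onto its rows
grp = df.groupby('Case')['Year']
df['Diff'] = grp.transform('max').sub(grp.transform('min'))
```
Alternatively, use `groupby.agg` to get the minimum and maximum year for each group, then map the difference back onto `Case`:
```python
# agg yields one row per Case with 'min' and 'max' columns; diff(axis=1)['max'] is max - min
df['Diff'] = df['Case'].map(df.groupby('Case')['Year'].agg(['min', 'max']).diff(axis=1)['max'])
```
Output of the first approach:
```
   Year Case  Diff
0  2015   1A     2
1  2016   1A     2
2  2017   1A     2
3  2015   2F     3
4  2018   2F     3
```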
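For anyone who wants to try this without the original file, here is a minimal self-contained sketch. It assumes the CSV has exactly the two columns shown in the question and reads the sample rows from an in-memory string in place of `SQLdata.csv`; the variable names `csv_text` and `years` are illustrative, and the new column uses the `YearDifference` name from the question.

```python
import io

import pandas as pd

# Inline stand-in for SQLdata.csv (assumption: same two columns as the question's sample).
csv_text = """Year,Case
2015,1A
2016,1A
2017,1A
2015,2F
2018,2F
"""
df = pd.read_csv(io.StringIO(csv_text))

# Per-case span: max(Year) - min(Year), broadcast back to every row of that case.
years = df.groupby("Case")["Year"]
df["YearDifference"] = years.transform("max") - years.transform("min")

# Print in the same CSV layout the question uses for its desired result.
print(df.to_csv(index=False))
```

Because `transform` returns a result aligned to the original index, no explicit loop over cases or merge step is needed.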