Python: How to compare data in a certain column when the values in a different column match?

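Question

I'm currently pulling data from SQL and want to calculate the largest difference between data points that share the same identifier.

    import pandas as pd

    df1 = pd.read_csv('SQLdata.csv')

Sample data for this DataFrame looks like this:

    Year,Case
    2015,1A
    2016,1A
    2017,1A
    2015,2F
    2018,2F

I'd like to write a script that adds a new column holding, for each case, the difference between max(Year) and min(Year). Something like this:

    Year,Case,YearDifference
    2015,1A,2
    2016,1A,2
    2017,1A,2
    2015,2F,3
    2018,2F,3

I tried writing a for loop, but got stuck when trying to apply the calculation to each case separately.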




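# Answer 1
**Score**: 0

You can use `groupby` with `transform` to subtract each group's minimum year from its maximum year:

```python
# df is the DataFrame from the question (loaded there as df1)
grp = df.groupby('Case')['Year']
# String names select pandas' built-in group reductions
df['Diff'] = grp.transform('max').sub(grp.transform('min'))
```

Alternatively, use `groupby.agg` to get the minimum and maximum year for each group, then map the difference back onto `Case`:

```python
# diff(axis=1) leaves NaN in 'min' and (max - min) in 'max'
df['Diff'] = df['Case'].map(
    df.groupby('Case')['Year'].agg(['min', 'max']).diff(axis=1)['max']
)
```

Output (the second variant may display `Diff` as floats, since `diff` introduces NaN):

```
   Year Case  Diff
0  2015   1A     2
1  2016   1A     2
2  2017   1A     2
3  2015   2F     3
4  2018   2F     3
```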
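For anyone who wants to try this without the CSV, here is a minimal, self-contained sketch of the same idea. It assumes the five sample rows from the question stand in for `SQLdata.csv`, and it writes to the `YearDifference` column name the question asked for. The names `years_by_case`, `span_per_case`, and `via_map` are illustrative only.

```python
import pandas as pd

# Assumption: sample rows copied from the question, built inline instead of
# being read from SQLdata.csv.
df = pd.DataFrame({
    "Year": [2015, 2016, 2017, 2015, 2018],
    "Case": ["1A", "1A", "1A", "2F", "2F"],
})

# Group the Year values by their Case identifier.
years_by_case = df.groupby("Case")["Year"]

# Approach 1: broadcast each group's max and min back onto its rows, then subtract.
df["YearDifference"] = years_by_case.transform("max") - years_by_case.transform("min")

# Approach 2: compute one span per case, then look that span up for every row.
span_per_case = years_by_case.agg(lambda s: s.max() - s.min())
via_map = df["Case"].map(span_per_case)

print(df)
print(via_map.tolist())
```

`transform` returns a result aligned with the original index, so it can be assigned straight to a new column. The `map` variant is handy when the per-case spans are also needed on their own.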

huangapple
  • Posted on June 22, 2023, 00:13:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76525273.html