Python: How to compare values in one column across rows whose values in another column match?


# Question

I'm currently pulling data from SQL and want to calculate the largest difference between data points that share the same identifier.

```python
import pandas as pd

df1 = pd.read_csv('SQLdata.csv')
```
Sample data for this DataFrame looks like:

```
Year,Case
2015,1A
2016,1A
2017,1A
2015,2F
2018,2F
```
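Since `SQLdata.csv` itself isn't available, here is a minimal sketch that rebuilds the sample frame from the CSV text above with `io.StringIO`, so the approaches below can be tried directly. The constant name `SAMPLE_CSV` is illustrative, not from the question; reading `Case` as `str` is an assumption that keeps identifiers like `1A` as text.

```python
import io

import pandas as pd

# Sample rows copied from the question; stands in for SQLdata.csv
SAMPLE_CSV = """Year,Case
2015,1A
2016,1A
2017,1A
2015,2F
2018,2F
"""

# Read Case as text so identifiers are never coerced to numbers
df1 = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={'Case': str})
print(df1)
```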
I'd like to write a script that adds a new column, `YearDifference`, holding the difference between the largest and smallest `Year` of each case, like this:

```
Year,Case,YearDifference
2015,1A,2
2016,1A,2
2017,1A,2
2015,2F,3
2018,2F,3
```

I tried writing a `for` loop, but got stuck when trying to apply the calculation to each case separately.
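For comparison with the `for`-loop attempt, a minimal loop-based sketch (not the asker's actual code): iterate over the unique values of `Case`, take max minus min of `Year` for that case's rows, and write the result back with `.loc`. It only makes the per-case logic explicit; the `groupby` approaches in the answer avoid the Python-level loop and are usually preferable on larger tables.

```python
import io

import pandas as pd

df1 = pd.read_csv(io.StringIO(
    "Year,Case\n2015,1A\n2016,1A\n2017,1A\n2015,2F\n2018,2F\n"
))

df1['YearDifference'] = 0  # placeholder, overwritten for every case below
for case in df1['Case'].unique():
    mask = df1['Case'] == case          # boolean mask: rows of this case
    years = df1.loc[mask, 'Year']
    df1.loc[mask, 'YearDifference'] = years.max() - years.min()

print(df1)
```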

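# Answer 1

**Score**: 0

You can use `groupby` with `transform` to subtract each group's minimum year from its maximum year (`df` here is the question's `df1`). Passing the strings `'max'` and `'min'` rather than the built-in functions avoids the `FutureWarning` that recent pandas versions emit for built-ins:

```python
grp = df.groupby('Case')['Year']
df['Diff'] = grp.transform('max').sub(grp.transform('min'))
```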
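A variant of the same idea, sketched here as an alternative rather than the answerer's method: `transform` also accepts a function, so the per-group range can be computed in one call and broadcast back to every row. The lambda runs Python code once per group, so it is typically slower than the built-in `'max'`/`'min'` strings on large frames.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO(
    "Year,Case\n2015,1A\n2016,1A\n2017,1A\n2015,2F\n2018,2F\n"
))

# Per-group range (max - min); transform broadcasts the scalar to each row
df['Diff'] = df.groupby('Case')['Year'].transform(lambda s: s.max() - s.min())
print(df)
```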

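You can also use `groupby.agg` to get the minimum and maximum year of each group, then `map` the difference back onto `Case`. Because `diff(axis=1)` leaves `NaN` in the `min` column, this version produces floats (e.g. `2.0`):

```python
df['Diff'] = df['Case'].map(
    df.groupby('Case')['Year'].agg(['min', 'max']).diff(axis=1)['max']
)
```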
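To show what the chained `agg`/`diff`/`map` expression does, here is a sketch that spells out the intermediate per-case table. It subtracts the two columns directly instead of calling `diff(axis=1)`, which, assuming integer years, keeps the result as integers. The names `stats` and `year_range` are illustrative.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO(
    "Year,Case\n2015,1A\n2016,1A\n2017,1A\n2015,2F\n2018,2F\n"
))

# One row per Case, with columns 'min' and 'max'
stats = df.groupby('Case')['Year'].agg(['min', 'max'])
print(stats)

# Series indexed by Case holding each case's year range
year_range = stats['max'] - stats['min']

# map looks up every row's Case in year_range's index
df['Diff'] = df['Case'].map(year_range)
print(df)
```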

Output:

```
   Year Case  Diff
0  2015   1A     2
1  2016   1A     2
2  2017   1A     2
3  2015   2F     3
4  2018   2F     3
```

huangapple
  • Posted on June 22, 2023 00:13:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76525273.html