Pandas comparing values based on month and year

Question

I would like to compare values in two different columns that are aligned to a specific month, and return a value that acts as an identifier. My dataframe looks like the following:

    Account year month value_1 value_2
    A 2021 Jan 9
    A 2021 Feb
    A 2021 Mar 7
    A 2021 Apr 8
    A 2021 May
    B 2021 Jan 2
    B 2021 Feb 10
    B 2021 Mar 5
    B 2021 Apr 7

In the above scenario, for account A, the value 9 appeared in the value_2 column in an earlier month than any value in value_1. For account B, the value 2 in value_1 appeared before the value 10 in the value_2 column. Essentially, I want to check, for every account, whether a value appeared in the value_2 column before one appeared in the value_1 column. The rows are already sorted by year and month.

What I want to do is to create a resultant dataframe that shows the following:

    account  result
    A        value_2 appeared before value_1
    B        value_1 appeared before value_2

How can I achieve this?
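For anyone who wants to reproduce this, the frame could be built roughly as follows. This is only a sketch under assumptions: blank cells are taken to be NaN, and only the placement of 9 (value_2, account A), 2 (value_1, account B) and 10 (value_2, account B) follows from the description above; the columns chosen for the remaining values are guesses. The `MONTH_ORDER` mapping and the `month_num` column are hypothetical helpers, used only to confirm that rows are in chronological order.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "Account": ["A"] * 5 + ["B"] * 4,
    "year": [2021] * 9,
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jan", "Feb", "Mar", "Apr"],
    # Placement of 7, 8 (A) and 5, 7 (B) is assumed, not stated in the question.
    "value_1": [nan, nan, 7, 8, nan, 2, nan, 5, nan],
    "value_2": [9, nan, nan, nan, nan, nan, 10, nan, 7],
})

# Map month abbreviations to numbers so rows can be ordered chronologically.
MONTH_ORDER = {
    m: i
    for i, m in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}
df["month_num"] = df["month"].map(MONTH_ORDER)

# Sorting is a no-op here, but makes the "already sorted" assumption explicit.
df = df.sort_values(["Account", "year", "month_num"]).reset_index(drop=True)
```

After the `reset_index(drop=True)` call, a smaller index label always means an earlier row within an account, which is what any index-based comparison would rely on.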

Answer 1

Score: 1

Assuming every group has at least one value and no row ever has values in both columns, this should work for you:

    import pandas as pd

    def check_order(group):
        # Smallest index label holding a non-missing value in each column
        # (NaN when the column is entirely empty for this account).
        value_1_idx = group['value_1'].dropna().index.min()
        value_2_idx = group['value_2'].dropna().index.min()
        has_1, has_2 = pd.notna(value_1_idx), pd.notna(value_2_idx)
        if has_1 and (not has_2 or value_1_idx < value_2_idx):
            return 'value_1 appeared before value_2'
        if has_2 and (not has_1 or value_2_idx < value_1_idx):
            return 'value_2 appeared before value_1'

    # Relies on df's index following the year/month row order.
    result = df.groupby('Account').apply(check_order).reset_index(name='result')
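The same comparison can also be sketched without `apply`, by computing the first row position at which each column is non-missing per account. This is an alternative sketch rather than the answer's own method. The function name `first_appearance`, the `"undetermined"` label for ties or empty accounts, and the sample frame (including the column placement of its values) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def first_appearance(df, account_col="Account"):
    """Label each account by which value column has the earlier non-missing row."""
    # Row position within the frame; assumes rows are already in chronological order.
    pos = pd.Series(np.arange(len(df)), index=df.index)
    # Position of the first non-missing entry per account; NaN if the column is empty.
    first_1 = pos.where(df["value_1"].notna()).groupby(df[account_col]).min()
    first_2 = pos.where(df["value_2"].notna()).groupby(df[account_col]).min()
    # Treat an empty column as "never appeared" (infinitely late).
    first_1, first_2 = first_1.fillna(np.inf), first_2.fillna(np.inf)
    labels = np.select(
        [first_1 < first_2, first_2 < first_1],
        ["value_1 appeared before value_2", "value_2 appeared before value_1"],
        default="undetermined",  # hypothetical label for ties or no values at all
    )
    return pd.Series(labels, index=first_1.index, name="result").reset_index()

# Assumed sample data: blanks are NaN, column placement is illustrative.
nan = np.nan
sample = pd.DataFrame({
    "Account": ["A"] * 5 + ["B"] * 4,
    "value_1": [nan, nan, 7, 8, nan, 2, nan, 5, nan],
    "value_2": [9, nan, nan, nan, nan, nan, 10, nan, 7],
})
out = first_appearance(sample)
print(out)
```

With this sample, account A is labeled "value_2 appeared before value_1" and account B "value_1 appeared before value_2". Using row positions instead of index labels means the result does not depend on how the frame happens to be indexed, only on the row order.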

huangapple
  • Posted on February 24, 2023, 01:05:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75548064.html