获取项目首次和最后出现时的值

huangapple go评论110阅读模式
英文:

Get value of item when it first and last appears

问题

我可以帮你完成这个任务。要获取2和4,你可以使用以下代码:

  1. import pandas as pd
  2. # 将日期列转换为日期时间格式
  3. df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
  4. # 找到每个项目的第一次出现和最后一次出现的索引
  5. first_appear_idx = df.groupby('Item')['Date'].idxmin()
  6. last_appear_idx = df.groupby('Item')['Date'].idxmax()
  7. # 根据索引获取对应的Count值
  8. count_at_earliest_date = df.loc[first_appear_idx]['Count'].values
  9. count_at_latest_date = df.loc[last_appear_idx]['Count'].values
  10. # 创建新的DataFrame
  11. df2['Count_at_earliest_date'] = count_at_earliest_date
  12. df2['Count_at_latest_date'] = count_at_latest_date
  13. # 重新设置索引
  14. df2.reset_index(inplace=True)

这段代码会给你想要的结果,包括2和4。希望对你有所帮助!

英文:

I have a dataframe like

  1. Item Date Count
  2. Item_1 01/06/2023 25
  3. Item_1 02/06/2023 14
  4. Item_1 03/06/2023 5
  5. Item_1 04/06/2023 7
  6. Item_2 03/06/2023 41
  7. Item_2 04/06/2023 49
  8. Item_2 05/06/2023 2
  9. Item_2 06/06/2023 38

and would like to aggregate to a new df with a line per item to show

  1. 1 - when it first appeared
  2. 2 - the value of "Count" when it first appeared
  3. 3 - when it last appeared
  4. 4 - the value of "Count" when it last appeared
  5. 5 - the minimum value of "Count" per item
  6. 6 - the maximum value of "Count" per item

So it should look like

  1. Item Earliest_date Count_at_earliest_date Latest_date Count_at_latest_date Min_of_count Max_of_count
  2. Item_1 01/06/2023 25 04/06/2023 25 5 25
  3. Item_2 03/06/2023 41 06/06/2023 41 2 49

I can get 1, 3, 5, 6 with

  1. df2 = df.groupby('Item').agg(earliest_date =('Date', 'min'), latest_date =('Date', 'max'), min_count = ('Count', 'min'), max_count = ('Count', 'max'))
  2. df2.reset_index(inplace=True)

But I can't figure how to get 2 and 4.

Please help!

答案1

得分: 1

  1. 让我们首先创建一个数据框df):
  2. import pandas as pd
  3. # 让我们首先创建一个数据框
  4. data = {
  5. '项目': ['项目_1', '项目_1', '项目_1', '项目_1', '项目_2', '项目_2', '项目_2', '项目_2'],
  6. '日期': ['01/06/2023', '02/06/2023', '03/06/2023', '04/06/2023', '03/06/2023', '04/06/2023', '05/06/2023', '06/06/2023'],
  7. '数量': [25, 14, 5, 7, 41, 49, 2, 38]
  8. }
  9. df = pd.DataFrame(data)
  10. 然后应用主要逻辑
  11. # 首先更改日期格式
  12. df['日期'] = pd.to_datetime(df['日期'], format='%d/%m/%Y')
  13. # 根据日期排序
  14. df = df.sort_values('日期')
  15. # 根据项目分组
  16. df_first = df.groupby('项目').first().reset_index()
  17. df_last = df.groupby('项目').last().reset_index()
  18. # 重命名列
  19. df_first = df_first.rename(columns={'日期': '最早日期', '数量': '最早日期的数量'})
  20. df_last = df_last.rename(columns={'日期': '最新日期', '数量': '最新日期的数量'})
  21. # 获取最小和最大值
  22. df_min_max = df.groupby('项目').agg(数量最小值=('数量', 'min'), 数量最大值=('数量', 'max')).reset_index()
  23. # 最后将它们合并
  24. df_result = pd.merge(df_first, df_last, on='项目')
  25. df_result = pd.merge(df_result, df_min_max, on='项目')
  26. df_result
英文:

Let's first create a df:
import pandas as pd

  1. # Let's first create a df
  2. data = {
  3. 'Item': ['Item_1', 'Item_1', 'Item_1', 'Item_1', 'Item_2', 'Item_2', 'Item_2', 'Item_2'],
  4. 'Date': ['01/06/2023', '02/06/2023', '03/06/2023', '04/06/2023', '03/06/2023', '04/06/2023', '05/06/2023', '06/06/2023'],
  5. 'Count': [25, 14, 5, 7, 41, 49, 2, 38]
  6. }
  7. df = pd.DataFrame(data)

Then applying the main logic

  1. # change date format first
  2. df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
  3. # sort based on Datew
  4. df = df.sort_values('Date')
  5. # group based on Items
  6. df_first = df.groupby('Item').first().reset_index()
  7. df_last = df.groupby('Item').last().reset_index()
  8. # Rename columns
  9. df_first = df_first.rename(columns={'Date': 'Earliest_date', 'Count': 'Count_at_earliest_date'})
  10. df_last = df_last.rename(columns={'Date': 'Latest_date', 'Count': 'Count_at_latest_date'})
  11. # get min and mac
  12. df_min_max = df.groupby('Item').agg(Min_of_count=('Count', 'min'), Max_of_count=('Count', 'max')).reset_index()
  13. # and merge them at last
  14. df_result = pd.merge(df_first, df_last, on='Item')
  15. df_result = pd.merge(df_result, df_min_max, on='Item')
  16. df_result

答案2

得分: 1

请按日期对数据框进行排序,然后使用 firstlast 获取 -2- 和 -4-:

  1. df2 = (df.sort_values('Date').groupby('Item', as_index=False)
  2. .agg(earliest_date=('Date', 'min'),
  3. count_at_earliest_date=('Count', 'first'),
  4. latest_date=('Date', 'max'),
  5. count_at_latest_date=('Count', 'last'),
  6. min_count=('Count', 'min'),
  7. max_count=('Count', 'max')))

输出:

  1. >>> df2
  2. Item earliest_date count_at_earliest_date latest_date count_at_latest_date min_count max_count
  3. 0 Item_1 01/06/2023 25 04/06/2023 7 5 25
  4. 1 Item_2 03/06/2023 41 06/06/2023 38 2 49
英文:

Sort your dataframe by dates first and use first and last to get -2- and -4-:

  1. df2 = (df.sort_values('Date').groupby('Item', as_index=False)
  2. .agg(earliest_date =('Date', 'min'),
  3. count_at_earliest_date=('Count', 'first'),
  4. latest_date =('Date', 'max'),
  5. count_at_latest_date=('Count', 'last'),
  6. min_count = ('Count', 'min'),
  7. max_count = ('Count', 'max')))

Output:

  1. >>> df2
  2. Item earliest_date count_at_earliest_date latest_date count_at_latest_date min_count max_count
  3. 0 Item_1 01/06/2023 25 04/06/2023 7 5 25
  4. 1 Item_2 03/06/2023 41 06/06/2023 38 2 49

huangapple
  • 本文由 发表于 2023年6月29日 23:49:57
  • 转载请务必保留本文链接:https://go.coder-hub.com/76582687.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定