获取项目首次和最后出现时的值

huangapple go评论62阅读模式
英文:

Get value of item when it first and last appears

问题

我可以帮你完成这个任务。要获取2和4,你可以使用以下代码:

import pandas as pd

# 将日期列转换为日期时间格式
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# 找到每个项目的第一次出现和最后一次出现的索引
first_appear_idx = df.groupby('Item')['Date'].idxmin()
last_appear_idx = df.groupby('Item')['Date'].idxmax()

# 根据索引获取对应的Count值
count_at_earliest_date = df.loc[first_appear_idx]['Count'].values
count_at_latest_date = df.loc[last_appear_idx]['Count'].values

# 创建新的DataFrame
df2['Count_at_earliest_date'] = count_at_earliest_date
df2['Count_at_latest_date'] = count_at_latest_date

# 重新设置索引
df2.reset_index(inplace=True)

这段代码会给你想要的结果,包括2和4。希望对你有所帮助!

英文:

I have a dataframe like

Item	Date	Count
Item_1	01/06/2023	25
Item_1	02/06/2023	14
Item_1	03/06/2023	5
Item_1	04/06/2023	7
Item_2	03/06/2023	41
Item_2	04/06/2023	49
Item_2	05/06/2023	2
Item_2	06/06/2023	38

and would like to aggregate to a new df with a line per item to show

1 - when it first appeared
2 - the value of "Count" when it first appeared 
3 - when it last appeared 
4 - the value of "Count" when it last appeared 
5 - the minimum value of "Count" per item
6 - the maximum value of "Count" per item

So it should look like

Item	Earliest_date	Count_at_earliest_date	Latest_date	Count_at_latest_date	Min_of_count	Max_of_count
Item_1	01/06/2023	25	04/06/2023	25	5	25
Item_2	03/06/2023	41	06/06/2023	41	2	49

I can get 1, 3, 5, 6 with

df2 = df.groupby('Item').agg(earliest_date =('Date', 'min'), latest_date =('Date', 'max'), min_count = ('Count', 'min'), max_count = ('Count', 'max'))
df2.reset_index(inplace=True)

But I can't figure how to get 2 and 4.

Please help!

答案1

得分: 1

让我们首先创建一个数据框df):
import pandas as pd

# 让我们首先创建一个数据框
data = {
    '项目': ['项目_1', '项目_1', '项目_1', '项目_1', '项目_2', '项目_2', '项目_2', '项目_2'],
    '日期': ['01/06/2023', '02/06/2023', '03/06/2023', '04/06/2023', '03/06/2023', '04/06/2023', '05/06/2023', '06/06/2023'],
    '数量': [25, 14, 5, 7, 41, 49, 2, 38]
}

df = pd.DataFrame(data)

然后应用主要逻辑

# 首先更改日期格式
df['日期'] = pd.to_datetime(df['日期'], format='%d/%m/%Y')

# 根据日期排序
df = df.sort_values('日期')

# 根据项目分组
df_first = df.groupby('项目').first().reset_index()
df_last = df.groupby('项目').last().reset_index()

# 重命名列
df_first = df_first.rename(columns={'日期': '最早日期', '数量': '最早日期的数量'})
df_last = df_last.rename(columns={'日期': '最新日期', '数量': '最新日期的数量'})

# 获取最小和最大值
df_min_max = df.groupby('项目').agg(数量最小值=('数量', 'min'), 数量最大值=('数量', 'max')).reset_index()

# 最后将它们合并
df_result = pd.merge(df_first, df_last, on='项目')
df_result = pd.merge(df_result, df_min_max, on='项目')

df_result
英文:

Let's first create a df:
import pandas as pd

# Let's first create a df
data = {
    'Item': ['Item_1', 'Item_1', 'Item_1', 'Item_1', 'Item_2', 'Item_2', 'Item_2', 'Item_2'],
    'Date': ['01/06/2023', '02/06/2023', '03/06/2023', '04/06/2023', '03/06/2023', '04/06/2023', '05/06/2023', '06/06/2023'],
    'Count': [25, 14, 5, 7, 41, 49, 2, 38]
}

df = pd.DataFrame(data)

Then applying the main logic

# change date format first
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# sort based on Datew
df = df.sort_values('Date')

# group based on Items
df_first = df.groupby('Item').first().reset_index()
df_last = df.groupby('Item').last().reset_index()

# Rename columns
df_first = df_first.rename(columns={'Date': 'Earliest_date', 'Count': 'Count_at_earliest_date'})
df_last = df_last.rename(columns={'Date': 'Latest_date', 'Count': 'Count_at_latest_date'})

# get min and mac
df_min_max = df.groupby('Item').agg(Min_of_count=('Count', 'min'), Max_of_count=('Count', 'max')).reset_index()

# and merge them at last
df_result = pd.merge(df_first, df_last, on='Item')
df_result = pd.merge(df_result, df_min_max, on='Item')

df_result

答案2

得分: 1

请按日期对数据框进行排序,然后使用 firstlast 获取 -2- 和 -4-:

df2 = (df.sort_values('Date').groupby('Item', as_index=False)
         .agg(earliest_date=('Date', 'min'),
              count_at_earliest_date=('Count', 'first'),
              latest_date=('Date', 'max'),
              count_at_latest_date=('Count', 'last'),
              min_count=('Count', 'min'),
              max_count=('Count', 'max')))

输出:

>>> df2
     Item earliest_date  count_at_earliest_date latest_date  count_at_latest_date  min_count  max_count
0  Item_1    01/06/2023                      25  04/06/2023                     7          5         25
1  Item_2    03/06/2023                      41  06/06/2023                    38          2         49
英文:

Sort your dataframe by dates first and use first and last to get -2- and -4-:

df2 = (df.sort_values('Date').groupby('Item', as_index=False)
         .agg(earliest_date =('Date', 'min'),
              count_at_earliest_date=('Count', 'first'),
              latest_date =('Date', 'max'),
              count_at_latest_date=('Count', 'last'),
              min_count = ('Count', 'min'),
              max_count = ('Count', 'max')))

Output:

>>> df2
     Item earliest_date  count_at_earliest_date latest_date  count_at_latest_date  min_count  max_count
0  Item_1    01/06/2023                      25  04/06/2023                     7          5         25
1  Item_2    03/06/2023                      41  06/06/2023                    38          2         49

huangapple
  • 本文由 发表于 2023年6月29日 23:49:57
  • 转载请务必保留本文链接:https://go.coder-hub.com/76582687.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定