英文:
Get value of item when it first and last appears
问题
我可以帮你完成这个任务。要获取2和4,你可以使用以下代码:
import pandas as pd
# 将日期列转换为日期时间格式
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
# 找到每个项目的第一次出现和最后一次出现的索引
first_appear_idx = df.groupby('Item')['Date'].idxmin()
last_appear_idx = df.groupby('Item')['Date'].idxmax()
# 根据索引获取对应的Count值
count_at_earliest_date = df.loc[first_appear_idx]['Count'].values
count_at_latest_date = df.loc[last_appear_idx]['Count'].values
# 创建新的DataFrame
df2['Count_at_earliest_date'] = count_at_earliest_date
df2['Count_at_latest_date'] = count_at_latest_date
# 重新设置索引
df2.reset_index(inplace=True)
这段代码会给你想要的结果,包括2和4。希望对你有所帮助!
英文:
I have a dataframe like
Item Date Count
Item_1 01/06/2023 25
Item_1 02/06/2023 14
Item_1 03/06/2023 5
Item_1 04/06/2023 7
Item_2 03/06/2023 41
Item_2 04/06/2023 49
Item_2 05/06/2023 2
Item_2 06/06/2023 38
and would like to aggregate to a new df with a line per item to show
1 - when it first appeared
2 - the value of "Count" when it first appeared
3 - when it last appeared
4 - the value of "Count" when it last appeared
5 - the minimum value of "Count" per item
6 - the maximum value of "Count" per item
So it should look like
Item Earliest_date Count_at_earliest_date Latest_date Count_at_latest_date Min_of_count Max_of_count
Item_1 01/06/2023 25 04/06/2023 25 5 25
Item_2 03/06/2023 41 06/06/2023 41 2 49
I can get 1, 3, 5, 6 with
df2 = df.groupby('Item').agg(earliest_date =('Date', 'min'), latest_date =('Date', 'max'), min_count = ('Count', 'min'), max_count = ('Count', 'max'))
df2.reset_index(inplace=True)
But I can't figure how to get 2 and 4.
Please help!
答案1
得分: 1
让我们首先创建一个数据框(df):
import pandas as pd
# 让我们首先创建一个数据框
data = {
'项目': ['项目_1', '项目_1', '项目_1', '项目_1', '项目_2', '项目_2', '项目_2', '项目_2'],
'日期': ['01/06/2023', '02/06/2023', '03/06/2023', '04/06/2023', '03/06/2023', '04/06/2023', '05/06/2023', '06/06/2023'],
'数量': [25, 14, 5, 7, 41, 49, 2, 38]
}
df = pd.DataFrame(data)
然后应用主要逻辑:
# 首先更改日期格式
df['日期'] = pd.to_datetime(df['日期'], format='%d/%m/%Y')
# 根据日期排序
df = df.sort_values('日期')
# 根据项目分组
df_first = df.groupby('项目').first().reset_index()
df_last = df.groupby('项目').last().reset_index()
# 重命名列
df_first = df_first.rename(columns={'日期': '最早日期', '数量': '最早日期的数量'})
df_last = df_last.rename(columns={'日期': '最新日期', '数量': '最新日期的数量'})
# 获取最小和最大值
df_min_max = df.groupby('项目').agg(数量最小值=('数量', 'min'), 数量最大值=('数量', 'max')).reset_index()
# 最后将它们合并
df_result = pd.merge(df_first, df_last, on='项目')
df_result = pd.merge(df_result, df_min_max, on='项目')
df_result
英文:
Let's first create a df:
import pandas as pd
# Let's first create a df
data = {
'Item': ['Item_1', 'Item_1', 'Item_1', 'Item_1', 'Item_2', 'Item_2', 'Item_2', 'Item_2'],
'Date': ['01/06/2023', '02/06/2023', '03/06/2023', '04/06/2023', '03/06/2023', '04/06/2023', '05/06/2023', '06/06/2023'],
'Count': [25, 14, 5, 7, 41, 49, 2, 38]
}
df = pd.DataFrame(data)
Then applying the main logic
# change date format first
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
# sort based on Datew
df = df.sort_values('Date')
# group based on Items
df_first = df.groupby('Item').first().reset_index()
df_last = df.groupby('Item').last().reset_index()
# Rename columns
df_first = df_first.rename(columns={'Date': 'Earliest_date', 'Count': 'Count_at_earliest_date'})
df_last = df_last.rename(columns={'Date': 'Latest_date', 'Count': 'Count_at_latest_date'})
# get min and mac
df_min_max = df.groupby('Item').agg(Min_of_count=('Count', 'min'), Max_of_count=('Count', 'max')).reset_index()
# and merge them at last
df_result = pd.merge(df_first, df_last, on='Item')
df_result = pd.merge(df_result, df_min_max, on='Item')
df_result
答案2
得分: 1
请按日期对数据框进行排序,然后使用 first
和 last
获取 -2- 和 -4-:
df2 = (df.sort_values('Date').groupby('Item', as_index=False)
.agg(earliest_date=('Date', 'min'),
count_at_earliest_date=('Count', 'first'),
latest_date=('Date', 'max'),
count_at_latest_date=('Count', 'last'),
min_count=('Count', 'min'),
max_count=('Count', 'max')))
输出:
>>> df2
Item earliest_date count_at_earliest_date latest_date count_at_latest_date min_count max_count
0 Item_1 01/06/2023 25 04/06/2023 7 5 25
1 Item_2 03/06/2023 41 06/06/2023 38 2 49
英文:
Sort your dataframe by dates first and use first
and last
to get -2- and -4-:
df2 = (df.sort_values('Date').groupby('Item', as_index=False)
.agg(earliest_date =('Date', 'min'),
count_at_earliest_date=('Count', 'first'),
latest_date =('Date', 'max'),
count_at_latest_date=('Count', 'last'),
min_count = ('Count', 'min'),
max_count = ('Count', 'max')))
Output:
>>> df2
Item earliest_date count_at_earliest_date latest_date count_at_latest_date min_count max_count
0 Item_1 01/06/2023 25 04/06/2023 7 5 25
1 Item_2 03/06/2023 41 06/06/2023 38 2 49
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论