Select values in a pandas DataFrame based on multiple conditions
Question
I have a DataFrame with three columns: productId, date, and price. I want the latest price for each product; if a product has several prices on its latest date, I want their mean. How can I do this?
import pandas as pd
data = {"productId": [1, 2, 1, 2, 3],
        "date": ["2021-01-02", "2021-01-02", "2021-01-02", "2021-01-03", "2021-01-03"],
        "price": [12.0, 11.3, 11.0, 9.8, 8.7]}
df = pd.DataFrame(data)
I tried the following code, but it raises an exception that says "Can only compare identically-labeled Series objects":
latest_price = df.groupby("productId")["date"].max()
df = df.loc[df["date"] == latest_price].copy()
df["price"] = df.groupby("productId")["price"].transform("mean")
Answer 1
Score: 1
Group the rows by productId and date and take the mean price:
df = df.groupby(['productId', 'date']).mean().reset_index()
This gives one mean price per product and date. Sorting by date in descending order puts the latest prices first, so dropping duplicate productId values keeps only the latest row for each product and discards every older price.
df = df.sort_values('date', ascending=False).drop_duplicates('productId')
df
   productId        date  price
2          2  2021-01-03    9.8
3          3  2021-01-03    8.7
0          1  2021-01-02   11.5
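For reference, the two steps above can be run end to end on the question's sample data. This is a minimal sketch: the names means and result, the explicit selection of the price column, and the final re-sort by productId are choices made here for illustration, not part of the answer. It also assumes the dates are ISO-formatted strings (YYYY-MM-DD), which sort chronologically as plain text.

```python
import pandas as pd

data = {
    "productId": [1, 2, 1, 2, 3],
    "date": ["2021-01-02", "2021-01-02", "2021-01-02", "2021-01-03", "2021-01-03"],
    "price": [12.0, 11.3, 11.0, 9.8, 8.7],
}
df = pd.DataFrame(data)

# Step 1: one mean price per (productId, date) pair.
means = df.groupby(["productId", "date"], as_index=False)["price"].mean()

# Step 2: newest dates first, then keep the first row seen for each product.
result = (
    means.sort_values("date", ascending=False)
    .drop_duplicates("productId")
    .sort_values("productId")      # illustrative: restore product order
    .reset_index(drop=True)
)
print(result)
```

If the dates are stored in another format, converting them with pd.to_datetime before sorting is likely the safer route.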
Answer 2
Score: 1
You can apply a function that computes the mean price on each product's latest date, then map the result back onto each row:
# if needed
# df["date"] = pd.to_datetime(df["date"])
latest_price = (
    df.groupby("productId").apply(
        lambda g: g.loc[g["date"].eq(g["date"].max()), "price"].mean())
)
df["price"] = df["productId"].map(latest_price)
Output:
print(df)
   productId        date  price
0          1  2021-01-02   11.5
1          2  2021-01-02    9.8
2          1  2021-01-02   11.5
3          2  2021-01-03    9.8
4          3  2021-01-03    8.7
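The comparison in the question fails because groupby("productId")["date"].max() returns a Series indexed by productId, while df["date"] is indexed by row, so the two cannot be compared label by label. One possible variant, sketched here and not taken from either answer, uses transform("max") so that each product's latest date is aligned with the original rows. The names latest_date, is_latest, and result are illustrative, and the sketch assumes one row per product is wanted.

```python
import pandas as pd

data = {
    "productId": [1, 2, 1, 2, 3],
    "date": ["2021-01-02", "2021-01-02", "2021-01-02", "2021-01-03", "2021-01-03"],
    "price": [12.0, 11.3, 11.0, 9.8, 8.7],
}
df = pd.DataFrame(data)
df["date"] = pd.to_datetime(df["date"])

# transform("max") broadcasts each product's latest date back to every row,
# so the result shares df's index and can be compared element-wise.
latest_date = df.groupby("productId")["date"].transform("max")
is_latest = df["date"] == latest_date

# Keep only rows on the latest date, then average the prices per product.
result = (
    df[is_latest]
    .groupby(["productId", "date"], as_index=False)["price"]
    .mean()
)
print(result)
```

Compared with apply, this keeps the work in built-in groupby operations, which may matter on larger frames.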