Select values in a Python DataFrame based on multiple conditions

Question

I have a DataFrame with three columns: productId, date, and price. I want to get the latest price for each product; if a product has multiple prices on its latest date, I want their mean. How can I do this?

import pandas as pd
data = {"productId":[1,2,1,2,3],
        "date":["2021-01-02","2021-01-02","2021-01-02","2021-01-03","2021-01-03"],
        "price":[12.0,11.3,11.0,9.8,8.7]}
df = pd.DataFrame(data)

I tried the following code, but it raises an exception with the message "Can only compare identically-labeled Series objects":

latest_price = df.groupby("productId")["date"].max()
df = df.loc[df["date"] == latest_price].copy()

df["price"] = df.groupby("productId")["price"].transform("mean")

Answer 1

Score: 1

Group the rows by product ID and date, then take the mean price of each group:

df = df.groupby(['productId', 'date']).mean().reset_index()

Now that you have the mean for every product and date, sort by date in descending order so that the latest prices come first. Dropping duplicate productId values then keeps only the first (latest) row for each product and discards every older price.

df = df.sort_values('date', ascending=False).drop_duplicates('productId')
df

   productId        date  price
2          2  2021-01-03    9.8
3          3  2021-01-03    8.7
0          1  2021-01-02   11.5
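
The two steps above can be combined into one self-contained sketch. The helper name `latest_mean_price` is hypothetical and not part of the original answer. Parsing the dates with `pd.to_datetime` is an added assumption, so that sorting is chronological rather than string-based.

```python
import pandas as pd

data = {
    "productId": [1, 2, 1, 2, 3],
    "date": ["2021-01-02", "2021-01-02", "2021-01-02", "2021-01-03", "2021-01-03"],
    "price": [12.0, 11.3, 11.0, 9.8, 8.7],
}
df = pd.DataFrame(data)
# Assumption: convert the date strings to datetimes before sorting
df["date"] = pd.to_datetime(df["date"])


def latest_mean_price(frame):
    """Hypothetical helper: one row per product with the mean price on its latest date."""
    # Mean price for every (productId, date) pair
    means = frame.groupby(["productId", "date"], as_index=False)["price"].mean()
    # Newest dates first, so the first row seen for each product is its latest one
    newest_first = means.sort_values("date", ascending=False)
    return (
        newest_first.drop_duplicates("productId")
        .sort_values("productId")
        .reset_index(drop=True)
    )


result = latest_mean_price(df)
print(result)
```

Selecting only the `price` column before calling `mean()` keeps the aggregation from touching any other columns the real data might contain.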

Answer 2

Score: 1

You can apply a function that computes the mean price on each product's latest date, then map the result back to each product:

# if needed
# df["date"] = pd.to_datetime(df["date"])

latest_price = (
    df.groupby("productId").apply(
        lambda g: g.loc[g["date"].eq(g["date"].max()), "price"].mean())
)

df["price"] = df["productId"].map(latest_price)

Output:

print(df)

   productId       date  price
0          1 2021-01-02   11.5
1          2 2021-01-02    9.8
2          1 2021-01-02   11.5
3          2 2021-01-03    9.8
4          3 2021-01-03    8.7
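
As a side note on the error in the question: `groupby("productId")["date"].max()` returns one value per product, indexed by productId, so it cannot be compared element-wise with `df["date"]`, which has one value per row. The sketch below is one possible fix and is not taken from either answer. It uses `transform("max")` to broadcast each product's latest date back onto the original rows, and it assumes one row per product is the desired result.

```python
import pandas as pd

df = pd.DataFrame({
    "productId": [1, 2, 1, 2, 3],
    "date": ["2021-01-02", "2021-01-02", "2021-01-02", "2021-01-03", "2021-01-03"],
    "price": [12.0, 11.3, 11.0, 9.8, 8.7],
})
# Assumption: work with real datetimes rather than date strings
df["date"] = pd.to_datetime(df["date"])

# transform("max") returns a Series with the same index as df,
# so the comparison below is between identically labeled Series
latest_date = df.groupby("productId")["date"].transform("max")

# Keep only the rows recorded on each product's latest date
latest_rows = df.loc[df["date"] == latest_date]

# Average the remaining prices, giving one row per product
latest_price = latest_rows.groupby("productId", as_index=False)["price"].mean()
print(latest_price)
```

Unlike the `map` approach above, this leaves the original `df` untouched and returns a separate frame with one row per product.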

huangapple
  • Posted on June 15, 2023, 00:38:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76475781.html