June 15, 2023 00:38:03

Select values in a Python DataFrame based on multiple conditions

Question

I have a DataFrame with three columns: productId, date and price. I want to get the latest price for each product; if there are multiple prices on the latest date, I want their mean. How can I do this?

import pandas as pd
data = {"productId":[1,2,1,2,3],
        "date":["2021-01-02","2021-01-02","2021-01-02","2021-01-03","2021-01-03"],
        "price":[12.0,11.3,11.0,9.8,8.7]}
df = pd.DataFrame(data)

I tried the following code, but it raises an exception: "Can only compare identically-labeled Series objects".

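latest_price = df.groupby("productId")["date"].max()  # latest date per product, indexed by productId
df = df.loc[df["date"] == latest_price].copy()  # raises: df["date"] is indexed by row number, not productId

df["price"] = df.groupby("productId")["price"].transform("mean")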
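The exception comes from df["date"] == latest_price: groupby(...).max() returns a Series indexed by productId (1, 2, 3), while df["date"] is indexed by row position (0 to 4), and pandas will not compare Series whose labels differ. A minimal sketch of one fix, not taken from either answer: transform("max") returns each product's latest date aligned to every row of df, so the comparison works and the rest of the original idea can stay. Parsing the dates with pd.to_datetime is an added assumption; the ISO-format strings would also compare correctly.

```python
import pandas as pd

data = {"productId": [1, 2, 1, 2, 3],
        "date": ["2021-01-02", "2021-01-02", "2021-01-02", "2021-01-03", "2021-01-03"],
        "price": [12.0, 11.3, 11.0, 9.8, 8.7]}
df = pd.DataFrame(data)
df["date"] = pd.to_datetime(df["date"])  # assumption: parse dates explicitly

# transform("max") yields a Series aligned to df's own row index,
# so it can be compared element-wise with df["date"].
latest_date = df.groupby("productId")["date"].transform("max")
latest = df.loc[df["date"] == latest_date].copy()

# Average the prices that share a product's latest date,
# then keep a single row per product.
latest["price"] = latest.groupby("productId")["price"].transform("mean")
result = latest.drop_duplicates("productId").reset_index(drop=True)
print(result)
```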

Answer 1

Score: 1

Group the rows by productId and date and take the mean price of each group:

df = df.groupby(['productId', 'date']).mean().reset_index()

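Now that you have all the means, sorting by date in descending order puts each product's latest price first in the DataFrame. If you then drop duplicate productIds (drop_duplicates keeps the first occurrence by default), every price that is not the latest is removed. Sorting the date strings works here because the YYYY-MM-DD format sorts chronologically.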

df = df.sort_values('date', ascending=False).drop_duplicates('productId')
df

   productId        date  price
2          2  2021-01-03    9.8
3          3  2021-01-03    8.7
0          1  2021-01-02   11.5
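The sort-then-drop step can also be expressed with idxmax, which returns, for each product, the row label of its latest date. This is a variant, not the answer's own code; it assumes the dates are parsed with pd.to_datetime first, and the names daily and latest_rows are illustrative.

```python
import pandas as pd

data = {"productId": [1, 2, 1, 2, 3],
        "date": ["2021-01-02", "2021-01-02", "2021-01-02", "2021-01-03", "2021-01-03"],
        "price": [12.0, 11.3, 11.0, 9.8, 8.7]}
df = pd.DataFrame(data)
df["date"] = pd.to_datetime(df["date"])  # assumption: parse dates explicitly

# Step 1, as in the answer: one mean price per (productId, date) pair.
daily = df.groupby(["productId", "date"], as_index=False)["price"].mean()

# Step 2, variant: idxmax gives the row label of each product's latest date,
# so selecting those labels replaces the sort + drop_duplicates step.
latest_rows = daily.groupby("productId")["date"].idxmax()
result = daily.loc[latest_rows].reset_index(drop=True)
print(result)
```

Both versions return one row per product; idxmax simply avoids sorting the whole frame.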

Answer 2

Score: 1

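You can apply a function that computes each product's latest price (the mean, if there are several on the latest date), then map it onto each product: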

# if needed
# df["date"] = pd.to_datetime(df["date"])

latest_price = (
    df.groupby("productId").apply(
        lambda g: g.loc[g["date"].eq(g["date"].max()), "price"].mean())
)

df["price"] = df["productId"].map(latest_price)

Output:

print(df)

   productId       date  price
0          1 2021-01-02   11.5
1          2 2021-01-02    9.8
2          1 2021-01-02   11.5
3          2 2021-01-03    9.8
4          3 2021-01-03    8.7
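If you would rather avoid groupby.apply, which calls a Python function once per group, the same per-product value can be computed by merging each product's latest date back onto the rows. A minimal sketch under stated assumptions (the question's sample data, dates parsed with pd.to_datetime); the names latest_dates and on_latest are illustrative, and this merge-based approach is not the answer's own.

```python
import pandas as pd

data = {"productId": [1, 2, 1, 2, 3],
        "date": ["2021-01-02", "2021-01-02", "2021-01-02", "2021-01-03", "2021-01-03"],
        "price": [12.0, 11.3, 11.0, 9.8, 8.7]}
df = pd.DataFrame(data)
df["date"] = pd.to_datetime(df["date"])  # assumption: parse dates explicitly

# Latest date per product, as an ordinary two-column frame.
latest_dates = df.groupby("productId", as_index=False)["date"].max()

# The inner merge keeps only rows whose (productId, date) is a product's latest date.
on_latest = df.merge(latest_dates, on=["productId", "date"])

# Mean price per product on its latest date, broadcast to every row as in the answer.
latest_price = on_latest.groupby("productId")["price"].mean()
df["price"] = df["productId"].map(latest_price)
print(df)
```

As in the answer, every row keeps its date while its price is replaced by the product's latest (mean) price.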
