February 23, 2023, 21:23:05

Extract all entries from a pandas df where the values are the same across all years

Question

I have a dataframe that looks like this (with many more other countries, this is a sample):

import pandas as pd

df_dict = {'country': ['Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece'],
           'year': [2016, 2017, 2018, 2019, 2020, 2021, 2022, 2016, 2017, 2018, 2019, 2020, 2021, 2022],
           'value': [320, 416, 172, 652, 390, 570, 803, 100, 100, 100, 100, 100, 100, 100]}
df = pd.DataFrame(df_dict)

I want to extract all the entries where the value is the same across all years. Sometimes it could be 100, sometimes it could be another value, but the example here is with 100.

I'm not really sure how to go about this.

The output should look like this.

df_dict2 = {'country': ['Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece'],
            'year': [2016, 2017, 2018, 2019, 2020, 2021, 2022],
            'value': [100, 100, 100, 100, 100, 100, 100]}
df2 = pd.DataFrame(df_dict2)

Answer 1

Score: 2

If you want to know which countries have the same value across all years, use groupby.nunique:

s = df.groupby('country')['value'].nunique()
# keep only the countries with exactly one distinct value
out = list(s[s.eq(1)].index)

Output: ['Greece']
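
To see why this works, it can help to look at the intermediate Series. The following is a minimal, self-contained sketch using the sample data from the question; the name `constant_countries` is only illustrative and does not come from the original answer.

```python
import pandas as pd

# Sample data from the question: Japan varies by year, Greece is constant
df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})

# Number of distinct values per country (Greece has 1, Japan has 7)
s = df.groupby('country')['value'].nunique()
print(s)

# A country is constant across years when it has exactly one distinct value
constant_countries = list(s[s.eq(1)].index)
print(constant_countries)  # ['Greece']
```

Note that `nunique` ignores missing values by default (`dropna=True`), so a country whose value is 100 in some years and NaN in others would still be counted as constant. Passing `dropna=False` treats NaN as its own distinct value, if that is the behaviour you need.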

If you also want the value, go for a groupby.agg with boolean indexing through loc:

(df.groupby('country')['value'].agg(['nunique', 'first'])
   .loc[lambda d: d.pop('nunique').eq(1), 'first']
)

Output:

country
Greece    100
Name: first, dtype: int64
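
The chained `pop` inside the lambda is compact but can be hard to read. As a rough step-by-step equivalent (a sketch only; the names `stats`, `is_constant` and `constant_values` are made up for illustration), the same result can be built like this:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})

# One row per country: the distinct-value count and the first observed value
stats = df.groupby('country')['value'].agg(['nunique', 'first'])

# Boolean mask marking countries whose value never changes
is_constant = stats['nunique'].eq(1)

# Keep only the constant countries and report their (single) value
constant_values = stats.loc[is_constant, 'first']
print(constant_values)
```

Because a constant group has only one distinct value, taking `first` is enough to recover it.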

Edit: to filter the original DataFrame:

s = df.groupby('country')['value'].nunique()
df[df['country'].isin(s[s.eq(1)].index)]

Or directly:

df[df.groupby('country')['value'].transform('nunique').eq(1)]

Output:

   country  year  value
7   Greece  2016    100
8   Greece  2017    100
9   Greece  2018    100
10  Greece  2019    100
11  Greece  2020    100
12  Greece  2021    100
13  Greece  2022    100
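
Another option, not part of the original answer, is `groupby.filter`, which keeps or drops whole groups based on a predicate. The sketch below assumes the same sample data; `constant_rows` is a hypothetical name. It tends to be slower than the `transform` approach on data with many groups, since the lambda runs once per group, but some readers find it easier to follow.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})

# Keep every row of each country whose 'value' has a single distinct value
constant_rows = df.groupby('country').filter(lambda g: g['value'].nunique() == 1)
print(constant_rows)
```

The rows keep their original index labels (7 through 13 here), as in the output shown above.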
