Extract all entries from a pandas df where the values are the same across all years

Question

I have a DataFrame that looks like this (this is a sample; the real one has many more countries):

import pandas as pd

df_dict = {'country': ['Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan',
                       'Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece'],
           'year': [2016, 2017, 2018, 2019, 2020, 2021, 2022, 2016, 2017, 2018, 2019, 2020, 2021, 2022],
           'value': [320, 416, 172, 652, 390, 570, 803, 100, 100, 100, 100, 100, 100, 100]}

df = pd.DataFrame(df_dict)

I want to extract all rows for the countries whose value is the same across all years. The constant value could be 100 or any other number; 100 is just the one in this example.

I'm not really sure how to go about this.

The output should look like this.

df_dict2 = {'country': ['Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece'],
            'year': [2016, 2017, 2018, 2019, 2020, 2021, 2022],
            'value': [100, 100, 100, 100, 100, 100, 100]}

df2 = pd.DataFrame(df_dict2)

Answer 1

Score: 2

If you want to know which countries have the same value across all years, use groupby.nunique:

s = df.groupby('country')['value'].nunique()

out = list(s[s.eq(1)].index)

Output: ['Greece']
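A self-contained sketch of this first step, rebuilding the sample data from the question (the list arithmetic is only shorthand for the same 14 rows). Looking at the intermediate `s` shows the mechanism: `nunique` counts the distinct values per country, and `s.eq(1)` keeps only the countries with a single distinct value.

```python
import pandas as pd

# Same sample data as in the question, written compactly
df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})

# s maps each country to its number of distinct values (Greece: 1, Japan: 7)
s = df.groupby('country')['value'].nunique()
print(s)

# Boolean mask on the counts: keep countries with exactly one distinct value
out = list(s[s.eq(1)].index)
print(out)  # ['Greece']
```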

If you also want the value itself, use groupby.agg combined with boolean indexing through loc:

(df.groupby('country')['value'].agg(['nunique', 'first'])
   .loc[lambda d: d.pop('nunique').eq(1), 'first']
)

Output:

country
Greece    100
Name: first, dtype: int64
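In the one-liner, `d.pop('nunique')` is a compact way to grab the count column inside the callable; as a side effect it removes that column from the intermediate aggregated frame. If that feels too clever, a two-step sketch that keeps the aggregated frame intact (the names `agg` and `result` are just illustrative) should give the same Series:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})

# One row per country: number of distinct values and the first value seen
agg = df.groupby('country')['value'].agg(['nunique', 'first'])

# Select the 'first' column only for countries with a single distinct value
result = agg.loc[agg['nunique'].eq(1), 'first']
print(result)
```

Here `agg` still has both columns afterwards, which can be handy for inspecting the countries that did not pass.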

Edit: to filter the original DataFrame:

s = df.groupby('country')['value'].nunique()
df[df['country'].isin(s[s.eq(1)].index)]

Or directly:

df[df.groupby('country')['value'].transform('nunique').eq(1)]

Output:

   country  year  value
7   Greece  2016    100
8   Greece  2017    100
9   Greece  2018    100
10  Greece  2019    100
11  Greece  2020    100
12  Greece  2021    100
13  Greece  2022    100
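Two further points, offered as a sketch rather than as part of the original answer. First, `DataFrameGroupBy.filter` keeps or drops whole groups based on a predicate, which expresses "keep the countries whose value never changes" directly, at the cost of one Python-level call per group. Second, `nunique` ignores NaN by default, so a country whose values are `100, NaN, 100` passes the `transform('nunique')` filter above; `nunique(dropna=False)` counts NaN as its own value. The `Italy` rows below are hypothetical data added only to show the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7 + ['Italy'] * 3,
    'year': list(range(2016, 2023)) * 2 + [2016, 2017, 2018],
    # Hypothetical Italy rows contain a missing value
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7 + [100, np.nan, 100],
})

# Default nunique drops NaN, so Italy (100, NaN, 100) looks constant
loose = df[df.groupby('country')['value'].transform('nunique').eq(1)]

# groupby.filter keeps every row of a group whose predicate returns True;
# dropna=False counts NaN as a distinct value, so Italy is excluded
strict = df.groupby('country').filter(lambda g: g['value'].nunique(dropna=False) == 1)

print(sorted(loose['country'].unique()))   # ['Greece', 'Italy']
print(sorted(strict['country'].unique()))  # ['Greece']
```

Which behaviour is right depends on whether a missing year should count as "different"; for data without NaN, both versions agree.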

huangapple
  • Posted on February 23, 2023, 21:23:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75545443.html