Extract all entries from a pandas df where the values are the same across all years
Question
I have a dataframe that looks like this (with many more other countries, this is a sample):
import pandas as pd

df_dict = {'country': ['Japan','Japan','Japan','Japan','Japan','Japan','Japan', 'Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
'year': [2016, 2017,2018,2019,2020,2021,2022,2016, 2017,2018,2019,2020,2021,2022],
'value': [320, 416, 172, 652, 390, 570, 803, 100, 100, 100, 100, 100, 100,100]}
df = pd.DataFrame(df_dict)
I want to extract all the entries where the value is the same across all years. Sometimes it could be 100, sometimes it could be another value, but the example here uses 100.
I'm not really sure how to go about this.
The output should look like this:
df_dict2 = {'country': ['Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
'year': [2016, 2017,2018,2019,2020,2021,2022],
'value': [100, 100, 100, 100, 100, 100,100]}
df2 = pd.DataFrame(df_dict2)
Answer 1
Score: 2
If you want to know which countries have the same value across all years, use groupby.nunique and keep the groups with exactly one distinct value:
s = df.groupby('country')['value'].nunique()
out = list(s[s.eq(1)].index)
Output: ['Greece']
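For reference, here is a self-contained sketch of the same idea on the question's sample data. The variable name constant_countries is only illustrative, and the `s[s.eq(1)]` filter is an assumption about the intended selection (countries with exactly one distinct value), which matches the output shown above.

```python
import pandas as pd

# Sample data from the question, written compactly
df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})

# Number of distinct values per country
s = df.groupby('country')['value'].nunique()

# Keep only the countries whose value never changes
constant_countries = list(s[s.eq(1)].index)
print(constant_countries)
```

Note that nunique ignores missing values by default (dropna=True), so a country whose values are 100 and NaN would also count as constant unless dropna=False is passed.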
If you also want the value, go for a groupby.agg with boolean indexing through loc:
(df.groupby('country')['value'].agg(['nunique', 'first'])
.loc[lambda d: d.pop('nunique').eq(1), 'first']
)
Output:
country
Greece 100
Name: first, dtype: int64
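As a rough illustration of how this behaves with more than one constant country, the sketch below uses a shortened dataset with a made-up third country ('Spain', not in the original question) and a two-step version of the same agg call without the pop trick.

```python
import pandas as pd

# Shortened sample; 'Spain' and its values are hypothetical additions
df = pd.DataFrame({
    'country': ['Japan'] * 3 + ['Greece'] * 3 + ['Spain'] * 3,
    'year': [2016, 2017, 2018] * 3,
    'value': [320, 416, 172, 100, 100, 100, 250, 250, 250],
})

# Per country: number of distinct values and the first value seen
stats = df.groupby('country')['value'].agg(['nunique', 'first'])

# For countries with a single distinct value, 'first' is that value
constant_values = stats.loc[stats['nunique'].eq(1), 'first']
print(constant_values.to_dict())
```

Splitting the aggregation and the selection into two statements is only a readability choice; it selects the same rows as the chained version above.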
Edit: to filter the original DataFrame:
s = df.groupby('country')['value'].nunique()
df[df['country'].isin(s[s.eq(1)].index)]
Or directly:
df[df.groupby('country')['value'].transform('nunique').eq(1)]
Output:
country year value
7 Greece 2016 100
8 Greece 2017 100
9 Greece 2018 100
10 Greece 2019 100
11 Greece 2020 100
12 Greece 2021 100
13 Greece 2022 100
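To check the result against the expected output from the question, a minimal end-to-end sketch could look like this. The groupby.filter variant is an alternative not mentioned in the answer; it is usually slower on large frames because it calls a Python function per group, but some find it easier to read.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})

# Row-wise mask: True where the row's country has a single distinct value
mask = df.groupby('country')['value'].transform('nunique').eq(1)
result = df[mask]

# Alternative (assumption: readability preferred over speed)
alt = df.groupby('country').filter(lambda g: g['value'].nunique() == 1)

# Expected output from the question (df2)
expected = pd.DataFrame({
    'country': ['Greece'] * 7,
    'year': list(range(2016, 2023)),
    'value': [100] * 7,
})

# The filtered rows keep their original index (7 to 13), so reset it before comparing
matches = result.reset_index(drop=True).equals(expected)
print(matches)
```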