Extract all entries from a pandas df where the values are the same across all years
Question
I have a dataframe that looks like this (with many more other countries, this is a sample):
import pandas as pd

df_dict = {'country': ['Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece'],
           'year': [2016, 2017, 2018, 2019, 2020, 2021, 2022, 2016, 2017, 2018, 2019, 2020, 2021, 2022],
           'value': [320, 416, 172, 652, 390, 570, 803, 100, 100, 100, 100, 100, 100, 100]}
df = pd.DataFrame(df_dict)
I want to extract all rows for the countries whose value is the same across all years. The repeated value could be 100 or some other number; 100 is just the one in this example.
I'm not really sure how to go about this.
The output should look like this.
df_dict2 = {'country': ['Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece', 'Greece'],
            'year': [2016, 2017, 2018, 2019, 2020, 2021, 2022],
            'value': [100, 100, 100, 100, 100, 100, 100]}
df2 = pd.DataFrame(df_dict2)
Answer 1
Score: 2
If you want the countries whose value is the same across all years, use groupby.nunique and keep the countries that have exactly one distinct value:
s = df.groupby('country')['value'].nunique()
out = list(s[s.eq(1)].index)
Output: ['Greece']
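The following is a self-contained sketch of the nunique step. It rebuilds the sample data from the question so the intermediate Series s can be inspected. Japan has 7 distinct values and Greece has only 1, so a mask of s equal to 1 keeps only Greece.

```python
import pandas as pd

# Sample data from the question (7 years per country)
df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})

# Number of distinct values per country (groupby sorts keys: Greece, Japan)
s = df.groupby('country')['value'].nunique()
print(s)

# A country whose value never changes has exactly one distinct value
out = list(s[s.eq(1)].index)
print(out)  # ['Greece']
```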
If you also want the constant value itself, use groupby.agg together with boolean indexing through loc:
(df.groupby('country')['value'].agg(['nunique', 'first'])
.loc[lambda d: d.pop('nunique').eq(1), 'first']
)
Output:
country
Greece 100
Name: first, dtype: int64
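The agg call above produces one row per country, with a 'nunique' column and a 'first' column. The lambda then pops 'nunique' off that temporary frame to build the mask. If you prefer not to mutate the intermediate frame, the sketch below is an equivalent selection that keeps the aggregated frame in a variable. The variable names agg and constant_values are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})

# One row per country, columns 'nunique' and 'first'
agg = df.groupby('country')['value'].agg(['nunique', 'first'])

# Keep the first value of every country that has a single distinct value
constant_values = agg.loc[agg['nunique'].eq(1), 'first']
print(constant_values)
```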
Edit: to filter the original DataFrame instead:
s = df.groupby('country')['value'].nunique()
df[df['country'].isin(s[s.eq(1)].index)]
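The filtered rows keep their original index labels (7 to 13), while df2 in the question is labelled 0 to 6. The sketch below resets the index so the result can be checked against the expected output. It assumes the data is exactly the sample from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})
# Expected output from the question
df2 = pd.DataFrame({
    'country': ['Greece'] * 7,
    'year': list(range(2016, 2023)),
    'value': [100] * 7,
})

s = df.groupby('country')['value'].nunique()
result = df[df['country'].isin(s[s.eq(1)].index)]

# Drop the original labels (7..13) so the index matches df2 (0..6)
result = result.reset_index(drop=True)
pd.testing.assert_frame_equal(result, df2)  # raises if they differ
```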
Or directly:
df[df.groupby('country')['value'].transform('nunique').eq(1)]
Output:
country year value
7 Greece 2016 100
8 Greece 2017 100
9 Greece 2018 100
10 Greece 2019 100
11 Greece 2020 100
12 Greece 2021 100
13 Greece 2022 100
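transform('nunique') broadcasts each country's count back to every one of its rows, which is why the mask can index df directly. An alternative that is not part of the answer is groupby.filter, which keeps or drops whole groups based on a predicate. One difference is worth knowing. The nunique used above ignores NaN by default, so a country with values [100, NaN] would still count as constant. Passing dropna=False to Series.nunique in the predicate counts NaN as a distinct value. Whether NaN should break the "same value" rule is an assumption you should decide for your data.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Japan'] * 7 + ['Greece'] * 7,
    'year': list(range(2016, 2023)) * 2,
    'value': [320, 416, 172, 652, 390, 570, 803] + [100] * 7,
})

# Approach from the answer: per-row mask from the broadcast group count
via_transform = df[df.groupby('country')['value'].transform('nunique').eq(1)]

# Alternative: keep whole groups whose values are all identical;
# dropna=False makes a NaN count as its own distinct value
via_filter = df.groupby('country').filter(
    lambda g: g['value'].nunique(dropna=False) == 1
)

print(via_filter.equals(via_transform))  # True
```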