Dynamically query pandas df for values of column with multiple conditions on other column being True
Question
I have a dataframe looking like this:
| Animal | Zoo |
|---|---|
| Lion | Berlin |
| Lion | Munich |
| Lion | Paris |
| Monkey | Berlin |
| Monkey | Munich |
| Monkey | Rotterdam |
| Bat | Berlin |
| Goose | Rotterdam |
| Tiger | Paris |
| Tiger | Munich |
I am looking for a way to dynamically build a query which returns the unique values of animals which appear in a specific set of zoos, e.g. all animals which are located in the zoos in Berlin AND Munich.
The result should look like this:
result = ['Lion', 'Monkey']
So far I have tried this:
import pandas as pd

# initialize list of lists
data = [
['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
['Bat', 'Berlin'],
['Goose', 'Rotterdam'],
['Tiger', 'Paris'], ['Tiger', 'Munich']
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])
# filter df
df_filtered = df.query(" (`Zoo` == 'Berlin' | `Zoo` == 'Munich')")
# get animals as list
result = df_filtered['Animal'].unique().tolist()
# print list of results
print(result)
This gives me the animals that appear in Berlin OR Munich:
['Lion', 'Monkey', 'Bat', 'Tiger']
Turning the OR into an AND statement leads to an empty dataframe:
df.query(" (`Zoo` == 'Berlin' & `Zoo` == 'Munich')")
Answer 1
Score: 0
Use groupby:
result = df.groupby('Animal').filter(lambda x: set(['Berlin', 'Munich']).issubset(set(x['Zoo']))).Animal.unique().tolist()
Output:
['Lion', 'Monkey']
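Since the question asks for a query that can be built dynamically, the same subset check can be wrapped so that the zoos are passed in at run time. This is a minimal sketch, assuming the `df` from the question; the helper name `animals_in_all_zoos` is hypothetical and not part of pandas:

```python
import pandas as pd

data = [
    ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
    ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
    ['Bat', 'Berlin'],
    ['Goose', 'Rotterdam'],
    ['Tiger', 'Paris'], ['Tiger', 'Munich'],
]
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])


def animals_in_all_zoos(frame, required_zoos):
    """Hypothetical helper: animals that appear in every zoo in required_zoos."""
    required = set(required_zoos)
    # Build one set of zoos per animal (groupby sorts the animals by name).
    zoos_per_animal = frame.groupby('Animal')['Zoo'].apply(set)
    # Keep the animals whose zoo set contains all required zoos.
    mask = zoos_per_animal.apply(required.issubset)
    return zoos_per_animal[mask].index.tolist()


result = animals_in_all_zoos(df, ['Berlin', 'Munich'])
print(result)  # ['Lion', 'Monkey']
```

Because the zoos are an ordinary argument, the same call works for any number of zoos, for example `animals_in_all_zoos(df, ['Paris', 'Munich'])`.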
Answer 2
Score: 0
To get the animals that are present in both zoos, first filter the dataset down to those zoos, then keep only the animal groups that contain both required zoos (two distinct values):
zoos = {'Berlin', 'Munich'}
df[df.Zoo.isin(zoos)].groupby('Animal').filter(lambda x: x['Zoo'].nunique() == 2)
Animal Zoo
0 Lion Berlin
1 Lion Munich
3 Monkey Berlin
4 Monkey Munich
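The snippet above hard-codes the count `2` and returns the matching rows rather than the list the question asks for. A possible variant, assuming the same `df` and `zoos` as in the question, compares against `len(zoos)` so the set can change size, and reads the animal names from the group index:

```python
import pandas as pd

data = [
    ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
    ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
    ['Bat', 'Berlin'],
    ['Goose', 'Rotterdam'],
    ['Tiger', 'Paris'], ['Tiger', 'Munich'],
]
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])

zoos = {'Berlin', 'Munich'}

# Keep only rows for the requested zoos, then count distinct zoos per animal.
counts = df[df['Zoo'].isin(zoos)].groupby('Animal')['Zoo'].nunique()

# An animal qualifies when it was seen in every requested zoo.
result = counts[counts == len(zoos)].index.tolist()
print(result)  # ['Lion', 'Monkey']
```

Counting distinct values with `nunique()` rather than counting rows means duplicate (Animal, Zoo) rows would not inflate the count.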