Dynamically query pandas df for values of column with multiple conditions on other column being True

Question

I have a dataframe looking like this:

Animal  Zoo
Lion    Berlin
Lion    Munich
Lion    Paris
Monkey  Berlin
Monkey  Munich
Monkey  Rotterdam
Bat     Berlin
Goose   Rotterdam
Tiger   Paris
Tiger   Munich

I am looking for a way to dynamically build a query that returns the unique animals appearing in every zoo of a given set, e.g. all animals that are located in both the Berlin AND Munich zoos.

The result should look like this:

result = ['Lion', 'Monkey']

So far I have tried this:

import pandas as pd

# initialize list of lists
data = [
        ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
        ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
        ['Bat', 'Berlin'],
        ['Goose', 'Rotterdam'],
        ['Tiger', 'Paris'], ['Tiger', 'Munich']
]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])

# filter df
df_filtered = df.query(" (`Zoo` == 'Berlin' | `Zoo` == 'Munich')")

# get animals as list
result = df_filtered['Animal'].unique().tolist()

# print list of results
print(result)

This gives me the animals that appear in Berlin OR Munich:

['Lion', 'Monkey', 'Bat', 'Tiger']

Turning the OR into an AND statement leads to an empty dataframe:

df.query(" (`Zoo` == 'Berlin' & `Zoo` == 'Munich')")
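The empty result follows from how the condition is evaluated: it is checked one row at a time, and a single row holds exactly one Zoo value. A minimal sketch of that behaviour, using the same data (the variable names in_berlin, in_munich and both are illustrative, not part of the original code):

```python
import pandas as pd

data = [
    ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
    ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
    ['Bat', 'Berlin'],
    ['Goose', 'Rotterdam'],
    ['Tiger', 'Paris'], ['Tiger', 'Munich'],
]
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])

# Each comparison produces one boolean per row.
in_berlin = df['Zoo'] == 'Berlin'
in_munich = df['Zoo'] == 'Munich'

# A row cannot be 'Berlin' and 'Munich' at once, so the combined mask is all False.
both = in_berlin & in_munich
print(both.any())
print(df[both].empty)
```

So the "AND" has to be applied across all rows of one animal rather than within a single row, which is a per-group question.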

Answer 1

Score: 0

Use groupby:

result = df.groupby('Animal').filter(lambda x: set(['Berlin', 'Munich']).issubset(set(x['Zoo']))).Animal.unique().tolist()

Output:

['Lion', 'Monkey']
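Since the question asks for something that can be built dynamically, the same subset check could be wrapped in a small function that takes the zoos as an argument. This is only a sketch: the helper name animals_in_all_zoos and its parameters are made up for illustration, and it collects a set of zoos per animal instead of calling filter.

```python
import pandas as pd

data = [
    ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
    ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
    ['Bat', 'Berlin'],
    ['Goose', 'Rotterdam'],
    ['Tiger', 'Paris'], ['Tiger', 'Munich'],
]
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])


def animals_in_all_zoos(frame, zoos):
    """Hypothetical helper: animals that appear in every zoo in `zoos`."""
    required = set(zoos)
    # Build one set of zoos per animal.
    zoos_per_animal = frame.groupby('Animal')['Zoo'].agg(set)
    # Keep the animals whose set of zoos covers all required zoos.
    mask = zoos_per_animal.apply(required.issubset)
    return zoos_per_animal[mask].index.tolist()


result = animals_in_all_zoos(df, ['Berlin', 'Munich'])
print(result)
```

Note that groupby sorts the group keys by default, so the names come back in alphabetical order rather than in order of first appearance.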

Answer 2

Score: 0

To get the animals that are present in both zoos, first filter the dataset down to those zoos, then keep only the animal groups that contain every required zoo:

zoos = {'Berlin', 'Munich'}
df[df.Zoo.isin(zoos)].groupby('Animal').filter(lambda x: x['Zoo'].nunique() == len(zoos))

    Animal     Zoo
0    Lion   Berlin
1    Lion   Munich
3  Monkey   Berlin
4  Monkey   Munich
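The snippet above returns the matching rows rather than the list of names the question asks for. One possible way to get the list directly, sketched here under the assumption that counting distinct zoos per animal is enough (the variable counts is illustrative):

```python
import pandas as pd

data = [
    ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
    ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
    ['Bat', 'Berlin'],
    ['Goose', 'Rotterdam'],
    ['Tiger', 'Paris'], ['Tiger', 'Munich'],
]
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])

zoos = {'Berlin', 'Munich'}

# Keep only rows for the requested zoos, then count distinct zoos per animal.
counts = df[df['Zoo'].isin(zoos)].groupby('Animal')['Zoo'].nunique()

# An animal qualifies when it was seen in every requested zoo.
result = counts[counts == len(zoos)].index.tolist()
print(result)
```

Comparing against len(zoos) instead of a literal number means the set of zoos can grow or shrink without touching the rest of the expression.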
