Dynamically query pandas df for values of column with multiple conditions on other column being True

Question

I have a dataframe looking like this:

    Animal  Zoo
    Lion    Berlin
    Lion    Munich
    Lion    Paris
    Monkey  Berlin
    Monkey  Munich
    Monkey  Rotterdam
    Bat     Berlin
    Goose   Rotterdam
    Tiger   Paris
    Tiger   Munich

I am looking for a way to dynamically build a query which returns the unique values of animals which appear in a specific set of zoos, e.g. all animals which are located in the zoos in Berlin AND Munich.

The result should look like this:

    result = ['Lion', 'Monkey']

So far I have tried this:

    import pandas as pd

    # initialize list of lists
    data = [
        ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
        ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
        ['Bat', 'Berlin'],
        ['Goose', 'Rotterdam'],
        ['Tiger', 'Paris'], ['Tiger', 'Munich']
    ]
    # create the pandas DataFrame
    df = pd.DataFrame(data, columns=['Animal', 'Zoo'])
    # filter df
    df_filtered = df.query(" (`Zoo` == 'Berlin' | `Zoo` == 'Munich')")
    # get animals as list
    result = df_filtered['Animal'].unique().tolist()
    # print list of results
    print(result)

which gives me the animals which appear in Berlin OR Munich.

    ['Lion', 'Monkey', 'Bat', 'Tiger']

Turning the OR into an AND statement leads to an empty dataframe:

    df.query(" (`Zoo` == 'Berlin' & `Zoo` == 'Munich')")
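
The AND version comes back empty because the expression is evaluated row by row, and each row holds exactly one value in `Zoo`. A minimal sketch of that behaviour, using plain boolean masks instead of `query` (the names `is_berlin`, `is_munich`, `and_rows` and `or_rows` are only illustrative):

```python
import pandas as pd

data = [
    ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
    ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
    ['Bat', 'Berlin'],
    ['Goose', 'Rotterdam'],
    ['Tiger', 'Paris'], ['Tiger', 'Munich'],
]
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])

# One boolean per row: does this row's Zoo equal the given city?
is_berlin = df['Zoo'] == 'Berlin'
is_munich = df['Zoo'] == 'Munich'

# A single row can never satisfy both comparisons at once,
# so the AND mask selects nothing while the OR mask selects six rows.
and_rows = int((is_berlin & is_munich).sum())
or_rows = int((is_berlin | is_munich).sum())
print(and_rows, or_rows)  # 0 6
```

So the "Berlin AND Munich" condition has to be checked per animal (across that animal's rows), not per row.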

Answer 1

Score: 0

Use groupby:

    result = df.groupby('Animal').filter(lambda x: set(['Berlin', 'Munich']).issubset(set(x['Zoo']))).Animal.unique().tolist()

Output:

    ['Lion', 'Monkey']
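
Since the question asks for something dynamic, the same subset idea can be wrapped in a small helper that takes any collection of zoos. This is only a sketch: the function name `animals_in_all_zoos` and its parameters are hypothetical, not part of the answer above.

```python
import pandas as pd

def animals_in_all_zoos(df, zoos):
    """Return the animals that appear in every zoo listed in `zoos`."""
    required = set(zoos)
    # Collect the set of zoos for each animal; sort=False keeps the
    # animals in order of first appearance in the frame.
    zoos_per_animal = df.groupby('Animal', sort=False)['Zoo'].apply(set)
    # Keep an animal only if the required zoos are a subset of its zoos.
    return [animal for animal, seen in zoos_per_animal.items()
            if required <= seen]

data = [
    ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
    ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
    ['Bat', 'Berlin'],
    ['Goose', 'Rotterdam'],
    ['Tiger', 'Paris'], ['Tiger', 'Munich'],
]
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])

print(animals_in_all_zoos(df, ['Berlin', 'Munich']))  # ['Lion', 'Monkey']
print(animals_in_all_zoos(df, ['Paris', 'Munich']))   # ['Lion', 'Tiger']
```

Note that with an empty `zoos` argument every animal qualifies, because the empty set is a subset of any set.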

Answer 2

Score: 0

To get the animals that are present in both zoos, first filter the dataset down to those zoos, then keep only the animal groups that contain both of the required zoos:

    zoos = {'Berlin', 'Munich'}
    df[df.Zoo.isin(zoos)].groupby('Animal').filter(lambda x: x['Zoo'].nunique() == 2)

Output:

       Animal     Zoo
    0    Lion  Berlin
    1    Lion  Munich
    3  Monkey  Berlin
    4  Monkey  Munich
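
The hard-coded `2` ties this snippet to exactly two zoos. A possible generalisation, sketched under the assumption that the zoos arrive as a set, compares against `len(zoos)` instead and then reduces the filtered rows to the list of animals the question asks for (the names `matches` and `result` are illustrative):

```python
import pandas as pd

data = [
    ['Lion', 'Berlin'], ['Lion', 'Munich'], ['Lion', 'Paris'],
    ['Monkey', 'Berlin'], ['Monkey', 'Munich'], ['Monkey', 'Rotterdam'],
    ['Bat', 'Berlin'],
    ['Goose', 'Rotterdam'],
    ['Tiger', 'Paris'], ['Tiger', 'Munich'],
]
df = pd.DataFrame(data, columns=['Animal', 'Zoo'])

zoos = {'Berlin', 'Munich'}  # any set of zoo names

# Step 1: keep only rows for the zoos of interest.
# Step 2: keep only animals whose remaining rows cover every zoo in the set.
matches = (
    df[df['Zoo'].isin(zoos)]
    .groupby('Animal')
    .filter(lambda g: g['Zoo'].nunique() == len(zoos))
)

# Reduce the matching rows to a list of unique animal names.
result = matches['Animal'].unique().tolist()
print(result)  # ['Lion', 'Monkey']
```

Using `nunique()` rather than a plain row count keeps the check correct even if the same animal and zoo pair appears more than once in the data.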

huangapple
  • Posted on March 31, 2023, 21:40:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75899219.html