Pandas - Get all rows that have the same selected column values as a specific row

Question

I have the code below, where features is a list of column names and filtered is a pandas DataFrame. Given a specific row (index 0 in this case), I'm trying to find all rows that have the same features values as the row at index 0. The code below throws an error, but the idea was to check each row and, if it had the same features values as the row at index 0, return it. The error is: pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

    print(filtered[features].drop_duplicates().iloc[0][features])
    print(filtered[filtered[features].eq(filtered[features].drop_duplicates().iloc[0][features]).all()])
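A small sketch of why this particular call fails. The toy data below is illustrative, not the asker's DataFrame. DataFrame.all() reduces along axis=0 by default, so it returns one boolean per column, indexed by the column names. Using that Series as a row filter asks pandas to align column labels against row labels, and that mismatch is what raises the "Unalignable boolean Series" error. all(axis=1) reduces across each row instead and returns a mask indexed like the DataFrame.

```python
import pandas as pd

# Illustrative toy data (an assumption, not the asker's DataFrame).
filtered = pd.DataFrame({
    "c1": ["a", "b", "c", "a", "b", "c"],
    "c2": [1, 2, 3, 1, 2, 1],
    "c3": range(2, 8),
})
features = ["c1", "c2"]

# Element-wise comparison of every row against row 0 (aligned on column labels).
comparison = filtered[features].eq(filtered[features].iloc[0])

# Default axis=0: reduce down each column, giving one bool per *column*.
per_column = comparison.all()
# axis=1: reduce across each row, giving one bool per *row*.
per_row = comparison.all(axis=1)

print(per_column.index.tolist())             # ['c1', 'c2'] (cannot align with the row index)
print(per_row.index.equals(filtered.index))  # True (usable as a row mask)
```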

I also tried:

    for idx, row in filtered.iterrows():
        sub_filt = filtered[filtered[sub_features].eq(row[sub_features]).all(axis=1)]
        print(len(sub_filt))

and it prints all 0s, but it should print at least 1 for every row, since each row should at least match itself.
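The question does not show the data, so this is only a guess. One common way for a row to fail to match even itself is a missing value: NaN == NaN evaluates to False, so any row with a NaN in one of the compared columns gets a count of 0. The sketch below uses hypothetical data and column names (f1, f2) to reproduce that effect. It also shows a NaN-aware variant that treats "both missing" as equal.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has a missing value in f1.
df = pd.DataFrame({"f1": [1.0, np.nan, 1.0], "f2": ["x", "y", "x"]})
feats = ["f1", "f2"]

row = df.loc[1, feats]  # reference row containing NaN

# Plain equality: NaN never equals NaN, so row 1 does not even match itself.
naive = df[feats].eq(row).all(axis=1)
print(naive.sum())  # 0

# NaN-aware equality: a cell counts as matching if it is equal OR both sides are missing.
nan_aware = (df[feats].eq(row) | (df[feats].isna() & row.isna())).all(axis=1)
print(nan_aware.sum())  # 1
```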

Answer 1

Score: 3

You are close. To create the mask, use DataFrame.all with axis=1:

    import pandas as pd

    d = {'c1': ['a','b','c','a','b','c'],
         'c2': [1,2,3,1,2,1],
         'c3': range(2,8)}
    filtered = pd.DataFrame(data=d)
    features = ['c1','c2']

    print(filtered[features].eq(filtered[features].iloc[0]))
    #       c1     c2
    # 0   True   True
    # 1  False  False
    # 2  False  False
    # 3   True   True
    # 4  False  False
    # 5  False   True

    print(filtered[features].eq(filtered[features].iloc[0]).all(axis=1))
    # 0     True
    # 1    False
    # 2    False
    # 3     True
    # 4    False
    # 5    False
    # dtype: bool

    print(filtered[filtered[features].eq(filtered[features].iloc[0]).all(axis=1)])
    #   c1  c2  c3
    # 0   a   1   2
    # 3   a   1   5

This creates a mask that filters the DataFrame down to the rows that match the first row.
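A possible refinement, offered as a sketch: iloc[0] selects the first row by position, while the question refers to the row at index 0. These are the same only when the index is the default 0..n-1 range. If it is not, label-based .loc is the safer lookup. The helper name rows_like and the example data are hypothetical. The sketch assumes unique index labels. With duplicate labels, df.loc[label, features] returns a DataFrame instead of a Series, and the comparison would need adjusting.

```python
import pandas as pd


def rows_like(df: pd.DataFrame, label, features: list) -> pd.DataFrame:
    """Return all rows whose `features` values equal those of the row labelled `label`.

    Hypothetical helper: same eq(...).all(axis=1) mask as above, but the
    reference row is looked up by index label (.loc), not by position (.iloc).
    Assumes the index labels are unique.
    """
    reference = df.loc[label, features]            # Series indexed by feature names
    mask = df[features].eq(reference).all(axis=1)  # one bool per row
    return df[mask]


# Non-default index: label 10 sits at position 0.
df = pd.DataFrame(
    {"c1": ["a", "b", "c", "a"], "c2": [1, 2, 3, 1], "c3": [2, 3, 4, 5]},
    index=[10, 11, 12, 13],
)
print(rows_like(df, 10, ["c1", "c2"]))
```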

Answer 2

Score: 0

This should answer your question.
https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html

Select rows based on a condition:

    rows_based_on_condition = df[df["column"] == value]
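A minimal sketch of the same boolean-mask idea on concrete data. The DataFrame is hypothetical and uses the answer's placeholder column names. It also shows that several conditions can be combined in one expression with &. Each comparison needs its own parentheses, because in Python & binds more tightly than ==.

```python
import pandas as pd

# Hypothetical data using the placeholder column names from the answer.
df = pd.DataFrame({"column_1": ["a", "b", "a"], "column_2": [1, 1, 2]})

# One condition: keep rows where column_1 equals "a".
single = df[df["column_1"] == "a"]

# Several conditions: combine the boolean Series with &, wrapping each
# comparison in parentheses because & binds more tightly than ==.
both = df[(df["column_1"] == "a") & (df["column_2"] == 1)]

print(single)
print(both)
```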

To apply it to multiple columns, you can repeatedly filter the input df (or a copy of it) in a for loop:

    columns = ["column_1", "column_2"]
    val_dict = dict()
    for col in columns:
        val_dict.update({col: df.iloc[0][col]})
    for key in val_dict.keys():
        value = val_dict[key]
        df = df[df[key] == value]
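An alternative sketch that builds one mask instead of overwriting df column by column. It uses functools.reduce with operator.and_ to AND together one comparison per column. This is a swapped-in technique, not the answer's own code. The data is hypothetical. The block also runs the loop above on a copy to check that both approaches select the same rows.

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical example data; column names follow the loop above.
df = pd.DataFrame({
    "column_1": ["a", "b", "a", "a"],
    "column_2": [1, 1, 1, 2],
    "other": [5, 6, 7, 8],
})
columns = ["column_1", "column_2"]
reference = df.iloc[0]  # values of the first row, looked up by position

# One boolean Series per column, AND-ed together into a single row mask.
mask = reduce(operator.and_, (df[col] == reference[col] for col in columns))
result = df[mask]

# The loop from the answer, run on a copy so df itself is left untouched.
loop_df = df.copy()
for col in columns:
    loop_df = loop_df[loop_df[col] == reference[col]]

print(result)
print(result.equals(loop_df))  # True
```

Building a single mask leaves the original df intact. It also evaluates every column against the full frame, which keeps the code to one filtering step.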

Also, try to write your code so that each line contains only one step; that makes it easier to read.
This old project of mine contains a good example. Search the notebook (Ctrl-F) for the comment "# reshaping wdi_df from wide to long form":
https://github.com/MaximilianHauser/CC_DS21_Life_Expectancy_and_GDP/blob/main/life_expectancy_gdp.ipynb
It might also be interesting if you're taking an online data science course.
Happy Coding!

huangapple
  • Published on August 9, 2023, 15:48:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76865639-2.html