August 9, 2023, 15:48:45

Pandas - Get all rows that have same selected column values as specific row

Question

I have the code below, where features is a list of column names and filtered is a Pandas DataFrame. Given a specific row (index 0 in this case), I'm trying to find all the rows that have the same features values as that row. The idea is to check each row and return those whose features values match the row at index 0, but the code below throws pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

print(filtered[features].drop_duplicates().iloc[0][features])
print(filtered[filtered[features].eq(filtered[features].drop_duplicates().iloc[0][features]).all()])

I also tried:

for idx, row in filtered.iterrows():
    sub_filt = filtered[filtered[sub_features].eq(row[sub_features]).all(axis=1)]
    print(len(sub_filt))

It prints all 0s, but it should be printing at least 1.
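One possible explanation for the all-zero counts, offered only as an assumption because the question's data is not shown: if every row has a missing value in at least one of the compared columns, no row can match even itself, since NaN never compares equal to NaN. The sketch below uses made-up data to show the effect and one NaN-tolerant way to build the mask.

```python
import numpy as np
import pandas as pd

# Made-up data (not from the question): every row has NaN in one feature.
filtered = pd.DataFrame({
    'c1': ['a', 'b', 'a'],
    'c2': [np.nan, np.nan, np.nan],
})
sub_features = ['c1', 'c2']

sub = filtered[sub_features]
row = sub.iloc[0]

# Plain equality: NaN != NaN, so the c2 comparison is False everywhere.
plain = filtered[sub.eq(row).all(axis=1)]
print(len(plain))

# NaN-tolerant variant: also count a cell as matching when both
# the cell and the reference value are missing.
both_missing = sub.isna() & row.isna()
nan_safe = filtered[(sub.eq(row) | both_missing).all(axis=1)]
print(nan_safe.index.tolist())
```

If the real data has no missing values, the cause lies elsewhere, for example in the contents of sub_features.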

Answer 1

Score: 3

You are close. To create the mask, use DataFrame.all with axis=1:

import pandas as pd

d = {'c1': ['a', 'b', 'c', 'a', 'b', 'c'],
     'c2': [1, 2, 3, 1, 2, 1],
     'c3': range(2, 8)}
filtered = pd.DataFrame(data=d)

features = ['c1', 'c2']
# Compare every row's feature values with those of the first row
print(filtered[features].eq(filtered[features].iloc[0]))
      c1     c2
0   True   True
1  False  False
2  False  False
3   True   True
4  False  False
5  False   True
# Reduce across columns: True only where all features match
print(filtered[features].eq(filtered[features].iloc[0]).all(axis=1))
0     True
1    False
2    False
3     True
4    False
5    False
dtype: bool
# Use the row-wise mask to filter the DataFrame
print(filtered[filtered[features].eq(filtered[features].iloc[0]).all(axis=1)])
  c1  c2  c3
0  a   1   2
3  a   1   5

This builds a boolean mask that selects the rows whose feature values equal those of the first row.
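The IndexingError in the question appears to come from calling all() without axis=1. A minimal sketch, reusing the sample data from this answer (the names matches, col_mask and row_mask are introduced here for illustration), shows how the two reductions produce differently indexed results:

```python
import pandas as pd

filtered = pd.DataFrame({
    'c1': ['a', 'b', 'c', 'a', 'b', 'c'],
    'c2': [1, 2, 3, 1, 2, 1],
    'c3': range(2, 8),
})
features = ['c1', 'c2']

matches = filtered[features].eq(filtered[features].iloc[0])

# Default axis=0 reduces down each column: one value per feature,
# indexed by column name, so it cannot align with the row index.
col_mask = matches.all()
print(col_mask.index.tolist())   # ['c1', 'c2']

# axis=1 reduces across each row: one value per row,
# indexed like the DataFrame, so it works as a row filter.
row_mask = matches.all(axis=1)
print(row_mask.index.equals(filtered.index))  # True
print(filtered[row_mask])
```

Passing col_mask to filtered[...] is what cannot be aligned, because its index holds column names rather than row labels.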

Answer 2

Score: 0

This should answer your question:
https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html

To filter rows based on a condition:

rows_based_on_condition = df[df["column"] == value]

To apply this across multiple columns, you can filter df (or a copy of it) repeatedly in a for loop:

columns = ["column_1", "column_2"]
# Collect the first row's value for each column before any filtering
val_dict = {col: df.iloc[0][col] for col in columns}
# Narrow df one column at a time (note that this overwrites df)
for key, value in val_dict.items():
    df = df[df[key] == value]

Also, try to write your code so that each line contains only one step; it makes the code easier to read.
This old project of mine contains a good example at:
# reshaping wdi_df from wide to long form (copy this into the Ctrl-F search bar)
https://github.com/MaximilianHauser/CC_DS21_Life_Expectancy_and_GDP/blob/main/life_expectancy_gdp.ipynb
It might also be interesting if you're doing an online data science course.
Happy coding!
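The loop above overwrites df on each pass. As a rough sketch of the same column-by-column idea that leaves the original DataFrame untouched, the comparisons can be accumulated into a single mask. The helper name rows_matching and its position parameter are hypothetical, and the sample data is borrowed from Answer 1.

```python
import pandas as pd

def rows_matching(df, columns, position=0):
    """Return the rows of df whose values in `columns` equal those of
    the row at integer position `position` (hypothetical helper)."""
    reference = df.iloc[position]
    # Start with every row selected, then narrow one column at a time.
    mask = pd.Series(True, index=df.index)
    for col in columns:
        mask &= df[col] == reference[col]
    return df[mask]

df = pd.DataFrame({
    'c1': ['a', 'b', 'c', 'a', 'b', 'c'],
    'c2': [1, 2, 3, 1, 2, 1],
    'c3': range(2, 8),
})
result = rows_matching(df, ['c1', 'c2'])
print(result)
```

Because the mask is built separately, df keeps all six rows and the helper can be called again with a different position.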
