Pandas - Get all rows that have the same selected column values as a specific row
Question
I have the code below, where features is a list of column names and filtered is a pandas DataFrame. Given a specific row (index 0 in this case), I'm trying to find all the rows whose features values are the same as those of the row at index 0. The idea was to check each row and return it if its features values match the row at index 0, but the code throws this error: pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
print(filtered[features].drop_duplicates().iloc[0][features])
print(filtered[filtered[features].eq(filtered[features].drop_duplicates().iloc[0][features]).all()])
I also tried:
for idx, row in filtered.iterrows():
    sub_filt = filtered[filtered[sub_features].eq(row[sub_features]).all(axis=1)]
    print(len(sub_filt))
It prints all 0s, but it should print at least 1 for every row, since each row matches itself.
Answer 1
Score: 3
You are close: to create the mask, use DataFrame.all with axis=1:
import pandas as pd

d = {'c1': ['a','b','c','a','b','c'],
     'c2': [1,2,3,1,2,1],
     'c3': range(2,8)}
filtered = pd.DataFrame(data=d)
features = ['c1','c2']
print(filtered[features].eq(filtered[features].iloc[0]))
      c1     c2
0   True   True
1  False  False
2  False  False
3   True   True
4  False  False
5  False   True
print(filtered[features].eq(filtered[features].iloc[0]).all(axis=1))
0     True
1    False
2    False
3     True
4    False
5    False
dtype: bool
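For comparison, the first attempt in the question called .all() with no axis argument (its drop_duplicates().iloc[0] is still the first row, since drop_duplicates keeps first occurrences by default). The minimal sketch below reuses the sample frame above to show the difference: all() defaults to axis=0, so it reduces each column and returns a Series indexed by the column names. Used as a row mask, that index cannot be aligned with the row labels, which is what the "Unalignable boolean Series" error complains about.

```python
import pandas as pd

# Same sample data as in the answer above
d = {"c1": ["a", "b", "c", "a", "b", "c"],
     "c2": [1, 2, 3, 1, 2, 1],
     "c3": range(2, 8)}
filtered = pd.DataFrame(data=d)
features = ["c1", "c2"]

# Element-wise comparison of every row with row 0 (a 6 x 2 boolean frame)
matches = filtered[features].eq(filtered[features].iloc[0])

# all() defaults to axis=0: it collapses each column, so the result is
# indexed by the column names, not by the row labels
col_mask = matches.all()
print(col_mask.index.tolist())  # ['c1', 'c2']

# all(axis=1) collapses each row, giving one boolean per row label
row_mask = matches.all(axis=1)
print(row_mask.index.equals(filtered.index))  # True
```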
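print(filtered[filtered[features].eq(filtered[features].iloc[0]).all(axis=1)])
  c1  c2  c3
0  a   1   2
3  a   1   5
This builds a boolean mask that keeps only the rows whose features values are identical to those of the first row.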
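If the iterrows loop in the question prints only 0s, one possible cause (an assumption, since the data isn't shown) is missing values: NaN never compares equal to NaN, so eq returns False even when a row is compared with itself. The sketch below uses hypothetical data to reproduce that symptom and shows one NaN-tolerant variant of the mask, which treats "both values missing" as a match.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the reference row (row 0) has a missing value in c2
filtered = pd.DataFrame({"c1": ["a", "a", "b"],
                         "c2": [np.nan, np.nan, 1.0]})
features = ["c1", "c2"]
ref = filtered[features].iloc[0]

# Plain equality: NaN == NaN is False, so no row matches, not even row 0
plain = filtered[features].eq(ref).all(axis=1)
print(plain.sum())  # 0

# Count a cell as matching if both it and the reference value are NaN
# (DataFrame & Series aligns the Series index with the columns)
both_nan = filtered[features].isna() & ref.isna()
nan_safe = (filtered[features].eq(ref) | both_nan).all(axis=1)
print(filtered[nan_safe])
```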
Answer 2
Score: 0
This should answer your question:
https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html
To select rows based on a condition:
rows_based_on_condition = df[df["column"] == value]
To filter on multiple columns, you can repeatedly overwrite the input df (or a copy of it) in a for-loop, narrowing it down one column at a time:
columns = ["column_1", "column_2"]
# Record the values of the reference row (position 0) for each column
val_dict = {col: df.iloc[0][col] for col in columns}
# Keep only the rows that match on each column in turn
for key, value in val_dict.items():
    df = df[df[key] == value]
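A self-contained version of the loop, run on the sample frame from the first answer (the data and the names result and mask are assumptions for illustration), suggests that narrowing one column at a time selects the same rows as the single boolean mask built with all(axis=1):

```python
import pandas as pd

# Sample data borrowed from the first answer (any frame would do)
df = pd.DataFrame({"c1": ["a", "b", "c", "a", "b", "c"],
                   "c2": [1, 2, 3, 1, 2, 1],
                   "c3": range(2, 8)})
columns = ["c1", "c2"]

# Reference values taken from the row at position 0
val_dict = {col: df.iloc[0][col] for col in columns}

# Narrow a separate variable so the original df stays intact
result = df
for key, value in val_dict.items():
    result = result[result[key] == value]

# Single vectorized mask, as in the first answer, for comparison
mask = df[columns].eq(df[columns].iloc[0]).all(axis=1)
print(result.equals(df[mask]))  # True
print(result)
```

The loop builds an intermediate frame for every column, while the mask compares all selected columns in one step; for a handful of columns either is fine.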
Also, try to write your code so that each line contains only one step; it makes it easier to read.
This old project of mine contains a good example; search the notebook (Ctrl+F) for this comment:
# reshaping wdi_df from wide to long form
https://github.com/MaximilianHauser/CC_DS21_Life_Expectancy_and_GDP/blob/main/life_expectancy_gdp.ipynb
It might also be interesting if you're taking an online data science course.
Happy Coding!