Using df.loc[] vs df[] shorthand with boolean masks, pandas
Question
Both df[booleanMask] and df.loc[booleanMask] work for me, but I don't understand why. I thought the df[] shorthand (without .loc) applied to columns, whereas I am trying to filter rows, so I assumed I needed .loc.
Here is the specific code:
# Boolean operators
# All the games where a team scored at least 4 goals and won to nil
hw_4_0 = (pl23['FTHG'] >= 4) & (pl23['FTAG'] == 0)
aw_0_4 = (pl23['FTHG'] == 0) & (pl23['FTAG'] >= 4)
pl23.loc[aw_0_4 | hw_4_0]
For example, pl23.loc[aw_0_4 | hw_4_0, :] also works, but pl23.loc[:, aw_0_4 | hw_4_0] doesn't. I thought that df[boolean mask] was shorthand for the latter (as with indexing), so why does it work in this instance?
I used pl23.loc[aw_0_4 | hw_4_0], which returned the DataFrame the query was designed for, whereas I was expecting IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Answer 1
Score: 1
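When you index with labels, df[…] selects from the columns, whereas df.loc[…] selects from the index.
If you pass a boolean Series (or another boolean iterable) for boolean indexing, both act on the index, i.e. they filter rows. To perform boolean indexing on the columns, you need df.loc[:, …]. This is also why pl23.loc[:, aw_0_4 | hw_4_0] fails: that mask is labelled by the row index, so it cannot be aligned with the column labels.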
Example:
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
# select "col1" in the columns
df['col1']
# select "0" in the index
df.loc[0]
# boolean indexing on the index
df[df['col1'].ge(2)]
# or
df.loc[df['col1'].ge(2)]
# or
df[[False, True, True]]
# or
df.loc[[False, True, True]]
# boolean indexing on the columns
df.loc[:, df.loc[0].ge(2)]
# or
df.loc[:, [False, True]]
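To tie this back to the question, here is a minimal sketch of the same behaviour on football-style data. The pl23 frame below is a made-up stand-in (the team names and scores are invented for illustration, not the asker's data), and importing IndexingError from pandas.errors assumes pandas 1.5 or later.

```python
import pandas as pd
from pandas.errors import IndexingError  # assumes pandas >= 1.5

# Hypothetical stand-in for the asker's pl23 DataFrame (invented values).
pl23 = pd.DataFrame({
    "HomeTeam": ["A", "B", "C", "D"],
    "AwayTeam": ["E", "F", "G", "H"],
    "FTHG": [4, 0, 2, 5],
    "FTAG": [0, 4, 1, 1],
})

# Row masks: each is a boolean Series labelled by the row index of pl23.
hw_4_0 = (pl23["FTHG"] >= 4) & (pl23["FTAG"] == 0)
aw_0_4 = (pl23["FTHG"] == 0) & (pl23["FTAG"] >= 4)
mask = aw_0_4 | hw_4_0

# With a boolean mask, df[...] and df.loc[...] both filter rows.
rows_via_getitem = pl23[mask]
rows_via_loc = pl23.loc[mask]
same_rows = rows_via_getitem.equals(rows_via_loc)

# Using the row mask in the column position fails, because its labels
# (0, 1, 2, 3) cannot be aligned with the column labels.
try:
    pl23.loc[:, mask]
    column_error = None
except IndexingError as exc:
    column_error = exc

# A mask built from the column labels does work in the column position.
column_mask = pl23.columns.str.startswith("FT")
ft_columns = pl23.loc[:, column_mask]
```

The point of the sketch is that what matters is the kind of key, not the accessor: a boolean key in the first (or only) position always filters rows, and only a boolean key in the second position of .loc filters columns.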