Using df.loc[] vs the df[] shorthand with boolean masks in pandas

Question

Both df[booleanMask] and df.loc[booleanMask] are working for me, but I don't understand why. I thought the df[] shorthand without .loc applied to columns, whereas I am trying to filter rows, so I thought I needed to use .loc.

Here is the specific code:

# Boolean operators
# All the games where a team scored at least 4 goals and won to nil
hw_4_0 = (pl23['FTHG'] >= 4) & (pl23['FTAG'] == 0)
aw_0_4 = (pl23['FTHG'] == 0) & (pl23['FTAG'] >= 4)
pl23.loc[aw_0_4 | hw_4_0]

For example, pl23.loc[aw_0_4 | hw_4_0, :] also works, but pl23.loc[:, aw_0_4 | hw_4_0] doesn't. I thought that df[boolean mask] was shorthand for the latter (as with indexing), so why does it work in this instance?

I used pl23.loc[aw_0_4 | hw_4_0], which returned the DataFrame the query was designed for, whereas I was expecting IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Answer 1

Score: 1

When you use labels, df[…] selects from the columns, whereas df.loc[…] selects from the index.

If you pass a boolean Series (or another boolean iterable) for boolean indexing, then both act on the index, that is, on the rows. To perform boolean indexing on the columns, you need df.loc[:, …].

Example:

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# select "col1" in the columns
df['col1']

# select "0" in the index
df.loc[0]


# boolean indexing on the index
df[df['col1'].ge(2)]
# or
df.loc[df['col1'].ge(2)]
# or
df[[False, True, True]]
# or
df.loc[[False, True, True]]


# boolean indexing on the columns
df.loc[:, df.loc[0].ge(2)]
# or
df.loc[:, [False, True]]
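
Applied to the situation in the question, the same rule can be sketched as follows. The pl23 frame below is a made-up stand-in (only the column names FTHG and FTAG come from the question; the HomeTeam column and all values are invented for illustration). The mask is a boolean Series labelled by the row index, so it should line up with the rows in df[…], df.loc[…] and df.loc[…, :], but not with the column labels in df.loc[:, …].

```python
import pandas as pd

# Hypothetical match results: FTHG/FTAG are the column names from the question,
# HomeTeam and all values are invented for illustration.
pl23 = pd.DataFrame({
    "HomeTeam": ["A", "B", "C", "D"],
    "FTHG": [4, 0, 2, 5],
    "FTAG": [0, 4, 2, 1],
})

hw_4_0 = (pl23["FTHG"] >= 4) & (pl23["FTAG"] == 0)
aw_0_4 = (pl23["FTHG"] == 0) & (pl23["FTAG"] >= 4)
mask = aw_0_4 | hw_4_0  # boolean Series labelled by the row index 0..3

# All three forms filter rows, because the mask aligns with the index.
rows_short = pl23[mask]
rows_loc = pl23.loc[mask]
rows_loc_explicit = pl23.loc[mask, :]

# On the column axis the mask's labels (0..3) do not match the column
# labels, so pandas is expected to refuse it (an IndexingError in recent
# versions). A broad except is used here only to keep the sketch
# independent of where that exception class lives.
try:
    pl23.loc[:, mask]
    column_error = None
except Exception as exc:
    column_error = exc

print(rows_short)
print(type(column_error).__name__)
```

In this sketch the row filters keep the first two matches, and the column-axis call is the one that fails, which matches the behaviour described in the question.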

huangapple
  • Posted on June 8, 2023 at 20:35:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76431919.html