June 8, 2023 20:35:06

Using df.loc[] vs df[] shorthand with boolean masks, pandas

Question

Both df[booleanMask] and df.loc[booleanMask] work for me, but I don't understand why. I thought the df[] shorthand (without .loc) applied to columns, whereas I am trying to filter rows, so I assumed I needed .loc.

Here is the specific code:

# Boolean operators
# All the games where a team scored at least 4 goals and won to nil
hw_4_0 = (pl23['FTHG'] >= 4) & (pl23['FTAG'] == 0)
aw_0_4 = (pl23['FTHG'] == 0) & (pl23['FTAG'] >= 4)
pl23.loc[aw_0_4 | hw_4_0]

For example, pl23.loc[aw_0_4 | hw_4_0, :] also works, but pl23.loc[:, aw_0_4 | hw_4_0] doesn't. I thought that df[boolean mask] was shorthand for the latter (as with label indexing), so why does it work in this instance?

I used pl23.loc[aw_0_4 | hw_4_0], which returned the DataFrame the query was designed for, whereas I was expecting IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Answer 1

Score: 1

When you pass labels, df[…] selects from the columns, whereas df.loc[…] selects from the index.

If you pass a boolean Series (or other iterable) for boolean indexing, then both act on the index, i.e. they filter rows. To perform boolean indexing on the columns, you need df.loc[:, …].

Example:

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
# select "col1" in the columns
df['col1']
# select "0" in the index
df.loc[0]
# boolean indexing on the index
df[df['col1'].ge(2)]
# or
df.loc[df['col1'].ge(2)]
# or
df[[False, True, True]]
# or
df.loc[[False, True, True]]
# boolean indexing on the columns
df.loc[:, df.loc[0].ge(2)]
# or
df.loc[:, [False, True]]
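
To tie this back to the question, here is a minimal sketch using a made-up stand-in for pl23 (the four rows of scores are invented for illustration; only the FTHG and FTAG column names come from the question). It suggests why the row mask works with both df[] and df.loc[], and why the same mask fails in the column position: the mask is indexed by row labels, so pandas cannot align it with the column labels.

```python
import pandas as pd

# Hypothetical stand-in for pl23; the values are invented for this sketch.
pl23 = pd.DataFrame({
    "FTHG": [4, 0, 1, 5],
    "FTAG": [0, 4, 1, 2],
})

hw_4_0 = (pl23["FTHG"] >= 4) & (pl23["FTAG"] == 0)
aw_0_4 = (pl23["FTHG"] == 0) & (pl23["FTAG"] >= 4)
row_mask = aw_0_4 | hw_4_0  # boolean Series indexed by the row labels 0..3

# All three forms filter rows, because the mask aligns with the index.
by_getitem = pl23[row_mask]
by_loc = pl23.loc[row_mask]
by_loc_rows = pl23.loc[row_mask, :]

# In the column position the same mask cannot be aligned with the
# column labels ("FTHG", "FTAG"), so pandas raises an error.
try:
    pl23.loc[:, row_mask]
    error_name = None
except Exception as exc:  # caught broadly; the class location varies by pandas version
    error_name = type(exc).__name__

# A mask indexed by the column labels does work in the column position.
col_mask = pl23.loc[0].ge(4)  # Series indexed by "FTHG" and "FTAG"
only_fthg = pl23.loc[:, col_mask]
```

So the error the question expected belongs to the pl23.loc[:, mask] form, not to pl23[mask] or pl23.loc[mask].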
