May 30, 2023 04:06:53


How do I compare two columns in a Pandas DataFrame and output values from other columns based on the match?

Question


I have a pandas DataFrame with the following columns:

A: 3, 4, 1, 2, 1
B: 1, 2, 3, 4
C: Red, Blue, Yellow, Green
D: Yes, No, Maybe, True

What I want to do is check each value in column A, in order, against column B and, where there is a match, output the C and D values from the matching row of B.

For example, the data frame above would be converted into:
A: Yellow, Green, Red, Blue, Red
B: Maybe, True, Yes, No, Yes

I am rather new to Python and pandas, so I might be missing something simple, but I cannot think of a solution to this problem and am unsure where to start looking for an answer. Any help would be appreciated.

-Smoggs

I have tried many different ideas. I know I should be using iloc to target the value in a cell, but I am unsure how to output the result next to it. I'm really at a loss here.

Answer 1

Score: 0


First, if you need to look up some columns by the values of another (here, C and D by B), you can use the set_index method on B.

import pandas as pd

df = pd.DataFrame({
    "A": [3, 4, 1, 2],
    "B": [1, 2, 3, 4],
    "C": ["Red", "Blue", "Yellow", "Green"],
    "D": ["Yes", "No", "Maybe", True]
})

# Make B the row index so rows can be looked up by their B value
df.set_index("B", inplace=True)

Note the inplace parameter, which applies the change directly to df instead of returning a new DataFrame.
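If you would rather keep the original frame untouched, a minimal sketch of the non-inplace form looks like this. The variable name indexed is only illustrative, and D is written with all-string values here as an assumption about the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [3, 4, 1, 2],
    "B": [1, 2, 3, 4],
    "C": ["Red", "Blue", "Yellow", "Green"],
    "D": ["Yes", "No", "Maybe", "True"],  # assumed to be strings
})

# Without inplace=True, set_index returns a new DataFrame and leaves df as it was
indexed = df.set_index("B")

print(list(df.columns))       # B is still a regular column of df
print(list(indexed.columns))  # B has moved into the index of the new frame
```

Both forms give the same indexed frame; the difference is only whether df itself is changed.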

Then, you can get C and D by using the values of A as index labels:

print(df.loc[df["A"], ["C", "D"]])
# output:
#         C      D
# B               
# 3  Yellow  Maybe
# 4   Green   True
# 1     Red    Yes
# 2    Blue     No
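The example above uses four values for A, while the question lists five (3, 4, 1, 2, 1), and DataFrame columns must all have the same length. One possible way to handle that, sketched here under the assumption that A can live in its own Series and that D holds strings, is to keep B, C and D as a separate lookup table. The names a, lookup and result are hypothetical.

```python
import pandas as pd

# The values to look up, as listed in the question (one longer than B)
a = pd.Series([3, 4, 1, 2, 1], name="A")

# Lookup table indexed by B
lookup = pd.DataFrame({
    "B": [1, 2, 3, 4],
    "C": ["Red", "Blue", "Yellow", "Green"],
    "D": ["Yes", "No", "Maybe", "True"],  # assumed to be strings
}).set_index("B")

# loc accepts a list of labels, repeats included, and returns one row per label;
# a value of A that does not appear in B would raise a KeyError
result = lookup.loc[a.tolist(), ["C", "D"]].reset_index(drop=True)

print(result)
```

The C and D columns of result then follow the order of A, which is the output the question asks for.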

Note that we use loc and not iloc here. The difference is that loc selects by index label (string labels work too), whereas iloc selects by integer position in the underlying data (see the pandas indexing documentation for more details).
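A small sketch of that difference, using the same B, C and D values (with D assumed to hold strings): once B is the index, the number 1 means "the row labelled 1" to loc but "the second row" to iloc.

```python
import pandas as pd

df = pd.DataFrame({
    "B": [1, 2, 3, 4],
    "C": ["Red", "Blue", "Yellow", "Green"],
    "D": ["Yes", "No", "Maybe", "True"],  # assumed to be strings
}).set_index("B")

# loc: row whose index label is 1, column named "C"
by_label = df.loc[1, "C"]

# iloc: row at position 1 (the second row), column at position 0
by_position = df.iloc[1, 0]

print(by_label)     # Red
print(by_position)  # Blue
```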
