How do I compare two columns in a Pandas DataFrame and output values from other columns based on the match?
Question
I have a pandas DataFrame:
A: 3, 4, 1, 2, 1
B: 1, 2, 3, 4
C: Red, Blue, Yellow, Green
D: Yes, No, Maybe, True
I want to go through column A in order and check each value against column B. Where there is a match, I want to output the C and D values from the matching B row.
For example, the DataFrame above would be converted into:
A: Yellow, Green, Red, Blue, Red
B: Maybe, True, Yes, No, Yes
I am rather new to Python and pandas, so I might be missing something simple, but I cannot think of a solution to this problem and am unsure where to start looking for an answer. Any help would be appreciated.
-Smoggs
I have considered many different ideas. I know I should be using iloc to target the value in a cell, but I am unsure how to output the result next to it. I'm really at a loss here.
Answer 1
Score: 0
First, if you need to look up some columns by the values of another (here, C and D by B), you can call the set_index method with B.
import pandas as pd

df = pd.DataFrame({
    "A": [3, 4, 1, 2],
    "B": [1, 2, 3, 4],
    "C": ["Red", "Blue", "Yellow", "Green"],
    "D": ["Yes", "No", "Maybe", True]
})
# Use column B as the row index so rows can be looked up by B's values.
df.set_index("B", inplace=True)
Note the inplace parameter, which applies the change directly to the df variable instead of returning a new DataFrame.
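Then, you can get C and D by using the values of A as index labels:
# Select rows by label (the values of A) and columns C and D in one .loc call.
print(df.loc[df["A"], ["C", "D"]])
# output:
#         C      D
# B
# 3  Yellow  Maybe
# 4   Green   True
# 1     Red    Yes
# 2    Blue     No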
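The example above gives A the same length as B, whereas the A in the question has five values. As a rough sketch for that case, assuming A can be kept as a separate Series (the names lookup, a, and result are illustrative, and the last value of D is written as the string "True" here), reindex can do the same label lookup and gives NaN rows for any A value that is missing from B:

```python
import pandas as pd

# Lookup table keyed by B (values taken from the question).
lookup = pd.DataFrame({
    "B": [1, 2, 3, 4],
    "C": ["Red", "Blue", "Yellow", "Green"],
    "D": ["Yes", "No", "Maybe", "True"],  # assumption: all strings
}).set_index("B")

# Column A is kept as its own Series because it is longer than B.
a = pd.Series([3, 4, 1, 2, 1], name="A")

# For each value of A, fetch the row whose B label matches.
# Unlike .loc, reindex does not raise on labels that are absent from B.
result = lookup.reindex(a.to_numpy()).reset_index(drop=True)
print(result)
```

The reset_index(drop=True) call just replaces the looked-up labels with a plain 0..n-1 index, so the result lines up positionally with A.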
Note that we use loc and not iloc here. The difference is that loc selects by the DataFrame's index labels (which can also be strings), whereas iloc selects by integer position in the underlying data array (see the pandas documentation for more details).
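To make the distinction concrete, here is a small sketch (the variable names by_label and by_position are illustrative) using an index that starts at 1, so that label and position disagree:

```python
import pandas as pd

# Index labels 1..4, so label 1 is the first row but position 1 is the second.
df = pd.DataFrame(
    {"C": ["Red", "Blue", "Yellow", "Green"]},
    index=[1, 2, 3, 4],
)

by_label = df.loc[1, "C"]      # row with index label 1
by_position = df.iloc[1, 0]    # row at position 1, column at position 0

print(by_label, by_position)
```

This is why iloc would return the wrong rows for the lookup above: the values in A are meant as B labels, not as row positions.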