2023年1月9日 19:39:27go评论103阅读模式

英文:

Highlight a row in a pandas df if that row also appears in another df

问题

我有两个数据框 df1 和 df2。我想要将 df1 中与 df2 中相同的行都标记为黄色。

以下是我的代码来着色这些行：

import pandas as pd
df1 = pd.DataFrame([['AA', 3, 'hgend', 1], ['BB', 'frdf', 7, 2], ['C1', 4, 'asef', 4], ['C2', 4, 'asef', 4], ['C3', 4, 'asef', 4]], columns=list('ABCD'))
df2 = pd.DataFrame([['C1', 4, 'asef', 4], ['C2', 4, 'asef', 4], ['C3', 4, 'asef', 4]], columns=list('XYZQ'))
def highlight_rows(row):
    if row.isin(df2.to_dict(orient='list')).any().any():
        color = 'yellow'
    else:
        color = ''
    return [f'background-color: {color}' for _ in row]
styled_df1 = df1.style.apply(highlight_rows, axis=1)

这段代码将会标记 df1 中与 df2 中相同的行为黄色。希望这对你有所帮助。

英文:

I have two dataframes df1 and df2. I would like to highlight in yellow all the rows in df1 that are also present in df2.

df1

df2

What I want to achive

So far I have only found solutions in which I insert another row and use a variable there to identify which row I have to colour.

My question is whether it is possible to compare these two df directly in the function presented below.

So these are the two df's:

df1 = pd.DataFrame([[&#39;AA&#39;,3,&#39;hgend&#39;,1], [&#39;BB&#39;,&#39;frdf&#39;,7,2], [&#39;C1&#39;,4,&#39;asef&#39;,4], [&#39;C2&#39;,4,&#39;asef&#39;,4], [&#39;C3&#39;,4,&#39;asef&#39;,4]], columns=list(&quot;ABCD&quot;))
df2 = pd.DataFrame([[&#39;C1&#39;,4,&#39;asef&#39;,4], [&#39;C2&#39;,4,&#39;asef&#39;,4], [&#39;C3&#39;,4,&#39;asef&#39;,4]], columns=list(&quot;XYZQ&quot;))

This is my code to colour the rows:

def highlight_rows(row):
    value = row.loc[&#39;A&#39;]
    if value == &#39;C1&#39;:
        color = &#39;yellow&#39;
    else:
        color = &#39;&#39;
    return [&#39;background-color: {}&#39;.format(color) for r in row]
df1.style.apply(highlight_rows, axis=1)

As I said, if I do the comparison beforehand, insert another column and put a variable there, I can then search for this variable and highlight the row.
My question is whether I can also do this directly in the function. To do this, I would have to be able to compare both df's in the function. Is this possible at all? It would be enough to be able to compare a single row, e.g. with .isin

答案1

得分: 0

与函数内部比较df2将效率低下。

您可以定义一个临时列，使用合并操作来识别匹配项（指示符列in_1会根据行是否在df2中存在而变为left_only或both）。然后，样式不会考虑它：

def highlight_rows(row):
    highlight = 'yellow' if row['in_1'] == "both" else ""
    return ['background-color: {}'.format(highlight) for r in row]
(pd.merge(df1, df2.set_axis(df1.columns.tolist(), axis=1), 
          how="left", indicator="in_1")
    .style
    .hide_columns(['in_1'])
    .apply(highlight_rows, axis=1))

或者，要在函数内部执行实际的比较，可以预先定义一组df2行的元组：

set_df2 = set(df2.apply(tuple, axis=1))
def highlight_rows(row):
    color = 'yellow' if tuple(row) in set_df2 else ""
    return ['background-color: {}'.format(color)] * len(row)
df1.style.apply(highlight_rows, axis=1)

英文:

Comparing to df2 inside the function would be inefficient.

You could define a temporary column to identify matches using a merge (the indicator column in_1 becomes left_only or both depending on whether or not the row is present in df2). It is then ignored by the styler:

def highlight_rows(row):
    highlight = &#39;yellow&#39; if row[&#39;in_1&#39;] == &quot;both&quot; else &quot;&quot;
    return [&#39;background-color: {}&#39;.format(highlight) for r in row]
(pd.merge(df1, df2.set_axis(df1.columns.tolist(), axis=1), 
          how=&quot;left&quot;, indicator=&quot;in_1&quot;)
    .style
    .hide_columns([&#39;in_1&#39;])
    .apply(highlight_rows, axis=1))

Alternatively, to actually do the comparison inside the function, define a set of tuples of df2 rows beforehand:

set_df2 = set(df2.apply(tuple, axis=1))
def highlight_rows(row):
    color = &#39;yellow&#39; if tuple(row) in set_df2 else &quot;&quot;
    return [f&#39;background-color: {color}&#39;] * len(row)
df1.style.apply(highlight_rows, axis=1)

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Highlight a row in a pandas df if that row also appears in another df

问题

答案1

将一个字典追加到另一个字典中

Python regex, one word with n characters followed by two words with one char

如何在数据框架中迭代因子水平并添加来自列表的值的列？

Formula and encoding issues when saving df to Excel.

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。