Highlight a row in a pandas df if that row also appears in another df

huangapple go评论81阅读模式
英文:

Highlight a row in a pandas df if that row also appears in another df

问题

我有两个数据框 df1 和 df2。我想要将 df1 中与 df2 中相同的行都标记为黄色。

以下是我的代码来着色这些行:

import pandas as pd

df1 = pd.DataFrame([['AA', 3, 'hgend', 1], ['BB', 'frdf', 7, 2], ['C1', 4, 'asef', 4], ['C2', 4, 'asef', 4], ['C3', 4, 'asef', 4]], columns=list('ABCD'))
df2 = pd.DataFrame([['C1', 4, 'asef', 4], ['C2', 4, 'asef', 4], ['C3', 4, 'asef', 4]], columns=list('XYZQ'))

def highlight_rows(row):
    if row.isin(df2.to_dict(orient='list')).any().any():
        color = 'yellow'
    else:
        color = ''
    return [f'background-color: {color}' for _ in row]

styled_df1 = df1.style.apply(highlight_rows, axis=1)

这段代码将会标记 df1 中与 df2 中相同的行为黄色。希望这对你有所帮助。

英文:

I have two dataframes df1 and df2. I would like to highlight in yellow all the rows in df1 that are also present in df2.

df1

df2

What I want to achive

So far I have only found solutions in which I insert another row and use a variable there to identify which row I have to colour.

My question is whether it is possible to compare these two df directly in the function presented below.

So these are the two df's:

df1 = pd.DataFrame([['AA',3,'hgend',1], ['BB','frdf',7,2], ['C1',4,'asef',4], ['C2',4,'asef',4], ['C3',4,'asef',4]], columns=list("ABCD"))
df2 = pd.DataFrame([['C1',4,'asef',4], ['C2',4,'asef',4], ['C3',4,'asef',4]], columns=list("XYZQ"))

This is my code to colour the rows:

def highlight_rows(row):
    value = row.loc['A']
    if value == 'C1':
        color = 'yellow'
    else:
        color = ''
    return ['background-color: {}'.format(color) for r in row]

df1.style.apply(highlight_rows, axis=1)

As I said, if I do the comparison beforehand, insert another column and put a variable there, I can then search for this variable and highlight the row.
My question is whether I can also do this directly in the function. To do this, I would have to be able to compare both df's in the function. Is this possible at all? It would be enough to be able to compare a single row, e.g. with .isin

答案1

得分: 0

与函数内部比较df2将效率低下。

您可以定义一个临时列,使用合并操作来识别匹配项(指示符列in_1会根据行是否在df2中存在而变为left_onlyboth)。然后,样式不会考虑它:

def highlight_rows(row):
    highlight = 'yellow' if row['in_1'] == "both" else ""
    return ['background-color: {}'.format(highlight) for r in row]

(pd.merge(df1, df2.set_axis(df1.columns.tolist(), axis=1), 
          how="left", indicator="in_1")
    .style
    .hide_columns(['in_1'])
    .apply(highlight_rows, axis=1))

或者,要在函数内部执行实际的比较,可以预先定义一组df2行的元组:

set_df2 = set(df2.apply(tuple, axis=1))

def highlight_rows(row):
    color = 'yellow' if tuple(row) in set_df2 else ""
    return ['background-color: {}'.format(color)] * len(row)

df1.style.apply(highlight_rows, axis=1)

Highlight a row in a pandas df if that row also appears in another df

英文:

Comparing to df2 inside the function would be inefficient.

You could define a temporary column to identify matches using a merge (the indicator column in_1 becomes left_only or both depending on whether or not the row is present in df2). It is then ignored by the styler:

def highlight_rows(row):
    highlight = 'yellow' if row['in_1'] == "both" else ""
    return ['background-color: {}'.format(highlight) for r in row]

(pd.merge(df1, df2.set_axis(df1.columns.tolist(), axis=1), 
          how="left", indicator="in_1")
    .style
    .hide_columns(['in_1'])
    .apply(highlight_rows, axis=1))

Highlight a row in a pandas df if that row also appears in another df


Alternatively, to actually do the comparison inside the function, define a set of tuples of df2 rows beforehand:

set_df2 = set(df2.apply(tuple, axis=1))

def highlight_rows(row):
    color = 'yellow' if tuple(row) in set_df2 else ""
    return [f'background-color: {color}'] * len(row)

df1.style.apply(highlight_rows, axis=1)

huangapple
  • 本文由 发表于 2023年1月9日 19:39:27
  • 转载请务必保留本文链接:https://go.coder-hub.com/75056743.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定