英文:
Highlight a row in a pandas df if that row also appears in another df
问题
我有两个数据框 df1 和 df2。我想要将 df1 中与 df2 中相同的行都标记为黄色。
以下是我的代码来着色这些行:
import pandas as pd
df1 = pd.DataFrame([['AA', 3, 'hgend', 1], ['BB', 'frdf', 7, 2], ['C1', 4, 'asef', 4], ['C2', 4, 'asef', 4], ['C3', 4, 'asef', 4]], columns=list('ABCD'))
df2 = pd.DataFrame([['C1', 4, 'asef', 4], ['C2', 4, 'asef', 4], ['C3', 4, 'asef', 4]], columns=list('XYZQ'))
def highlight_rows(row):
if row.isin(df2.to_dict(orient='list')).any().any():
color = 'yellow'
else:
color = ''
return [f'background-color: {color}' for _ in row]
styled_df1 = df1.style.apply(highlight_rows, axis=1)
这段代码将会标记 df1 中与 df2 中相同的行为黄色。希望这对你有所帮助。
英文:
I have two dataframes df1 and df2. I would like to highlight in yellow all the rows in df1 that are also present in df2.
So far I have only found solutions in which I insert another row and use a variable there to identify which row I have to colour.
My question is whether it is possible to compare these two df directly in the function presented below.
So these are the two df's:
df1 = pd.DataFrame([['AA',3,'hgend',1], ['BB','frdf',7,2], ['C1',4,'asef',4], ['C2',4,'asef',4], ['C3',4,'asef',4]], columns=list("ABCD"))
df2 = pd.DataFrame([['C1',4,'asef',4], ['C2',4,'asef',4], ['C3',4,'asef',4]], columns=list("XYZQ"))
This is my code to colour the rows:
def highlight_rows(row):
value = row.loc['A']
if value == 'C1':
color = 'yellow'
else:
color = ''
return ['background-color: {}'.format(color) for r in row]
df1.style.apply(highlight_rows, axis=1)
As I said, if I do the comparison beforehand, insert another column and put a variable there, I can then search for this variable and highlight the row.
My question is whether I can also do this directly in the function. To do this, I would have to be able to compare both df's in the function. Is this possible at all? It would be enough to be able to compare a single row, e.g. with .isin
答案1
得分: 0
与函数内部比较df2
将效率低下。
您可以定义一个临时列,使用合并操作来识别匹配项(指示符列in_1
会根据行是否在df2
中存在而变为left_only
或both
)。然后,样式不会考虑它:
def highlight_rows(row):
highlight = 'yellow' if row['in_1'] == "both" else ""
return ['background-color: {}'.format(highlight) for r in row]
(pd.merge(df1, df2.set_axis(df1.columns.tolist(), axis=1),
how="left", indicator="in_1")
.style
.hide_columns(['in_1'])
.apply(highlight_rows, axis=1))
或者,要在函数内部执行实际的比较,可以预先定义一组df2
行的元组:
set_df2 = set(df2.apply(tuple, axis=1))
def highlight_rows(row):
color = 'yellow' if tuple(row) in set_df2 else ""
return ['background-color: {}'.format(color)] * len(row)
df1.style.apply(highlight_rows, axis=1)
英文:
Comparing to df2
inside the function would be inefficient.
You could define a temporary column to identify matches using a merge (the indicator column in_1
becomes left_only
or both
depending on whether or not the row is present in df2
). It is then ignored by the styler:
def highlight_rows(row):
highlight = 'yellow' if row['in_1'] == "both" else ""
return ['background-color: {}'.format(highlight) for r in row]
(pd.merge(df1, df2.set_axis(df1.columns.tolist(), axis=1),
how="left", indicator="in_1")
.style
.hide_columns(['in_1'])
.apply(highlight_rows, axis=1))
Alternatively, to actually do the comparison inside the function, define a set of tuples of df2
rows beforehand:
set_df2 = set(df2.apply(tuple, axis=1))
def highlight_rows(row):
color = 'yellow' if tuple(row) in set_df2 else ""
return [f'background-color: {color}'] * len(row)
df1.style.apply(highlight_rows, axis=1)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论