Python - Highlight differences between two strings in Jupyter notebook
Question
I have a list of two strings, and I want to highlight and print the differences between them (specifically in a Jupyter notebook). By differences, I mean the insertions, deletions and replacements needed to change one of the strings into the other.
I found this question, which is similar but doesn't mention a way to present the changes.
Answer 1
Score: 2
I figured out an effective way to display such highlighting and want to share it with others.
The difflib module gives you the tools to find the differences, specifically the SequenceMatcher class, while the IPython.display module helps you highlight the differences in a notebook setting.
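To make the role of SequenceMatcher concrete, here is a minimal sketch of the opcodes it produces for one pair of strings. The sample strings are taken from the demonstration data; the variable names and the `rebuilt` sanity check are illustrative assumptions, not part of the original answer.

```python
from difflib import SequenceMatcher

a, b = 'abcdefg', 'xac'
matcher = SequenceMatcher(None, a, b)

# Each opcode is a tuple (tag, i1, i2, j1, j2) describing how to turn
# a[i1:i2] into b[j1:j2]; tag is 'equal', 'delete', 'insert' or 'replace'.
opcodes = matcher.get_opcodes()
for tag, i1, i2, j1, j2 in opcodes:
    print(f'{tag:<8} a[{i1}:{i2}]={a[i1:i2]!r:<8} b[{j1}:{j2}]={b[j1:j2]!r}')

# Sanity check: replaying the opcodes on `a` rebuilds `b`.
# For 'delete' blocks b[j1:j2] is empty, so nothing is emitted.
rebuilt = ''.join(
    a[i1:i2] if tag == 'equal' else b[j1:j2]
    for tag, i1, i2, j1, j2 in opcodes
)
```

The opcodes cover both strings from start to end, so walking them once gives every segment that needs formatting.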
Demonstration
First, let's assume the data is in the following format:
cases = [
    ('afrykanerskojęzyczny', 'afrykanerskojęzycznym'),
    ('afrykanerskojęzyczni', 'nieafrykanerskojęzyczni'),
    ('afrykanerskojęzycznym', 'afrykanerskojęzyczny'),
    ('nieafrykanerskojęzyczni', 'afrykanerskojęzyczni'),
    ('nieafrynerskojęzyczni', 'afrykanerskojzyczni'),
    ('abcdefg', 'xac'),
]
You can create a function that gives you the HTML string which highlights the insertions, deletions and replacements, using the following code:
from difflib import SequenceMatcher

# highlight colors
# you may change these values according to your preferences
color_delete = '#811612'   # highlight color for deletions
color_insert = '#28862D'   # highlight color for insertions
color_replace = '#BABA26'  # highlight color for replacements

# the common format string used for highlighted segments
f_str = '<span style="background: {};">{}</span>'

# given two strings (a, b), getFormattedDiff returns the HTML formatted strings (formatted_a, formatted_b)
def getFormattedDiff(a, b):
    # initialize the sequence matcher
    s = SequenceMatcher(None, a, b)

    # lists used as string builders for the formatted strings
    formatted_a = []
    formatted_b = []

    # iterate through all char blocks
    for tag, i1, i2, j1, j2 in s.get_opcodes():
        if tag == 'equal':
            # if the blocks are the same, append the block to both strings without any formatting
            formatted_a.append(a[i1:i2])
            formatted_b.append(b[j1:j2])
        elif tag == 'delete':
            # if this is a deletion block, append the block to the first string with the delete highlight
            formatted_a.append(f_str.format(color_delete, a[i1:i2]))
        elif tag == 'insert':
            # if this is an insertion block, append the block to the second string with the insert highlight
            formatted_b.append(f_str.format(color_insert, b[j1:j2]))
        elif tag == 'replace':
            # if this is a replacement block, append the block to both strings with the replace highlight
            formatted_a.append(f_str.format(color_replace, a[i1:i2]))
            formatted_b.append(f_str.format(color_replace, b[j1:j2]))

    # return the formatted strings
    return ''.join(formatted_a), ''.join(formatted_b)
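One caveat when building HTML this way: if the input strings can contain characters such as `<` or `&`, the notebook will interpret them as markup. The sample data here does not need it, but a small sketch of escaping each segment before wrapping it could look like the following. The helper name `highlight` and the constant `SPAN` are hypothetical, introduced only for this illustration.

```python
from html import escape

# Same span template as f_str in the answer, repeated so the sketch is self-contained.
SPAN = '<span style="background: {};">{}</span>'

def highlight(color, text):
    """Hypothetical helper: escape `text`, then wrap it in a colored span."""
    return SPAN.format(color, escape(text))

# Characters with special meaning in HTML are shown literally instead of parsed.
print(highlight('#28862D', 'a < b & c'))
# → <span style="background: #28862D;">a &lt; b &amp; c</span>
```

If this approach is adopted, the unformatted 'equal' blocks would need `escape` as well, so that both strings are escaped consistently.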
Now we run the function defined above in a loop over all the cases pairs, like so:
from IPython.display import HTML, display

# iterate through all the cases and display both strings with the highlights
for a, b in cases:
    formatted_a, formatted_b = getFormattedDiff(a, b)
    display(HTML(formatted_a))
    display(HTML(formatted_b))
    print()
Each pair is then displayed in the notebook with its deletions, insertions and replacements highlighted in the corresponding colors.
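As a variation on the loop above, the pairs can also be collected into a single HTML table so that each original string sits next to its counterpart. This is only a sketch under assumptions: the names `COLORS`, `mark` and `diff_rows` are hypothetical, and the table layout is one possible presentation rather than the answer's own output.

```python
from difflib import SequenceMatcher
from html import escape

# Same colors as in the answer, keyed by opcode tag.
COLORS = {'delete': '#811612', 'insert': '#28862D', 'replace': '#BABA26'}

def mark(tag, text):
    """Escape `text` and, for non-equal blocks, wrap it in a colored span."""
    if tag == 'equal' or not text:
        return escape(text)
    return '<span style="background: {};">{}</span>'.format(COLORS[tag], escape(text))

def diff_rows(pairs):
    """Build one HTML table with a row per (a, b) pair."""
    rows = []
    for a, b in pairs:
        ops = SequenceMatcher(None, a, b).get_opcodes()
        # a[i1:i2] is empty for 'insert' and b[j1:j2] is empty for 'delete',
        # so each side only receives the blocks that belong to it.
        left = ''.join(mark(t, a[i1:i2]) for t, i1, i2, j1, j2 in ops)
        right = ''.join(mark(t, b[j1:j2]) for t, i1, i2, j1, j2 in ops)
        rows.append(f'<tr><td>{left}</td><td>{right}</td></tr>')
    return '<table>' + ''.join(rows) + '</table>'

table_html = diff_rows([('abcdefg', 'xac')])
# In a notebook: from IPython.display import HTML, display; display(HTML(table_html))
```

A single `display` call then renders all pairs at once, which avoids the blank `print()` separators between them.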