How can I replace matches in a Python regex with a modified version of the match?

Question

I wrote this code to search the text files in a given folder for regex matches and to point them out:

import re, os, sys
from pathlib import Path

# Usage: regs directory
try:
    if len(sys.argv) == 2:
        folder = sys.argv[1]
        fList = os.listdir(folder)
        uInput = input('input a regex: ')
        regObj = re.compile(f'''{uInput}''')
        wordReg = re.compile(r'''([A-Za-z0-9]+|\s+|[^\w\s]+)''')
        matches = []
        print(fList)

        for file in fList:
            if not os.path.isdir(Path(folder)/Path(file)):
                currentFileObj = open(f'{folder}/{file}')
                content = currentFileObj.readlines()
                currentFileObj.seek(0)
                text = currentFileObj.read()
                words = wordReg.findall(text)
                matches = list(filter(regObj.match, words))
                instances = 0
                print(f"matches in ({file}):\n'", end='')
                for word in words:
                    if word in matches:
                        print("\u0333".join(f"{word} "), end='')
                    else:
                        print(word, end='')
                print("'")
                for line in content:
                    matches = regObj.findall(line)
                    for match in matches:
                        print("\u0333".join(f"{match} "), end=' ')
                        print(f"in line number {content.index(line)+1}")
                        if match != '':
                            instances = instances + 1
                print(f'number of instances found: {instances}\n')
            else:
                continue
    else:
        print('Usage: regs directory')
except FileNotFoundError:
    print("that file doesn't exist.")
except PermissionError:
    print("you don't have permission to search that folder.")

It works for the most part, except for a few regular expressions: if the regex contains punctuation or a whitespace character next to other characters, the match is not underlined. It might work if I could find a way to substitute each match with a modified version of itself (replacing the match with an underlined version).
Does anyone know a fix?

Here is what it looks like for any other regex.

You can see that in the first text file it doesn't underline the match (out.).
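
One plausible reading of why "out." is never underlined, sketched under two assumptions that the post does not confirm: that the regex typed in was out\. and that the file contains something like the sample sentence below. wordReg splits the text into separate word, whitespace, and punctuation tokens, so no single token contains both "out" and ".", and regObj.match can never succeed on any of them.

```python
import re

# Same tokenizer as in the script above.
word_reg = re.compile(r'''([A-Za-z0-9]+|\s+|[^\w\s]+)''')
# Assumed user input; the original post does not show the exact regex.
reg_obj = re.compile(r'out\.')

sample = "Get out. Now"  # hypothetical file content

# Tokenize the text the way the script does, then filter the tokens.
tokens = word_reg.findall(sample)
matching_tokens = list(filter(reg_obj.match, tokens))

print(tokens)                   # word, whitespace and punctuation tokens
print(matching_tokens)          # tokens the regex matches on their own
print(reg_obj.findall(sample))  # matches found in the untokenized text
```

Searching the untokenized text still finds the match, which is why the per-line report works while the underlined view does not.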

I tried looking for functions that substitute matches with a modified version of the match, but there doesn't appear to be one.

There are also the minor problems that it cannot underline whitespace and punctuation properly, and that the underline character doesn't appear in the Windows 7 command prompt. Maybe a character other than the underline would work?

Answer 1

Score: 0

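If your goal is to underline the matches, you can modify the printing logic to replace each match with an underlined version by appending the combining low line character \u0332 to every character of the match:

# Append U+0332 after each character so that every character is underlined once
underlined_match = "".join(ch + "\u0332" for ch in match)
print(underlined_match, end=' ')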
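
A minimal sketch of that idea as a small helper, in case it is useful to reuse it. The name underline is hypothetical, and whether U+0332 actually renders as an underline depends on the terminal and font, so this only shows what the resulting string contains.

```python
COMBINING_LOW_LINE = "\u0332"  # Unicode COMBINING LOW LINE


def underline(s):
    """Return s with a combining low line appended after every character."""
    return "".join(ch + COMBINING_LOW_LINE for ch in s)


word = "out."
underlined = underline(word)

print(underlined)
# Each original character is now followed by exactly one combining mark.
print(len(word), len(underlined))
```

Because the mark is attached per character, punctuation and spaces are handled the same way as letters.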

Otherwise, if your goal is to change the tokenizing regex so that punctuation marks between alphanumeric characters (A-Z, a-z, 0-9) stay in the same token, then this regex might help. Note that [^\w\s] excludes whitespace, so blanks are not captured:

(?:[A-Za-z0-9]+(?:[^\w\s]*[A-Za-z0-9]+[^\w\s]*)*)|(?:[^\w\s]+)
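
To see what this changes in practice, here is a small comparison of the two tokenizers on a made-up string. The sample text is an assumption for illustration only.

```python
import re

# Tokenizer from the question.
original = re.compile(r'([A-Za-z0-9]+|\s+|[^\w\s]+)')
# Tokenizer suggested in this answer.
suggested = re.compile(
    r'(?:[A-Za-z0-9]+(?:[^\w\s]*[A-Za-z0-9]+[^\w\s]*)*)|(?:[^\w\s]+)'
)

sample = "foo-bar baz."  # hypothetical input

original_tokens = original.findall(sample)
suggested_tokens = suggested.findall(sample)

print(original_tokens)
print(suggested_tokens)
```

With this input the suggested pattern keeps "foo-bar" together, but punctuation that directly follows a single word (the final "." here) still ends up in its own token, and whitespace is not returned at all.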

Answer 2

Score: 0

I've figured out the answer:
by passing a lambda function as the repl argument of re.sub, I was able to modify each match and then use the modified version as the replacement.
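
In isolation, the technique looks roughly like the sketch below. The helper name mark_matches and the sample string are hypothetical; the only library behaviour relied on is that re.sub accepts a callable as repl and calls it with each match object.

```python
import re


def mark_matches(pattern, text, decorate):
    """Return text with every match replaced by decorate(matched_text)."""
    return re.sub(pattern, lambda m: decorate(m.group()), text)


sample = "Get out. Now"  # hypothetical input

# Wrap each match in parentheses, as in this answer.
bracketed = mark_matches(r"out\.", sample, lambda s: f"({s})")
# Any other transformation of the matched text works the same way.
upper = mark_matches(r"[a-z]+", sample, str.upper)

print(bracketed)
print(upper)
```

Plain ASCII markers such as parentheses also sidestep the problem of combining characters not rendering in older consoles.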

import re, os, sys
from pathlib import Path

# Usage: regs directory
try:
    if len(sys.argv) == 2:
        folder = sys.argv[1]
        fList = os.listdir(folder)
        print("folder contents: ", end=' ')
        for f in fList:
            if not f == fList[-1]:
                print(f, end=', ')
            else:
                print(f, end='.\n\n')
        uInput = input('input a regex: ')
        print()
        regObj = re.compile(f'''{uInput}''')
        wordReg = re.compile(r'''([A-Za-z0-9]+|\s+|[^\w\s]+)''')
        matches = []
        for file in fList:
            if os.path.isfile(Path(folder)/Path(file)):
                currentFileObj = open(f'{folder}/{file}')
                lines = currentFileObj.readlines()
                currentFileObj.seek(0)
                text = currentFileObj.read()
                currentFileObj.close()
                words = wordReg.findall(text)
                matches = list(filter(regObj.match, words))
                instances = 0
                print(f"matches in ({file}):\n'", end='')
                # Wrap every match in parentheses using a callable replacement
                print(regObj.sub(lambda match: "(" + match.group() + ")", text)+"'")
                # enumerate gives the right line number even for duplicate lines
                for lineNumber, line in enumerate(lines, start=1):
                    matches = regObj.findall(line)
                    for match in matches:
                        print((f"({match})"), end=' ')
                        print(f"in line number {lineNumber}")
                        if match != '':
                            instances = instances + 1
                print(f'number of instances found: {instances}\n')
            else:
                continue
    else:
        print('Usage: regs directory')
except FileNotFoundError:
    print("that file doesn't exist.")
except PermissionError:
    print("you don't have permission to search that folder.")

Instead of looping over the list of the text's words, it now simply prints the text with each match wrapped in parentheses, like so:

print(regObj.sub(lambda match: "(" + match.group() + ")", text)+"'")

The output now looks like this.

It also prints the folder contents now.
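
For the per-line report, a possible variation is sketched below as a standalone function. The names report_matches and sample_lines are hypothetical. It uses enumerate for line numbers, so repeated identical lines are numbered individually, and finditer, so the reported text is always the full match even when the pattern contains capture groups.

```python
import re


def report_matches(pattern, lines):
    """Return (line_number, matched_text) pairs for all non-empty matches."""
    reg = re.compile(pattern)
    found = []
    for line_number, line in enumerate(lines, start=1):
        for m in reg.finditer(line):
            if m.group():  # skip zero-length matches
                found.append((line_number, m.group()))
    return found


sample_lines = ["Get out.\n", "stay in\n", "Get out.\n"]  # hypothetical file
results = report_matches(r"(o)ut\.", sample_lines)

for line_number, matched in results:
    print(f"({matched}) in line number {line_number}")
print(f"number of instances found: {len(results)}")
```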

huangapple
  • Posted on June 1, 2023, 18:13:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76380856.html