June 1, 2023, 18:13:22

How can I replace matches in a Python regex with a modified version of the match?

Question

I wrote this code to search the text files in a given folder for regex matches and to highlight them:

import re, os, sys
from pathlib import Path

# Usage: regs directory
try:
    if len(sys.argv) == 2:
        folder = sys.argv[1]
        fList = os.listdir(folder)
        uInput = input('input a regex: ')
        regObj = re.compile(f'''{uInput}''')
        wordReg = re.compile(r'''([A-Za-z0-9]+|\s+|[^\w\s]+)''')
        matches = []
        print(fList)

        for file in fList:
            if not os.path.isdir(Path(folder)/Path(file)):
                currentFileObj = open(f'{folder}/{file}')
                content = currentFileObj.readlines()
                currentFileObj.seek(0)
                text = currentFileObj.read()
                words = wordReg.findall(text)
                matches = list(filter(regObj.match, words))
                instances = 0
                print(f"matches in ({file}):\n'", end='')
                for word in words:
                    if word in matches:
                        print("\u0333".join(f"{word} "), end='')
                    else:
                        print(word, end='')
                print("'")
                for line in content:
                    matches = regObj.findall(line)
                    for match in matches:
                        print("\u0333".join(f"{match} "), end=' ')
                        print(f"in line number {content.index(line)+1}")
                        if match != '':
                            instances = instances + 1
                print(f'number of instances found: {instances}\n')
            else:
                continue
    else:
        print('Usage: regs directory')
except FileNotFoundError:
    print("that file doesn't exist.")
except PermissionError:
    print("you don't have permission to search that folder.")

It works for the most part, except for a few regular expressions: if the regex matches punctuation or a whitespace character next to other characters, the match is not underlined. It might work if I could find a way to substitute each match with a modified version of itself (replacing the match with an underlined version). Does anyone know a fix?

Here's what it looks like for any other regex.

You can see that in the first text file it doesn't underline the match (out.).

I tried looking for functions that substitute matches with a modified version of the match, but there doesn't appear to be any.

There are also the minor problems that it can't underline whitespace and punctuation properly, and that the underline character doesn't appear in the Windows 7 command prompt. Maybe a character other than the underline would work?

Answer 1

Score: 0

If your goal is to underline the matches in your code, you can modify the printing logic to replace each match with an underlined version by using the combining low line character \u0332, like so:

# Append the combining low line after every character of the match.
underlined_match = "".join(ch + "\u0332" for ch in match)
print(underlined_match, end=' ')
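
A combining character attaches to the character before it, so each character of the match needs its own mark. The sketch below compares a plain `str.join`, which only places the mark between characters and leaves the last one unmarked, with appending the mark per character. The helper name `underline` and the constant `MARK` are hypothetical, introduced only for this illustration.

```python
MARK = "\u0332"  # COMBINING LOW LINE


def underline(text, mark=MARK):
    """Return text with a combining mark appended after every character."""
    return "".join(ch + mark for ch in text)


joined = MARK.join("out.")  # mark only between characters: the "." stays bare
marked = underline("out.")  # mark after every character

print(joined.count(MARK))  # prints 3
print(marked.count(MARK))  # prints 4
```

This is also why the question's code joins over `f"{word} "`: the extra trailing space gives the last real character a mark, at the cost of printing one more space.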

Otherwise, if your goal is to change the regex so that it captures punctuation marks between normal characters (a-z, 0-9), then this regex might help you (note that `[^\w\s]` excludes whitespace, so blanks still separate tokens):

(?:[A-Za-z0-9]+(?:[^\w\s]*[A-Za-z0-9]+[^\w\s]*)*)|(?:[^\w\s]+)
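
As a quick check of what this pattern tokenizes, here is a minimal sketch that runs it through `re.findall` on a short sample. The names `token_re` and `tokens` and the sample sentence are assumptions made for this illustration only.

```python
import re

# The pattern from the answer above, unchanged.
token_re = re.compile(
    r"(?:[A-Za-z0-9]+(?:[^\w\s]*[A-Za-z0-9]+[^\w\s]*)*)|(?:[^\w\s]+)"
)


def tokens(text):
    """Return every token the pattern finds in text, in order."""
    return token_re.findall(text)


print(tokens("don't look out."))  # prints ["don't", 'look', 'out', '.']
```

Punctuation inside a word (the apostrophe in "don't") stays attached, but a trailing "." with no alphanumeric run after it still comes out as its own token, so a regex such as `out\.` would not match any single token from this list.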

Answer 2

Score: 0

I've figured out the answer:
by passing a lambda function as the repl argument of re.sub, I was able to modify each match and then use the modified version as the substitution.

import re, os, sys
from pathlib import Path

# Usage: regs directory
try:
    if len(sys.argv) == 2:
        folder = sys.argv[1]
        fList = os.listdir(folder)
        # Print the folder contents as a comma-separated list.
        print("folder contents: ", end=' ')
        for f in fList:
            if not f == fList[-1]:
                print(f, end=', ')
            else:
                print(f, end='.\n\n')
        uInput = input('input a regex: ')
        print()
        regObj = re.compile(f'''{uInput}''')

        for file in fList:
            if os.path.isfile(Path(folder)/Path(file)):
                # The with block closes the file once it has been read.
                with open(f'{folder}/{file}') as currentFileObj:
                    lines = currentFileObj.readlines()
                    currentFileObj.seek(0)
                    text = currentFileObj.read()
                instances = 0
                print(f"matches in ({file}):\n'", end='')
                # re.sub calls the lambda once per match and inserts its return value.
                print(regObj.sub(lambda match: "(" + match.group() + ")", text) + "'")
                # enumerate reports the right number even when identical lines repeat.
                for lineNumber, line in enumerate(lines, start=1):
                    matches = regObj.findall(line)
                    for match in matches:
                        print(f"({match})", end=' ')
                        print(f"in line number {lineNumber}")
                        if match != '':
                            instances = instances + 1
                print(f'number of instances found: {instances}\n')
    else:
        print('Usage: regs directory')
except FileNotFoundError:
    print("that file doesn't exist.")
except PermissionError:
    print("you don't have permission to search that folder.")

Instead of looping over the list of the string's words, it now prints the whole text with each match wrapped in parentheses, like so:

print(regObj.sub(lambda match: "(" + match.group() + ")", text) + "'")

The output now looks like this.

It also prints the folder contents now.
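
The same `re.sub` technique can be pulled into a small reusable function, so the decoration (parentheses, a combining underline, or anything else a given console can display) becomes a parameter. This is only a sketch under stated assumptions: `highlight`, `bracket`, and `underline` are hypothetical names that do not appear in the code above.

```python
import re


def bracket(s):
    """Wrap a match in parentheses; plain ASCII, so any console can show it."""
    return f"({s})"


def underline(s):
    """Append a combining low line (U+0332) after every character."""
    return "".join(ch + "\u0332" for ch in s)


def highlight(pattern, text, decorate=bracket):
    """Return text with every non-empty match of pattern passed through decorate."""
    # re.sub calls the function once per match object and inserts its return value.
    return re.sub(
        pattern,
        lambda m: decorate(m.group()) if m.group() else "",
        text,
    )


print(highlight(r"out\.", "look out. now"))  # prints look (out.) now
```

Passing `decorate=underline` switches to underlining where the terminal supports combining characters, while the default keeps the parentheses used in the answer.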
