How to print a regex result that spans 2 lines in Python 3

Question

I have a small Python 3 script that reads a file where all my bookmarks are stored. My regex works in Notepad++.

My regex is:

(==========)([\r\n]+.*)

My text file:

==========
Book1 (Author 1)
- bookmark

text
==========
Book2 (Author 2)
- bookmark1

text
==========
Book1 (Author 1)
- bookmark2

text
==========
Book2 (Author 2)
- bookmark2

text
==========

My Python script is as follows:

import re
pattern = re.compile("(==========)([\r\n])(.*)")
count=0
for line in open(r'bookmarks.txt', encoding="utf-8"):
    for match in re.finditer(pattern, line):
        count=count+1
        print(line)
print("The amount of notes are: ",count)

The problem is that only the "==========" part is printed, while I want the separator together with the line that follows it:

==========
Book1 (Author 1)

I have tried different approaches, but none of them shows what I'm looking for. Any hints?

Thanks

Answer 1

Score: 1

You are searching line by line, so for lines without the ========== there's no way to match your pattern. You could try something like the following instead:

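import re

# Read the whole file so a match can span the separator and the next line.
with open('bookmarks.txt', encoding="utf-8") as file:
    bookmarks = file.read()
# \r?\n: text mode normally converts \r\n to \n, so the \r must be optional.
pattern = re.compile(r"(==========)(\r?\n)([^\n]+)")
count = 0
for match in pattern.finditer(bookmarks):
    count += 1
    print(match[0])
print("The amount of notes are: ", count)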

Read the whole bookmarks.txt file into a string and then start searching. It's not exactly clear what parts of the bookmarks you want to retrieve, so I've limited it to the first line.

Result here:

==========
Book1 (Author 1)
==========
Book2 (Author 2)
==========
Book1 (Author 1)
==========
Book2 (Author 2)
The amount of notes are:  4
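
To make the difference between the two approaches concrete, here is a minimal sketch that runs the same pattern once per line and once over the whole text. It assumes a shortened in-memory sample (the name SAMPLE is made up for illustration) instead of the real bookmarks.txt, and an optional \r so it works with either line-ending style.

```python
import re

# Hypothetical in-memory stand-in for bookmarks.txt (two entries only).
SAMPLE = (
    "==========\n"
    "Book1 (Author 1)\n"
    "- bookmark\n"
    "\n"
    "text\n"
    "==========\n"
    "Book2 (Author 2)\n"
    "- bookmark1\n"
    "\n"
    "text\n"
    "==========\n"
)

pattern = re.compile(r"(==========)(\r?\n)([^\n]+)")

# Line by line: each chunk ends at its own newline, so nothing follows it.
per_line = [
    m[0]
    for line in SAMPLE.splitlines(keepends=True)
    for m in pattern.finditer(line)
]

# Whole text: the separator and the next line sit in the same string.
whole_text = [m[0] for m in pattern.finditer(SAMPLE)]

print(per_line)
print(whole_text)
```

The per-line list should come out empty, while the whole-text list should hold one separator-plus-title string per entry. The closing separator is not matched because no book line follows it.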

Answer 2

Score: 1

Your finditer regex is applied to one line of the input file at a time. Therefore, it cannot match the book line after "==========". Why does it find anything at all? Because (.*) also matches an empty book line.

It's not clear to me what output you expect, but the following piece of code at least prints the separator line together with the book line:

import re
pattern = re.compile(r"(==========)([\r\n]+)(.+)")
count=0
with open('bookmarks.txt', 'r', encoding='utf-8') as file:
    bookmarks = file.read()

for match in re.finditer(pattern, bookmarks):
    count=count+1
    print(match.group(0))
print("The amount of notes are: ",count)

Note that I replaced (.*) with (.+), so a separator with no book line after it (such as the last one in your file) is not counted.
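
As a rough illustration of what the switch from (.*) to (.+) changes, the sketch below runs both variants over a tiny in-memory sample. The sample text and the variable names are assumptions made for the example, not part of the original script.

```python
import re

# Hypothetical sample: one entry, then a closing separator at the very end.
SAMPLE = "==========\nBook1 (Author 1)\n- bookmark\n\ntext\n==========\n"

star = re.compile(r"(==========)([\r\n]+)(.*)")  # allows an empty book line
plus = re.compile(r"(==========)([\r\n]+)(.+)")  # requires at least one character

# group(3) is the captured book line for each match.
star_titles = [m.group(3) for m in star.finditer(SAMPLE)]
plus_titles = [m.group(3) for m in plus.finditer(SAMPLE)]

print(star_titles)
print(plus_titles)
```

With (.*), the closing separator still matches and contributes an empty title, which inflates the count by one. With (.+), only the real entry is reported.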

Posted by huangapple on February 6, 2023 at 16:34:22
Original link: https://go.coder-hub.com/75358971.html