How to print a regex match that spans two lines in Python 3

Question

I have a small Python 3 script that reads a file where all the bookmarks are stored. My regex works in Notepad++.

My regex is:

  (==========)([\r\n]+.*)

My text file:

  ==========
  Book1 (Author 1)
  - bookmark
  text
  ==========
  Book2 (Author 2)
  - bookmark1
  text
  ==========
  Book1 (Author 1)
  - bookmark2
  text
  ==========
  Book2 (Author 2)
  - bookmark2
  text
  ==========

My Python script is as follows:

  import re
  pattern = re.compile("(==========)([\r\n])(.*)")
  count=0
  for line in open(r'bookmarks.txt', encoding="utf-8"):
      for match in re.finditer(pattern, line):
          count=count+1
          print(line)
  print("The amount of notes are: ",count)

The problem is that the printed lines show only the "==========" part instead of the full match:

  ==========
  Book1 (Author 1)

I have tried different approaches, but none of them shows what I'm looking for. Any hints?

Thanks

Answer 1

Score: 1

You are searching line by line, so the pattern never sees a separator and the book line together, and lines without ========== cannot match at all. You could try something like the following instead:

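  import re

  # Read the whole file into one string so a match can span line breaks.
  with open('bookmarks.txt', encoding="utf-8") as file:
      bookmarks = file.read()

  # \r?\n accepts both Windows and Unix line endings (text mode normally
  # converts \r\n to \n on read, so a literal \r\n would not match).
  pattern = re.compile(r"(==========)(\r?\n)([^\r\n]+)")
  count = 0
  for match in pattern.finditer(bookmarks):
      count += 1
      print(match[0])
  print("The amount of notes are: ", count)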

This reads the whole bookmarks.txt file into a string and then searches it. It's not exactly clear which parts of each bookmark you want to retrieve, so I've limited the match to the first line after each separator.

Result:

  ==========
  Book1 (Author 1)
  ==========
  Book2 (Author 2)
  ==========
  Book1 (Author 1)
  ==========
  Book2 (Author 2)
  The amount of notes are: 4
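
If the whole entry is wanted rather than only the title line, one alternative is to drop the regex and split the text on the separator. The sketch below is only an illustration under assumptions: it uses plain str.split rather than the regex approach above, and the inline `sample` string is a hypothetical stand-in for the contents of bookmarks.txt.

```python
# Assumed stand-in for the contents of bookmarks.txt (same layout as the question).
sample = (
    "==========\n"
    "Book1 (Author 1)\n"
    "- bookmark\n"
    "text\n"
    "==========\n"
    "Book2 (Author 2)\n"
    "- bookmark1\n"
    "text\n"
    "==========\n"
)

# Split on the separator and drop the empty pieces before the first
# and after the last separator.
entries = [
    chunk.strip().splitlines()
    for chunk in sample.split("==========")
    if chunk.strip()
]

# The first line of each entry is the title; the rest are its details.
for title, *details in entries:
    print(title, details)
```

This would misbehave if a note's own text contained the separator string, so it is probably only suitable when the separator is known to appear on lines of its own.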

Answer 2

Score: 1

Your finditer regex is applied to one line of the input file at a time, so it cannot match the book line that follows "==========". Why does it find anything at all? Because (.*) also matches an empty string: on a separator line, the pattern matches the separator and its line break, followed by an empty book line.
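
To make that concrete, here is a small sketch of what the line-by-line loop sees. The `lines` list is an assumed sample of what iterating over the file would yield (each line keeps its trailing line break); the pattern is the one from the question.

```python
import re

# The pattern from the question: separator, one line-break character, rest of line.
pattern = re.compile(r"(==========)([\r\n])(.*)")

# Assumed sample of what iterating over the file yields: each line keeps its "\n".
lines = ["==========\n", "Book1 (Author 1)\n", "- bookmark\n", "text\n"]

# Collect (whole match, third group) for every line where the pattern matches.
hits = [(m.group(0), m.group(3)) for line in lines for m in pattern.finditer(line)]

for whole, book_line in hits:
    print(repr(whole), repr(book_line))
```

Only the separator line produces a match, and its third group is the empty string, which is why the original script prints nothing but "==========".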

It's not clear to me what output you expect, but the following code at least prints the separator line together with the book line:

  import re

  pattern = re.compile(r"(==========)([\r\n]+)(.+)")
  count = 0
  # Read the whole file at once so the pattern can match across line breaks.
  with open('bookmarks.txt', 'r', encoding='utf-8') as file:
      bookmarks = file.read()
  for match in pattern.finditer(bookmarks):
      count += 1
      print(match.group(0))
  print("The amount of notes are: ", count)

Note that I replaced (.*) with (.+).
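
The effect of that change shows up on a separator that has no book line after it, such as the last one in the file. The sketch below compares the two patterns on an assumed sample string; it assumes the text ends with a line break after the final separator, which may not be true of the real file.

```python
import re

# Assumed sample text; note the line break after the final separator.
sample = (
    "==========\n"
    "Book1 (Author 1)\n"
    "- bookmark\n"
    "text\n"
    "==========\n"
    "Book2 (Author 2)\n"
    "- bookmark1\n"
    "text\n"
    "==========\n"
)

# findall returns one tuple of groups per match.
with_star = re.findall(r"(==========)([\r\n]+)(.*)", sample)
with_plus = re.findall(r"(==========)([\r\n]+)(.+)", sample)

print(len(with_star))  # also counts the trailing separator, with an empty book line
print(len(with_plus))  # counts only separators followed by a non-empty line
```

With (.*) the trailing separator is counted as an extra note whose book line is empty; with (.+) it is skipped, so the count equals the number of real entries.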

huangapple
  • Posted on February 6, 2023, 16:34:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75358971.html