Regular expression for capturing all text starting at one pattern and ending at another
Question
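I am scraping text data from a PDF using Python. The data I need follows a recurring pattern: it begins with a numeric pattern and ends with a string pattern. I need a regular expression that captures all of the text from the start pattern through the end pattern, including both patterns.
I have a regular expression that works when I convert the PDF to a .txt file and read that file in. When I use PyPDF2 to extract the text from the PDF pages instead, the regular expression fails.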
The data stream looks like this:
```
Filed: 8/21/2022\nEntered: 10/21/2022\nDischarged: 01/23/2023\nClosed: 01/30/2023\n17-55018- \nQRTbk 7 Windows PC\n OS:xxx\nRole: AdminHubertson
```
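One possible reason the same pattern passes on regex101 but fails in Python (an assumption; the question does not say how the test text was entered): the stream above is written with `\n` escapes. If the text tested on regex101 contained those two literal characters, `.` can step over them, whereas text extracted by PyPDF2 contains real newline characters, and by default `.` in Python's `re` module does not match a newline. The two strings below are hypothetical test data built from the sample record.

```python
import re

# Hypothetical test strings built from the sample record in the question.
# "pasted" holds a backslash followed by "n"; "extracted" holds real newlines.
pasted = r"17-55018- \nQRTbk 7 Windows PC\n OS:xxx\nRole: Admin"
extracted = "17-55018- \nQRTbk 7 Windows PC\n OS:xxx\nRole: Admin"

pattern = r"[0-9]{2}-[0-9]{5}-.*?Role: Admin"

# By default "." matches any character except a newline.
on_pasted = re.search(pattern, pasted) is not None        # True
on_extracted = re.search(pattern, extracted) is not None  # False
# With re.DOTALL, "." also matches newlines.
on_extracted_dotall = re.search(pattern, extracted, re.DOTALL) is not None  # True

print(on_pasted, on_extracted, on_extracted_dotall)  # True False True
```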
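The start point is the `17-55018-` string, which I can match with this regex:
```
[0-9]{2}-[0-9]{5}-
```
The end point is `Role: Admin`, which is unique enough to mark the end of each record.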
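As an alternative to a single pattern (a different technique from the ones tried in the question), the two markers can be located separately: search for the numeric start pattern, then look for the literal `Role: Admin` from that point on and slice the text between them. This avoids depending on how `.` treats newlines. A minimal sketch; `extract_record` is a hypothetical helper name, and it assumes the end marker follows the start marker within the same block of extracted text.

```python
import re
from typing import Optional

START = re.compile(r"[0-9]{2}-[0-9]{5}-")  # start marker from the question
END = "Role: Admin"                        # end marker from the question


def extract_record(text: str) -> Optional[str]:
    """Return the text from the start marker through the end marker, inclusive.

    Returns None if either marker is missing.
    """
    start = START.search(text)
    if start is None:
        return None
    # Only look for the end marker after the start marker.
    end = text.find(END, start.end())
    if end == -1:
        return None
    return text[start.start():end + len(END)]


if __name__ == "__main__":
    page = "Closed: 01/30/2023\n17-55018- \nQRTbk 7 Windows PC\n OS:xxx\nRole: AdminHubertson"
    print(repr(extract_record(page)))
```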
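I have tried a number of capture approaches, including lookaheads, to get the text I need. They work when I test them on regex101, but I cannot get them to work in my Python code.
Some patterns I have tried: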
```
[0-9]{2}-[0-9]{5}-\s(\n(?!Role)(.*))*Role: Admin
[0-9]{2}-[0-9]{5}-\.(.*?)Role: Admin
[0-9]{2}-[0-9]{5}-.*(?=Role).*Role: Admin
```
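A quick check of the three patterns above against the sample record, assuming the PyPDF2 output contains real newline characters. Without flags none of them matches, because `.` stops at each newline (in the first pattern, the loop cannot consume the newline just before `Role`, since `(?!Role)` rejects that line). With `re.DOTALL` the first and third match; the second still fails because `\.` requires a literal period after `17-55018-`, where the text has a space.

```python
import re

# Sample record with real newline characters, as PDF-extracted text is
# assumed to contain.
text = (
    "Filed: 8/21/2022\nEntered: 10/21/2022\nDischarged: 01/23/2023\n"
    "Closed: 01/30/2023\n17-55018- \nQRTbk 7 Windows PC\n OS:xxx\n"
    "Role: AdminHubertson"
)

# The three patterns from the question, unchanged.
patterns = [
    r"[0-9]{2}-[0-9]{5}-\s(\n(?!Role)(.*))*Role: Admin",
    r"[0-9]{2}-[0-9]{5}-\.(.*?)Role: Admin",
    r"[0-9]{2}-[0-9]{5}-.*(?=Role).*Role: Admin",
]

# For each pattern: (matches without flags, matches with re.DOTALL)
results = [
    (re.search(p, text) is not None, re.search(p, text, re.DOTALL) is not None)
    for p in patterns
]
for p, r in zip(patterns, results):
    print(r, p)
# (False, True)   first pattern
# (False, False)  second pattern: "\." needs a literal "." after "17-55018-"
# (False, True)   third pattern
```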
# Answer 1
**Score**: 0
Try this one:
```
\d{2}\-\d{5}.*?Role:\sAdmin
```
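If the extracted text contains real newlines, the answer's `.*?` only spans several lines when the pattern is compiled with `re.DOTALL` (also available as `re.S`); that flag is an addition here, not part of the answer. The sketch below uses `re.findall` so every record on a page is returned; because `.*?` is lazy, each match stops at the first `Role: Admin` after its start marker. The second record in `page_text` is invented for illustration.

```python
import re

# The answer's pattern; re.DOTALL (added here) lets ".*?" cross newlines.
RECORD = re.compile(r"\d{2}\-\d{5}.*?Role:\sAdmin", re.DOTALL)

# Sample page text with real newlines. The first record comes from the
# question; the second ("18-60211-", "OS:yyy", "AdminSmith") is hypothetical.
page_text = (
    "Closed: 01/30/2023\n17-55018- \nQRTbk 7 Windows PC\n OS:xxx\n"
    "Role: AdminHubertson\n"
    "Closed: 02/14/2023\n18-60211- \nQRTbk 7 Windows PC\n OS:yyy\n"
    "Role: AdminSmith"
)

# The lazy ".*?" stops at the nearest "Role: Admin", so each record is
# returned separately instead of one match spanning both.
records = RECORD.findall(page_text)
for record in records:
    print(repr(record))

# Without re.DOTALL the same pattern finds nothing in this text.
without_flag = re.findall(r"\d{2}\-\d{5}.*?Role:\sAdmin", page_text)
print(without_flag)  # []
```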