Issues creating a regex to extract code from Markdown
Question
I'm trying to extract code from a string of Markdown and I'm very close. My code is:
````python
import re

string = """
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
"""

pattern = r'```(?:\w+\n)?(.*?)(?!.*```)'
result = re.search(pattern, string, re.DOTALL).group(1)
print(result)
````
The result of this is:
```
print('foo```bar```foo')
print('foo```bar```foo')
`
```
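You'll notice the only problem I have is the extra backtick at the end of that code block. I can't figure out what's matching it or how to remove it, but I'm certain it has something to do with the negative lookahead I'm using.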
<details>
<summary>English:</summary>
I'm trying to extract code from a string of Markdown and I'm very close. My code is:
````python
import re

string = """
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
"""

pattern = r'```(?:\w+\n)?(.*?)(?!.*```)'
result = re.search(pattern, string, re.DOTALL).group(1)
print(result)
````
And the result of this is:
```
print('foo```bar```foo')
print('foo```bar```foo')
`
```
You'll notice the only problem I have is the extra backtick at the end of that code block. I can't figure out what's matching that or how to remove it, but I'm certain it has something to do with the negative lookahead I'm using.
</details>
# Answer 1
**Score**: 1
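The lazy group stops at the first position where the negative lookahead succeeds, i.e. the first position from which <code>.*\`\`\`</code> can no longer match. That is just after the first <code>\`</code> at the start of the line following <code>print('foo\`\`\`bar\`\`\`foo')</code> (only two backticks remain after it), so that one backtick ends up in the captured group. See for example [this demo][1]. You'll note that this method doesn't work at all when there is more than one code block: the lookahead only succeeds inside the last closing fence, so the match runs across every block before it.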
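Probably the safest approach is to rely on the <code>\`\`\`</code> being the first thing on the line. Then you can match up to the next occurrence of <code>\`\`\`</code> at the start of a line instead: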
```regex
^```(?:\w+)?\s*\n(.*?)(?=^```)```
```
[Demo on regex101][2]
In Python:
````python
import re

string = """
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
```
print('foo```bar```foo')
print('foo```bar```foo')
```
"""

pattern = r'^```(?:\w+)?\s*\n(.*?)(?=^```)```'
result = re.findall(pattern, string, re.DOTALL | re.MULTILINE)
print(*result, sep='\n')
````
Output:
```
print('foo```bar```foo')
print('foo```bar```foo')

print('foo```bar```foo')
print('foo```bar```foo')

print('foo```bar```foo')
print('foo```bar```foo')
```
[1]: https://regex101.com/r/UMHjzn/1
[2]: https://regex101.com/r/8iN6FJ/1
<details>
<summary>English:</summary>
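The lazy group stops at the first position where the negative lookahead succeeds, i.e. the first position from which <code>.*\`\`\`</code> can no longer match. That is just after the first <code>\`</code> at the start of the line following <code>print('foo\`\`\`bar\`\`\`foo')</code> (only two backticks remain after it), so that one backtick ends up in the captured group. See for example [this demo][1]. You'll note that this method doesn't work at all when there is more than one code block: the lookahead only succeeds inside the last closing fence, so the match runs across every block before it.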
Probably the safest approach is to rely on the <code>\`\`\`</code> being the first thing on the line. Then you can match up to the next occurrence of <code>\`\`\`</code> at the start of a line instead:
```regex
^```(?:\w+)?\s*\n(.*?)(?=^```)```
```
[Demo on regex101][2]
In Python:
````python
import re
string = """
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
```
print('foo```bar```foo')
print('foo```bar```foo')
```
"""
pattern = r'^```(?:\w+)?\s*\n(.*?)(?=^```)```'
result = re.findall(pattern, string, re.DOTALL | re.MULTILINE)
print(*result, sep='\n')
````
Output:
```
print('foo```bar```foo')
print('foo```bar```foo')

print('foo```bar```foo')
print('foo```bar```foo')

print('foo```bar```foo')
print('foo```bar```foo')
```