May 17, 2023, 16:18:16

Issues creating a regex to extract code from Markdown

Question

I'm trying to extract code from a Markdown string, and I'm very close. My code is:

````python
import re

string = """
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
"""

pattern = r'```(?:\w+\n)?(.*?)(?!.*```)'
result = re.search(pattern, string, re.DOTALL).group(1)
print(result)
````


The result is:

```
print('foo```bar```foo')
print('foo```bar```foo')
`
```


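As you can see, the only problem is the extra backtick at the end of the code block. I can't figure out what is matching it or how to remove it, but I'm fairly sure it has something to do with the negative lookahead I'm using.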
<details>
<summary>English:</summary>

I'm trying to extract code from a string of Markdown and I'm very close. My code is:

````python
import re

string = """
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
"""

pattern = r'```(?:\w+\n)?(.*?)(?!.*```)'
result = re.search(pattern, string, re.DOTALL).group(1)
print(result)
````


And the result of this is:

```
print('foo```bar```foo')
print('foo```bar```foo')
`
```

You'll notice the only problem I have is the extra backtick at the end of that code block. I can't figure out what's matching that or how to remove it, but I'm certain it has something to do with the negative lookahead I'm using.
</details>
# Answer 1

**Score**: 1

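The lazy group stops at the first position where the lookahead <code>(?!.*\`\`\`)</code> succeeds, that is, where <code>.*\`\`\`</code> can no longer match. That position is just after the first <code>\`</code> on the line following the last <code>print('foo\`\`\`bar\`\`\`foo')</code>, so one backtick of the closing fence ends up in the capture. See, for example, [this demo][1]. Note that this approach does not work at all when there is more than one code block.
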
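
To make the stray backtick easier to see, here is a minimal sketch on a smaller input. It assumes the question's pattern began with a literal fence and had another one inside the negative lookahead (the fences appear to have been lost from the post). `FENCE`, `text`, and `two_blocks` are illustrative names, not part of the original code.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime so this example nests cleanly

# Assumed reconstruction of the question's pattern: fence, optional language
# line, lazy capture, then "no fence anywhere after this point".
pattern = FENCE + r"(?:\w+\n)?(.*?)(?!.*" + FENCE + ")"

text = f"\nLorem ipsum\n{FENCE}python\nprint('x')\n{FENCE}\nLorem ipsum\n"

match = re.search(pattern, text, re.DOTALL)
captured = match.group(1)
remainder = text[match.end():]

# The capture swallows the first backtick of the closing fence, because only
# after it are fewer than three backticks left for the lookahead to find.
print(repr(captured))
print(repr(remainder))

# With two code blocks the lazy group runs on to the LAST fence in the text,
# so both blocks and the prose between them land in one capture.
two_blocks = text + text
captured_two = re.search(pattern, two_blocks, re.DOTALL).group(1)
print(captured_two.count(FENCE))
```

The remainder starts with the two backticks the lookahead could no longer combine into a fence, which is why exactly one backtick leaks into the result.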
Probably the safest approach is to rely on <code>\`\`\`</code> being the first thing on a line. You can then match up to the next occurrence of <code>\`\`\`</code> at the start of a line:
```regex
^```(?:\w+)?\s*\n(.*?)(?=^```)```
```
[Demo on regex101][2]

In Python:
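````python
import re

string = """
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
```
print('foo```bar```foo')
print('foo```bar```foo')
```
"""

# ^ only matches at line starts because of re.MULTILINE, so fences inside a line are ignored
pattern = r'^```(?:\w+)?\s*\n(.*?)(?=^```)```'
result = re.findall(pattern, string, re.DOTALL | re.MULTILINE)
print(*result, sep='\n')
````

Output (each captured block keeps its trailing newline, hence the blank lines):

```
print('foo```bar```foo')
print('foo```bar```foo')

print('foo```bar```foo')
print('foo```bar```foo')

print('foo```bar```foo')
print('foo```bar```foo')
```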

  [1]: https://regex101.com/r/UMHjzn/1
  [2]: https://regex101.com/r/8iN6FJ/1
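
If the language tag is also of interest, the same pattern can be wrapped in a small helper. This is only a sketch: `extract_code_blocks`, `CODE_BLOCK`, and the group names `lang` and `code` are hypothetical names introduced here, and the regex is the answer's pattern with the language part turned into a capturing group.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime so this example nests cleanly

# The answer's pattern with named groups: a fence at the start of a line, an
# optional language tag, then everything up to the next line-initial fence.
CODE_BLOCK = re.compile(
    rf"^{FENCE}(?P<lang>\w+)?\s*\n(?P<code>.*?)(?=^{FENCE}){FENCE}",
    re.DOTALL | re.MULTILINE,
)


def extract_code_blocks(markdown):
    """Return (language, code) pairs; language is '' when the fence has no tag."""
    return [
        (m.group("lang") or "", m.group("code"))
        for m in CODE_BLOCK.finditer(markdown)
    ]


doc = (
    f"Intro\n{FENCE}python\nprint('a{FENCE}b')\n{FENCE}\n"
    f"Middle\n{FENCE}\nplain\n{FENCE}\n"
)

blocks = extract_code_blocks(doc)
for lang, code in blocks:
    print(repr(lang), repr(code))
```

Each captured block keeps its trailing newline, so call `code.rstrip("\n")` if that is unwanted. Like the answer's regex, this assumes fences always start in the first column and ignores indented or tilde-fenced blocks.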
<details>
<summary>English:</summary>

The first character which doesn't match <code>.*\`\`\`</code> (and hence terminates the match) is the <code>\`</code> at the start of the line after <code>print('foo\`\`\`bar\`\`\`foo')</code>. See for example [this demo][1]. You'll note that this method doesn't work at all when there is more than one code block.

Probably the safest approach is to rely on the <code>\`\`\`</code> being the first thing on the line. Then you can match up to the next occurrence of <code>\`\`\`</code> at the start of a line instead:
```regex
^```(?:\w+)?\s*\n(.*?)(?=^```)```
```
[Demo on regex101][2]
In Python:
````python
import re

string = """
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
```python
print('foo```bar```foo')
print('foo```bar```foo')
```
Lorem ipsum
```
print('foo```bar```foo')
print('foo```bar```foo')
```
"""

pattern = r'^```(?:\w+)?\s*\n(.*?)(?=^```)```'
result = re.findall(pattern, string, re.DOTALL | re.MULTILINE)
print(*result, sep='\n')
````

Output:

```
print('foo```bar```foo')
print('foo```bar```foo')

print('foo```bar```foo')
print('foo```bar```foo')

print('foo```bar```foo')
print('foo```bar```foo')
```
</details>