April 20, 2023, 03:09:09

Converting adoc to markdown while preserving latex style math equations

Question

I have a group of adoc documents that I'm converting to Markdown. For most of them, I've been able to convert them with:

``` sh
asciidoc -b docbook -o temp.xml <infile>
pandoc -f docbook -t markdown_strict --atx-headers --mathjax temp.xml -o <outfile>
```

followed by some regex to clean up broken image links and fix the headers. However, this doesn't work for the inline math equations. In the adoc they use the syntax `latexmath:[$some_equation_here$]`, sometimes without the dollar signs for multi-line equations.

When this gets turned into DocBook XML, the equation seems to be preserved, in this format:

``` xml
<inlineequation>
<alt><![CDATA[$some_equation_here$]]></alt>
<inlinemediaobject><textobject><phrase></phrase></textobject></inlinemediaobject>
</inlineequation>
```

But when pandoc converts it to Markdown, it ignores these blocks of XML. How can I keep the equation in a Markdown-readable format (`$some_equation_here$`) during the pandoc conversion? The `--mathjax` option doesn't seem to help here.

I tried a separate Python regex, `re.sub(r'latexmath:\[\$?(.*?)\$?\]', r'$\g<1>$', file_contents)`, to keep the `$`, but it results in some double-escaped text that then has to be fixed manually, and it sometimes doesn't fully work, producing extra `/sup` tags. Trying something similar on the XML file gave similar results.

Answer 1

Score: 0

Looking at the pandoc code, it seems that the DocBook reader expects the formula to be in a `<mathphrase>` element below `<inlineequation>`. Thus, replacing the `<alt>` tags with `<mathphrase>` is enough for pandoc to pick up the equation. This yields invalid DocBook XML in general, as `<inlineequation>` should contain either `<mathphrase>` or `<inlinemediaobject>` elements, not both, but that doesn't matter for pandoc.

``` sh
cat << EOF | pandoc --from=docbook --to markdown --lua-filter=unwrap-math.lua
<para>
  <inlineequation>
    <mathphrase><![CDATA[$some_equation_here$]]></mathphrase>
    <inlinemediaobject><textobject><phrase></phrase></textobject></inlinemediaobject>
  </inlineequation>
</para>
EOF
$some_equation_here$
```

Note that pandoc inserts the dollar signs itself, so the ones already in the source should be removed as well. The above command uses a Lua filter to strip them; `unwrap-math.lua` contains:

``` lua
-- Strip one leading and one trailing dollar sign from each Math element.
function Math (mth)
  mth.text = mth.text:gsub('^%$', ''):gsub('%$$', '')
  return mth
end
```
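
The `<alt>` to `<mathphrase>` replacement described above could be scripted as a preprocessing step between the `asciidoc` and `pandoc` calls. The following is only a minimal sketch in Python, not something taken from the pandoc sources: the function name `alt_to_mathphrase` is made up for illustration, and it assumes the `<inlineequation>` and `<alt>` tags carry no attributes, as in the snippet from the question.

``` python
import re

# Matches one <inlineequation>...</inlineequation> element, newlines included.
# Assumption: the opening tag has no attributes, as in the question's XML.
INLINE_EQ = re.compile(r"<inlineequation>.*?</inlineequation>", re.DOTALL)


def alt_to_mathphrase(docbook_xml: str) -> str:
    """Rename <alt> to <mathphrase>, but only inside <inlineequation>."""

    def rename(match: re.Match) -> str:
        block = match.group(0)
        block = block.replace("<alt>", "<mathphrase>")
        return block.replace("</alt>", "</mathphrase>")

    # <alt> elements elsewhere in the document are left untouched.
    return INLINE_EQ.sub(rename, docbook_xml)
```

One would read `temp.xml`, pass its contents through this function, write the result back, and then run pandoc with the Lua filter as shown. Plain string replacement is used rather than an XML parser so that the CDATA sections and the rest of the file stay byte-for-byte unchanged.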
