
How to replace optional group with regex and Golang

Question


I am trying to translate this:

{% img <right> /images/testing %}

into this:

{{< figure <class="right"> src="/images/testing" >}}

with regex in Golang. The part in <> in the source string is optional.

I have this code, which seems to work in the main test case, when the first capturing group exists ("right"):

regexp.MustCompile(`{%\s*img\s*(\p{L}*)\s+([/\S]+)\s+%}`).
	ReplaceAllString("{% img right /images/testing %}", "{{< figure class=\"$1\" src=\"$2\" >}}")

If the optional group is missing, however, I get:

{{< figure class="" src="/images/testing" >}}

which is not what I need: I want the entire class="" part gone, like this:

{{< figure src="/images/testing" >}}

Is this possible? Can I somehow indicate in the replacement string:

{{< figure class=\"$1\" src=\"$2\" >}}

that I want the additional text ("class=") gone if the optional group is empty?

Answer 1

Score: 1


Go's regexp package does not support conditional constructs, and neither does the Replace family of functions: in the replacement string, $1 always expands to the group's text, even when that text is empty.
The solution depends on how many special cases you have.
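The replacement template cannot express a condition, but ReplaceAllStringFunc takes a Go callback, and the callback can. Below is a minimal sketch of that idea; the names imgTag and convertFunc are hypothetical, and [/\S]+ is shortened to the equivalent \S+ (a "/" is already non-whitespace). Because the callback only receives the matched text, it re-applies the pattern to that text to read the capture groups.

```go
package main

import (
	"fmt"
	"regexp"
)

// imgTag matches {% img [class] path %}; group 1 (the class) may be empty.
var imgTag = regexp.MustCompile(`{%\s*img\s*(\p{L}*)\s+(\S+)\s+%}`)

// convertFunc rewrites every img tag in one ReplaceAllStringFunc call,
// choosing the output format in Go code instead of in the template.
func convertFunc(s string) string {
	return imgTag.ReplaceAllStringFunc(s, func(m string) string {
		// The callback gets only the matched text, so match it again
		// to get at the capture groups.
		g := imgTag.FindStringSubmatch(m)
		if g[1] == "" {
			return fmt.Sprintf(`{{< figure src="%s" >}}`, g[2])
		}
		return fmt.Sprintf(`{{< figure class="%s" src="%s" >}}`, g[1], g[2])
	})
}

func main() {
	fmt.Println(convertFunc("{% img right /images/testing %}\n{% img /images/testing %}"))
}
```

Matching each hit a second time costs one extra regexp run per tag, which is cheap for short tags like these; the FindAllSubmatchIndex approach avoids it.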

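If you only have the one case, I'd suggest a two-pass replacement: first replace all occurrences without the attribute, then replace the ones that have it. The order matters, because the with-attribute pattern also matches tags without a class (with an empty $1) and would produce class="" again: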

txt := "{% img right /images/testing %}\n{% img /images/testing %}"

// without attribute (runs first: the second pattern would also match
// these tags and leave class="")
txt = regexp.MustCompile(`{%\s*img\s*([/\S]+)\s+%}`).
	ReplaceAllString(txt, "{{< figure src=\"$1\" >}}")

// with attribute
txt = regexp.MustCompile(`{%\s*img\s*(\p{L}*)\s+([/\S]+)\s+%}`).
	ReplaceAllString(txt, "{{< figure class=\"$1\" src=\"$2\" >}}")
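A possible refinement of the two-pass version (my suggestion, not part of the original answer): if the class group uses \p{L}+ instead of \p{L}*, the with-attribute pattern can no longer match a tag without a class. The two patterns then don't overlap, and the passes can run in either order. A sketch, with hypothetical names withClass, noClass and convertTwoPass:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// \p{L}+ (instead of \p{L}*) makes the class mandatory, so this
	// pattern no longer matches tags that have no class.
	withClass = regexp.MustCompile(`{%\s*img\s*(\p{L}+)\s+(\S+)\s+%}`)
	// Matches only tags whose single argument is the path.
	noClass = regexp.MustCompile(`{%\s*img\s*(\S+)\s+%}`)
)

// convertTwoPass runs the class pass first; with the patterns above the
// order of the two passes no longer matters.
func convertTwoPass(s string) string {
	s = withClass.ReplaceAllString(s, `{{< figure class="$1" src="$2" >}}`)
	return noClass.ReplaceAllString(s, `{{< figure src="$1" >}}`)
}

func main() {
	fmt.Println(convertTwoPass("{% img right /images/testing %}\n{% img /images/testing %}"))
}
```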

If you say this is inefficient, I'd say: probably, yes. If you want something more efficient (i.e. something that does not scan the source string twice), you have to build something closer to a parser, which decides at the moment a match is found which format to use. A rough sketch:

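// p is assumed to be the with-attribute pattern; its first group (the
// class) matches the empty string when no class is given.
p := regexp.MustCompile(`{%\s*img\s*(\p{L}*)\s+([/\S]+)\s+%}`)

src := []byte("ok" + "{% img right /images/testing %}" + "this" +
	"{% img /images/testing %}" + "no?")
dst := bytes.NewBufferString("")
cidx := 0 // end of the previous match

for _, match := range p.FindAllSubmatchIndex(src, -1) {
	dst.Write(src[cidx:match[0]]) // text before this match, unchanged
	// match[2]:match[3] is the class, match[4]:match[5] the image path;
	// newFormat (not shown) builds the replacement
	dst.WriteString(newFormat(src, src[match[2]:match[3]], src[match[4]:match[5]]))
	cidx = match[1]
}
dst.Write(src[cidx:]) // text after the last match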

In this example you copy everything from your source text src to a buffer dst, replacing every match of the pattern with the string returned by a function (newFormat, which you write yourself). That function can then decide whether or not to include the class attribute.
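Filled out into a complete program, the single-pass idea could look like the sketch below. It is one assumption about how newFormat might be written: instead of formatting by hand, it picks one of two templates per match and lets Regexp.Expand substitute $1 and $2, the same expansion ReplaceAllString uses. The names imgPattern, withClassTmpl, noClassTmpl and convertOnePass are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var imgPattern = regexp.MustCompile(`{%\s*img\s*(\p{L}*)\s+(\S+)\s+%}`)

// Two templates; the one to use is chosen per match in Go code.
var (
	withClassTmpl = []byte(`{{< figure class="$1" src="$2" >}}`)
	noClassTmpl   = []byte(`{{< figure src="$2" >}}`)
)

// convertOnePass walks the matches once, copying the text between them
// and expanding the template that fits each match.
func convertOnePass(src []byte) []byte {
	var dst []byte
	last := 0 // end of the previous match
	for _, m := range imgPattern.FindAllSubmatchIndex(src, -1) {
		dst = append(dst, src[last:m[0]]...) // unchanged text before the match
		tmpl := withClassTmpl
		if m[2] == m[3] { // group 1 (the class) matched the empty string
			tmpl = noClassTmpl
		}
		dst = imgPattern.Expand(dst, tmpl, src, m)
		last = m[1]
	}
	return append(dst, src[last:]...) // unchanged text after the last match
}

func main() {
	src := []byte("ok" + "{% img right /images/testing %}" + "this" +
		"{% img /images/testing %}" + "no?")
	fmt.Println(string(convertOnePass(src)))
}
```

Because \p{L}* always participates in the match, m[2] and m[3] are never -1 here, so comparing them is enough to detect a missing class.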

huangapple
  • Posted on April 2, 2017, 20:29:54
  • Original link: https://go.coder-hub.com/43168329.html