How to access a capturing group from regexp.ReplaceAllFunc?

Question

How can I access a capture group from inside ReplaceAllFunc()?

package main

import (
	"fmt"
	"regexp"
)

func main() {
	body := []byte("Visit this page: [PageName]")
	search := regexp.MustCompile("\\[([a-zA-Z]+)\\]")

	body = search.ReplaceAllFunc(body, func(s []byte) []byte {
		// How can I access the capture group here?
	})

	fmt.Println(string(body))
}

The goal is to replace [PageName] with <a href="/view/PageName">PageName</a>.

This is the last task under the "Other tasks" section at the bottom of the Writing Web Applications Go tutorial.

Answer 1

Score: 7

I agree that having access to the capture group inside your function would be ideal, but I don't think it's possible with regexp.ReplaceAllFunc.
The only thing that comes to mind for doing this with that function is to drop the capturing group from the pattern and strip the brackets from the match by slicing:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	body := []byte("Visit this page: [PageName] [OtherPageName]")
	search := regexp.MustCompile("\\[[a-zA-Z]+\\]")
	body = search.ReplaceAllFunc(body, func(s []byte) []byte {
		m := string(s[1 : len(s)-1])
		return []byte("<a href=\"/view/" + m + "\">" + m + "</a>")
	})
	fmt.Println(string(body))
}
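
For the specific goal in the question, a callback may not be needed at all. The sketch below swaps in a different technique, regexp.ReplaceAll with a $1 template, which expands capture group 1 inside the replacement. The names pagePattern and linkify are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// pagePattern is an illustrative name; the pattern is the one from the question.
var pagePattern = regexp.MustCompile(`\[([a-zA-Z]+)\]`)

// linkify (hypothetical helper) replaces every [PageName] with a link.
// In the template, $1 expands to the text of capture group 1.
func linkify(body []byte) []byte {
	return pagePattern.ReplaceAll(body, []byte(`<a href="/view/$1">$1</a>`))
}

func main() {
	fmt.Println(string(linkify([]byte("Visit this page: [PageName]"))))
}
```

This only covers fixed templates. If the replacement has to be computed from the group (a lookup, escaping, and so on), a function-based approach is still needed.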

EDIT

There is one other way I know of to do what you want. First, you can specify a non-capturing group with the syntax (?:re), where re is a regular expression. This is not essential, but it reduces the number of uninteresting submatches.

Next, there is regexp.FindAllSubmatchIndex. It returns a slice of slices, where each inner slice holds the index pairs of the whole match and of every submatch for one match of the regexp.

With these two things, you can construct a somewhat generic solution:

package main

import (
	"fmt"
	"regexp"
)

func ReplaceAllSubmatchFunc(re *regexp.Regexp, b []byte, f func(s []byte) []byte) []byte {
	idxs := re.FindAllSubmatchIndex(b, -1)
	if len(idxs) == 0 {
		return b
	}
	l := len(idxs)
	ret := append([]byte{}, b[:idxs[0][0]]...)
	for i, pair := range idxs {
		// Replace the whole match with the result of calling the
		// user-supplied function on the first submatch (capture group 1).
		ret = append(ret, f(b[pair[2]:pair[3]])...)
		if i+1 < l {
			// Copy the unmatched text between this match and the next one.
			ret = append(ret, b[pair[1]:idxs[i+1][0]]...)
		}
	}
	ret = append(ret, b[idxs[len(idxs)-1][1]:]...)
	return ret
}

func main() {
	body := []byte("Visit this page: [PageName] [OtherPageName][XYZ]     [XY]")
	search := regexp.MustCompile("(?:\\[)([a-zA-Z]+)(?:\\])")

	body = ReplaceAllSubmatchFunc(search, body, func(s []byte) []byte {
		m := string(s)
		return []byte("<a href=\"/view/" + m + "\">" + m + "</a>")
	})

	fmt.Println(string(body))
}

Answer 2

Score: 3

If you want to get a group inside ReplaceAllFunc, you can call ReplaceAllString on the matched text with $1 to extract the subgroup.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    body := []byte("Visit this page: [PageName]")
    search := regexp.MustCompile("\\[([a-zA-Z]+)\\]")

    body = search.ReplaceAllFunc(body, func(s []byte) []byte {
        // Re-run the regexp on the matched text and expand $1 to get capture group 1.
        group := search.ReplaceAllString(string(s), `$1`)

        fmt.Println(group)

        // handle group as you wish
        newGroup := "<a href='/view/" + group + "'>" + group + "</a>"
        return []byte(newGroup)
    })

    fmt.Println(string(body))
}

When there are several groups, you can extract each one this way, then handle each group and return the desired value.
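
With several groups, one re-match per callback may be simpler than one ReplaceAllString call per group. The sketch below uses FindSubmatch on the matched bytes instead. The [Target|Label] syntax, linkPattern, and renderLinks are invented for this example and do not come from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkPattern is an illustrative two-group pattern for text like [Target|Label].
var linkPattern = regexp.MustCompile(`\[([a-zA-Z]+)\|([a-zA-Z ]+)\]`)

// renderLinks (hypothetical helper) re-runs the pattern on each matched
// fragment to recover all capture groups at once.
func renderLinks(body []byte) []byte {
	return linkPattern.ReplaceAllFunc(body, func(m []byte) []byte {
		g := linkPattern.FindSubmatch(m) // g[0] whole match, g[1] target, g[2] label
		return []byte(fmt.Sprintf(`<a href="/view/%s">%s</a>`, g[1], g[2]))
	})
}

func main() {
	fmt.Println(string(renderLinks([]byte("Go to [Home|Start page] now"))))
}
```

One caveat applies to every re-matching approach: the second match sees only the matched fragment, so a pattern that depends on surrounding context (for example \B or ^) may match differently or not at all, in which case FindSubmatch returns nil.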

Answer 3

Score: 0

You have to call ReplaceAllFunc first and, within the callback, call FindStringSubmatch on the same regex again. For example:

func (p parser) substituteEnvVars(data []byte) ([]byte, error) {
	// Note: err is never assigned in this snippet, so it is always nil.
	var err error
	substituted := p.envVarPattern.ReplaceAllFunc(data, func(matched []byte) []byte {
		// Re-match the fragment to recover capture group 1 (the variable name).
		varName := p.envVarPattern.FindStringSubmatch(string(matched))[1]
		value := os.Getenv(varName)
		if len(value) == 0 {
			log.Printf("Fatal error substituting environment variable %s\n", varName)
		}

		return []byte(value)
	})
	return substituted, err
}

huangapple
  • Posted on January 17, 2015, 23:12:10
  • Please keep this link when reposting: https://go.coder-hub.com/28000832.html