How to split on a character only if it's outside of quotes in golang?

Question


I need to split a chunk of text on the + symbol, but only when it's outside of single quotes. The text will look something like:

Some.data:'some+value'+some.more.data:9+yet.more.data:'rock+roll'

which should become a slice of three values:

  • Some.data:'some+value'
  • some.more.data:9
  • yet.more.data:'rock+roll'

I've found similar questions that solve this with a regex, but those solutions require lookahead, which Go's regexp engine does not support.

I also took a crack at creating my own regex without lookahead:

'.*?'(\+)|[^']*(\+)

But that seems to fall apart on the third item, where it splits on the + in 'rock+roll'.

I've thought about splitting the string on + and then validating each piece to make sure it is not a partial expression, stitching the pieces back together where it is, but that would be fairly involved and I'd like to avoid it if possible.
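For reference, the split-and-stitch idea can be sketched in a few lines. This is only a rough illustration, assuming that a piece is "partial" exactly when an odd number of single quotes has been seen so far; the name splitAndStitch is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAndStitch (hypothetical name) splits s on every '+', then re-joins
// neighbouring pieces for as long as a single quote is still unclosed.
func splitAndStitch(s string) []string {
	var result []string
	pending := "" // piece being accumulated
	open := false // true while pending holds an unclosed quote
	for _, piece := range strings.Split(s, "+") {
		if open {
			// The previous '+' was inside quotes, so put it back.
			pending += "+" + piece
		} else {
			pending = piece
		}
		// An odd number of quotes in this piece flips the open/closed state.
		if strings.Count(piece, "'")%2 == 1 {
			open = !open
		}
		if !open {
			result = append(result, pending)
		}
	}
	if open {
		// Unbalanced input: keep the remainder as one final piece.
		result = append(result, pending)
	}
	return result
}

func main() {
	s := "Some.data:'some+value'+some.more.data:9+yet.more.data:'rock+roll'"
	for _, part := range splitAndStitch(s) {
		fmt.Println(part)
	}
}
```

It works, but it does the splitting twice over (once to cut, once to undo the wrong cuts), which is the extra work I was hoping to avoid.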

At the moment I think the best solution would be to identify the text inside quotes (which I can easily do with a regex), URL-encode that text or otherwise escape the plus sign, split the text, and then URL-decode each piece to restore the + signs inside quotes. I'm wondering whether there is a better way.
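A minimal sketch of that escape, split, restore idea is below. Note that it swaps URL encoding for a simpler placeholder byte, and it assumes the input never contains a NUL byte; the names quoted, placeholder and splitWithPlaceholder are invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// quoted matches one single-quoted run, quotes included.
var quoted = regexp.MustCompile(`'[^']*'`)

// placeholder stands in for '+' inside quotes.
// Assumption: the input never contains this byte.
const placeholder = "\x00"

// splitWithPlaceholder (hypothetical name) hides every '+' that sits inside
// quotes, splits on the remaining ones, then restores the hidden ones.
func splitWithPlaceholder(s string) []string {
	masked := quoted.ReplaceAllStringFunc(s, func(q string) string {
		return strings.ReplaceAll(q, "+", placeholder)
	})
	parts := strings.Split(masked, "+")
	for i, p := range parts {
		parts[i] = strings.ReplaceAll(p, placeholder, "+")
	}
	return parts
}

func main() {
	s := "Some.data:'some+value'+some.more.data:9+yet.more.data:'rock+roll'"
	for _, part := range splitWithPlaceholder(s) {
		fmt.Println(part)
	}
}
```

Using a placeholder rather than URL encoding avoids accidentally decoding a literal %xx sequence that was already present outside the quotes.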

Does anyone know of a way to split on a + sign that is outside of quotes using regex without lookahead? Can anyone think of a simpler solution than my URL encoding/decoding method?
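One regex-only possibility that needs no lookahead is to match the pieces instead of matching the separators. The sketch below is an illustration under stated assumptions rather than a drop-in replacement for a split: it assumes quotes are balanced, and unlike a true split it silently drops empty pieces (for example between two adjacent + signs). The names token and splitByMatching are made up here; regexp.MustCompile and FindAllString are standard library calls.

```go
package main

import (
	"fmt"
	"regexp"
)

// token matches one piece: a run made of characters that are neither '+'
// nor a quote, or of complete single-quoted sections.
var token = regexp.MustCompile(`(?:[^+']|'[^']*')+`)

// splitByMatching (hypothetical name) returns every piece found in s.
// Assumptions: quotes are balanced; empty pieces are not reported.
func splitByMatching(s string) []string {
	return token.FindAllString(s, -1)
}

func main() {
	s := "Some.data:'some+value'+some.more.data:9+yet.more.data:'rock+roll'"
	for _, part := range splitByMatching(s) {
		fmt.Println(part)
	}
}
```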

Answer 1

Score: 4

Plain code can be easier:

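// split cuts s at every '+' that is not inside single quotes.
func split(s string) []string {
	var result []string
	inquote := false // true while scanning between a pair of single quotes
	i := 0           // byte offset where the current piece starts
	for j, c := range s {
		if c == '\'' {
			inquote = !inquote
		} else if c == '+' && !inquote {
			result = append(result, s[i:j])
			i = j + 1 // skip the one-byte '+'
		}
	}
	// Whatever follows the last separator is the final piece.
	return append(result, s[i:])
}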
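To try the function on the sample string from the question, it can be dropped into a small program like the one below. The main function and the printing are additions for illustration only; split itself is unchanged.

```go
package main

import "fmt"

// split cuts s at every '+' that is not inside single quotes.
func split(s string) []string {
	var result []string
	inquote := false
	i := 0
	for j, c := range s {
		if c == '\'' {
			inquote = !inquote
		} else if c == '+' && !inquote {
			result = append(result, s[i:j])
			i = j + 1
		}
	}
	return append(result, s[i:])
}

func main() {
	s := "Some.data:'some+value'+some.more.data:9+yet.more.data:'rock+roll'"
	// Prints the three expected pieces, one per line.
	for _, part := range split(s) {
		fmt.Println(part)
	}
}
```

Because range over a string yields byte offsets and + is a single byte, slicing with s[i:j] and j + 1 stays correct even when the data contains multi-byte UTF-8 characters.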

Answer 2

Score: 0

This is a slightly old question, but you might consider using go-andiamo/splitter.

Example:

package main

import (
    "github.com/go-andiamo/splitter"
)

func main() {
    plusSplitter := splitter.MustCreateSplitter('+', splitter.SingleQuotes)

    s := "Some.data:'some+value'+some.more.data:9+yet.more.data:'rock+roll'"

    if parts, err := plusSplitter.Split(s); err == nil {
        for _, part := range parts {
            println(part)
        }
    } else {
        println(err.Error())
    }
}

Try it on the Go Playground.

Disclosure: I am the author of go-andiamo/splitter

huangapple
  • Posted on September 29, 2021, 00:12:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/69365135.html