Split string by length in Golang

Question

Does anyone know how to split a string by length in Golang?

For example, splitting "helloworld" after every 3 characters should ideally return an array of "hel", "low", "orl", "d".

Alternatively, a possible solution would be to insert a newline after every 3 characters.

All ideas are greatly appreciated!

Answer 1

Score: 24

Make sure to convert your string into a slice of runes: see "Slice string into letters".

A for ... range loop over a string decodes it into runes automatically, so no extra code is needed just to read the runes one by one. The loop index, however, is a byte offset rather than a rune count.

for i, r := range s {
    fmt.Printf("i%d r %c\n", i, r)
    // every 3 i, do something
}

a[n:n+3] works best when a is a slice of runes.

In a rune slice the index increases by one per rune, whereas when ranging over a string the index is a byte offset and can increase by more than one per character. For "世界", i would be 0 and then 3, because a character (rune) can be encoded as multiple bytes.
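To make the difference concrete, here is a minimal sketch that collects the index values seen in each kind of loop. The helper names byteOffsets and runeIndexes are made up for this illustration and are not part of the original answer.

```go
package main

import "fmt"

// byteOffsets returns the index values produced by ranging over s directly.
// Each value is the byte offset at which a rune starts.
func byteOffsets(s string) []int {
	var idx []int
	for i := range s {
		idx = append(idx, i)
	}
	return idx
}

// runeIndexes returns the index values produced by ranging over []rune(s).
// Each value is simply the position of the rune in the slice.
func runeIndexes(s string) []int {
	var idx []int
	for i := range []rune(s) {
		idx = append(idx, i)
	}
	return idx
}

func main() {
	fmt.Println(byteOffsets("世界")) // [0 3]
	fmt.Println(runeIndexes("世界")) // [0 1]
}
```

Each of the two characters takes 3 bytes in UTF-8, which is why the string loop jumps from 0 straight to 3.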


For instance, consider s := "世a界世bcd界efg世": 12 runes. (see play.golang.org)

If you range over the string directly, the index is a byte offset, so a naive "split every 3 chars" implementation misses some of the index values it expects to test modulo 3 (such as 2, 5, 8 and 14), because the index jumps past those values:

res := ""
for i, r := range s { // i is the byte offset of r, not its position
    res = res + string(r)
    fmt.Printf("i %d r %c\n", i, r)
    if i > 0 && (i+1)%3 == 0 {
        fmt.Printf("=>(%d) '%v'\n", i, res)
        res = ""
    }
}

The output:

i  0 r 世
i  3 r a   <== miss i==2
i  4 r 界
i  7 r 世  <== miss i==5
i 10 r b  <== miss i==8
i 11 r c  ===============> would print '世a界世bc', not exactly '3 chars'!
i 12 r d
i 13 r 界
i 16 r e  <== miss i==14
i 17 r f  ===============> would print 'd界ef'
i 18 r g
i 19 r 世 <== miss the rest of the string

But if you iterate over runes (a := []rune(s)), you get what you expect: the index increases by one per rune, which makes it easy to aggregate exactly 3 characters:

res := ""
for i, r := range a { // i is the position of r in the rune slice
    res = res + string(r)
    fmt.Printf("i%d r %c\n", i, r)
    if i > 0 && (i+1)%3 == 0 {
        fmt.Printf("=>(%d) '%v'\n", i, res)
        res = ""
    }
}

Output:

i 0 r 世
i 1 r a
i 2 r 界 ===============> would print '世a界'
i 3 r 世
i 4 r b
i 5 r c ===============> would print '世bc'
i 6 r d
i 7 r 界
i 8 r e ===============> would print 'd界e'
i 9 r f
i10 r g
i11 r 世 ===============> would print 'fg世'
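The question also mentions inserting a newline after every 3 characters as an alternative. A rough sketch of that idea, counting runes rather than bytes, could look like the following. The function name insertNewlines and the choice not to add a trailing newline are assumptions made for this example, not something from the original answer.

```go
package main

import (
	"fmt"
	"strings"
)

// insertNewlines is a hypothetical helper: it returns s with a newline
// placed between consecutive groups of n runes. A non-positive n returns
// s unchanged. Invalid UTF-8 bytes are written out as U+FFFD.
func insertNewlines(s string, n int) string {
	if n <= 0 {
		return s
	}
	var b strings.Builder
	count := 0 // runes written since the last newline
	for _, r := range s {
		if count == n {
			b.WriteByte('\n')
			count = 0
		}
		b.WriteRune(r)
		count++
	}
	return b.String()
}

func main() {
	fmt.Println(insertNewlines("helloworld", 3))
	fmt.Println(insertNewlines("世a界世bcd界efg世", 3))
}
```

Because the counter tracks runes and ignores the loop index, multi-byte characters do not throw the grouping off.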

Answer 2

Score: 15

Here is another variant (the original post links to a playground).
It is far more efficient in both speed and memory than the other answers. The original post also links to benchmarks if you want to run them. In general it is about 5 times faster than the previous version, which was already the fastest answer.

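// Chunks splits s into pieces of chunkSize runes. chunkSize must be positive.
func Chunks(s string, chunkSize int) []string {
	if len(s) == 0 {
		return nil
	}
	// Fast path: a string of at most chunkSize bytes has at most chunkSize runes.
	if chunkSize >= len(s) {
		return []string{s}
	}
	// The capacity is an upper bound, since s has at most len(s) runes.
	chunks := make([]string, 0, (len(s)-1)/chunkSize+1)
	currentLen := 0   // runes in the current chunk
	currentStart := 0 // byte offset where the current chunk starts
	for i := range s { // i is the byte offset of each rune
		if currentLen == chunkSize {
			chunks = append(chunks, s[currentStart:i])
			currentLen = 0
			currentStart = i
		}
		currentLen++
	}
	chunks = append(chunks, s[currentStart:])
	return chunks
}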

Please note that when iterating over a string, the index points to the first byte of each rune. A rune takes from 1 to 4 bytes. Slicing also treats the string as a sequence of bytes.
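As a small illustration of the 1 to 4 byte range, the sketch below decodes a string rune by rune with utf8.DecodeRuneInString and records how many bytes each rune occupies. The helper name runeWidths and the sample string are just choices for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeWidths returns the encoded size in bytes of each rune in s,
// in order. An invalid byte is reported with size 1.
func runeWidths(s string) []int {
	var widths []int
	for len(s) > 0 {
		_, size := utf8.DecodeRuneInString(s)
		widths = append(widths, size)
		s = s[size:] // advance by whole runes, never into the middle of one
	}
	return widths
}

func main() {
	fmt.Println(runeWidths("aé世😀")) // [1 2 3 4]
}
```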

PREVIOUS SLOWER ALGORITHM

The code was linked to a playground in the original post. Converting from bytes to runes and then back to bytes actually takes a lot of time, so it is better to use the fast algorithm at the top of this answer.

func ChunksSlower(s string, chunkSize int) []string {
	if chunkSize >= len(s) {
		return []string{s}
	}
	var chunks []string
	chunk := make([]rune, chunkSize) // reusable buffer for the current chunk
	n := 0                           // runes stored in chunk so far
	for _, r := range s {
		chunk[n] = r
		n++
		if n == chunkSize {
			chunks = append(chunks, string(chunk))
			n = 0
		}
	}
	if n > 0 {
		// Flush the final, shorter chunk.
		chunks = append(chunks, string(chunk[:n]))
	}
	return chunks
}

Please note that these two algorithms treat invalid UTF-8 differently. The first one passes invalid bytes through as is, while the second one replaces them with the utf8.RuneError symbol ('\uFFFD'), whose UTF-8 encoding in hexadecimal is efbfbd.

Answer 3

Score: 6

I also needed a function to do this recently (the original post links to an example of its usage):

func SplitSubN(s string, n int) []string {
	sub := ""
	subs := []string{}

	runes := []rune(s) // same result as bytes.Runes([]byte(s)), without the import
	l := len(runes)
	for i, r := range runes {
		sub = sub + string(r)
		if (i+1)%n == 0 {
			// A full chunk of n runes is complete.
			subs = append(subs, sub)
			sub = ""
		} else if (i + 1) == l {
			// Last rune reached with a partial chunk left over.
			subs = append(subs, sub)
		}
	}

	return subs
}

Answer 4

Score: 4

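Here is another example (the original post links to a playground where you can try it). ChunkString splits the string into chunks of chunkSize runes and returns them as a slice; main prints the chunks for "helloworld" with a chunk size of 3, then prints them joined with newlines: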

package main

import (
	"fmt"
	"strings"
)

func ChunkString(s string, chunkSize int) []string {
	var chunks []string
	runes := []rune(s)

	if len(runes) == 0 {
		return []string{s}
	}

	for i := 0; i < len(runes); i += chunkSize {
		nn := i + chunkSize
		if nn > len(runes) {
			nn = len(runes)
		}
		chunks = append(chunks, string(runes[i:nn]))
	}
	return chunks
}

func main() {
	fmt.Println(ChunkString("helloworld", 3))
	fmt.Println(strings.Join(ChunkString("helloworld", 3), "\n"))
}

Answer 5

Score: 1

An easy solution using regex

re := regexp.MustCompile(`(\S{3})`)
x := re.FindAllStringSubmatch("HelloWorld", -1)
fmt.Println(x)

https://play.golang.org/p/mfmaQlSRkHe
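One thing to watch with a pattern that demands exactly 3 non-space characters is that a shorter tail (the final "d" of "helloworld") is not matched, and whitespace breaks chunks. A possible variation, offered only as a sketch, matches 1 to 3 characters of any kind instead. The names chunkRE and splitEvery3 are invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// chunkRE matches runs of 1 to 3 characters. The (?s) flag lets '.'
// match newlines as well. Go's regexp works on UTF-8 code points, so
// a multi-byte character counts as one.
var chunkRE = regexp.MustCompile(`(?s).{1,3}`)

// splitEvery3 returns s cut into pieces of up to 3 characters.
func splitEvery3(s string) []string {
	return chunkRE.FindAllString(s, -1)
}

func main() {
	fmt.Println(splitEvery3("helloworld")) // [hel low orl d]
}
```

FindAllString is used here rather than FindAllStringSubmatch so that the result is a flat []string.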

Answer 6

Score: 1

I tried three versions of this function; the one named "splitByWidthMake" is the fastest.

These functions ignore Unicode: they split by bytes, so they are only correct for ASCII input.

package main

import (
    "fmt"
    "strings"
    "time"
    "math"
)

func splitByWidthMake(str string, size int) []string {
    strLength := len(str)
    splitedLength := int(math.Ceil(float64(strLength) / float64(size)))
    splited := make([]string, splitedLength)
    var start, stop int
    for i := 0; i < splitedLength; i += 1 {
        start = i * size
        stop = start + size
        if stop > strLength {
            stop = strLength
        }
        splited[i] = str[start : stop]
    }
    return splited
}



func splitByWidth(str string, size int) []string {
    strLength := len(str)
    var splited []string
    var stop int
    for i := 0; i < strLength; i += size {
        stop = i + size
        if stop > strLength {
            stop = strLength
        }
        splited = append(splited, str[i:stop])
    }
    return splited
}

func splitRecursive(str string, size int) []string {
    if len(str) <= size {
        return []string{str}
    }
    return append([]string{string(str[0:size])}, splitRecursive(str[size:], size)...)
}

func main() {
    /*
    testStrings := []string{
        "hello world",
        "",
        "1",
    }
    */

    testStrings := make([]string, 10)
    for i := range testStrings {
        testStrings[i] = strings.Repeat("#", int(math.Pow(2, float64(i))))
    }

    //fmt.Println(testStrings)

    t1 := time.Now()
    for i := range testStrings {
        _ = splitByWidthMake(testStrings[i], 2)
        //fmt.Println(t)
    }
    elapsed := time.Since(t1)
    fmt.Println("for loop version elapsed: ", elapsed)


    t1 = time.Now()
    for i := range testStrings {
        _ = splitByWidth(testStrings[i], 2)
    }
    elapsed = time.Since(t1)
    fmt.Println("for loop without make version elapsed: ", elapsed)




    t1 = time.Now()
    for i := range testStrings {
        _ = splitRecursive(testStrings[i], 2)
    }
    elapsed = time.Since(t1)
    fmt.Println("recursive version elapsed: ", elapsed)

}

huangapple
  • Published on September 5, 2014, 20:37:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/25686109.html