Split text longer than 500 characters into an array of strings at spaces

Question

I need to split text that is longer than 500 characters into several strings. The full task reads:

The message is divided into blocks of 500 characters. If the message is longer than 500 characters, then, starting from the 500th character, look for a space; if a space is found, split the message at that point. If no space is found, split the message at 500 characters.

My solution is only a start.
I split the text with a regular expression, then loop over the words and append each one to the string as long as the length allows.
But I'm stuck: how do I get an array of strings in which each string is at most 500 characters long?

re := regexp.MustCompile(`\s+`)
res := re.Split(str, -1)

size := 500

finalString := ""
for i := range res {
	if len(finalString+" "+res[i]) <= size {
		if len(finalString) == 0 {
			finalString = res[i]
		} else {
			finalString += " " + res[i]
		}
	} else {
		break // a new string could start here, but if the length is longer than 500, I don't know what to do
	}
}

Answer 1

Score: 1

Just to make sure I understand this correctly, you're looking to split text at the first space after every 500 characters?

Keep in mind that string concatenation can be relatively expensive: Go strings are immutable, so appending even two characters with finalString += " " + res[i] copies the string's entire underlying byte array. Each append is O(N), which makes building the string this way O(N²) overall.
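The standard library also provides strings.Builder for this situation: it appends into one growing buffer, so the total cost of building a string stays linear in its length instead of copying the whole string on every +=. A minimal sketch; joinWords is a hypothetical helper, and for a one-shot join strings.Join does the same job.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWords concatenates words separated by single spaces. strings.Builder
// grows one internal buffer instead of allocating a new string per append.
func joinWords(words []string) string {
	var b strings.Builder
	for i, w := range words {
		if i > 0 {
			b.WriteByte(' ')
		}
		b.WriteString(w)
	}
	return b.String()
}

func main() {
	fmt.Println(joinWords([]string{"split", "this", "text"}))
}
```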

A better solution is to rely on bytes.Buffer. I wrote up this sample code (you can change splitLength; I didn't want to paste a 500-character input here).

package main

import (
	"bytes"
	"fmt"
)

func main() {
	prettyPrint(split([]byte("1234567 123 12345 1234567"), 5))
}

// prettyPrint prints each chunk on its own line.
func prettyPrint(b [][]byte) {
	for _, a := range b {
		fmt.Println(string(a))
	}
}

// split cuts b into chunks: once a chunk holds more than splitLength bytes,
// the next space ends it (the space itself is dropped). Lengths are counted
// in bytes, not characters.
func split(b []byte, splitLength int) [][]byte {
	current := new(bytes.Buffer)
	var bSlice [][]byte
	counter := 0
	shouldTerminate := false

	for i, c := range b {
		// The chunk is long enough and we have reached a space: close it.
		if shouldTerminate && c == ' ' {
			counter = 0
			shouldTerminate = false

			bSlice = append(bSlice, current.Bytes())
			current = new(bytes.Buffer)

			continue
		}

		counter++
		current.WriteByte(c)

		if counter > splitLength {
			shouldTerminate = true
		}

		// Flush whatever is left at the end of the input.
		if i == len(b)-1 {
			bSlice = append(bSlice, current.Bytes())
		}
	}

	return bSlice
}

Answer 2

Score: 0

I agree with @lickety-split's comments: you need a strtok-style function in Go.
You can use this implementation, https://github.com/dannav/tokenize, or write your own. Hope this helps!

huangapple
  • Published on 2022-12-19 19:57:49
  • If you repost this article, please keep this link: https://go.coder-hub.com/74850202.html