Parse input from a particular format

Question

Let us say I have the following string: "Algorithms 1" by Robert Sedgewick. This is input from the terminal.

The format of this string will always be:

  1. Starts with a double quote
  2. Followed by characters (may contain space)
  3. Followed by double quote
  4. Followed by space
  5. Followed by the word "by"
  6. Followed by space
  7. Followed by characters (may contain space)

Knowing the above format, how do I read this?

I tried using fmt.Scanf(), but it treats each space-separated word as a separate value. I also looked at regular expressions, but I could not work out whether there is a function that lets me extract the values rather than just test for validity.

Answer 1

Score: 5

1) With character search

The input format is so simple that you can use plain character search, implemented in strings.IndexRune():

s := `"Algorithms 1" by Robert Sedgewick`

s = s[1:]                      // Exclude the first double quote
x := strings.IndexRune(s, '"') // Find the 2nd double quote
title := s[:x]                 // Title is between the 2 double quotes
author := s[x+5:]              // Skip the 5 characters of `" by `; the rest is the author

Printing results with:

fmt.Println("Title:", title)
fmt.Println("Author:", author)

Output:

Title: Algorithms 1
Author: Robert Sedgewick

Try it on the Go Playground.
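
The snippet above assumes the input always matches the format; if it does not, the slice expressions can panic. A minimal sketch of the same character-search idea with the checks made explicit follows. The helper name parseBook and its error messages are made up for illustration and are not part of the original answer.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseBook is a hypothetical helper. It expects input of the form
// `"<title>" by <author>` and reports an error instead of panicking.
func parseBook(s string) (title, author string, err error) {
	if !strings.HasPrefix(s, `"`) {
		return "", "", errors.New("input must start with a double quote")
	}
	s = s[1:]                      // Exclude the first double quote
	x := strings.IndexRune(s, '"') // Find the closing double quote
	if x < 0 {
		return "", "", errors.New("missing closing double quote")
	}
	rest := s[x+1:] // Everything after the closing quote
	if !strings.HasPrefix(rest, " by ") {
		return "", "", errors.New(`expected " by " after the title`)
	}
	return s[:x], rest[len(" by "):], nil
}

func main() {
	title, author, err := parseBook(`"Algorithms 1" by Robert Sedgewick`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Title:", title)
	fmt.Println("Author:", author)
}
```

For well-formed input this prints the same two lines as above; malformed input produces an error value rather than a runtime panic.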

2) With splitting

Another solution would be to use strings.Split():

s := `"Algorithms 1" by Robert Sedgewick`

parts := strings.Split(s, `"`)
title := parts[1]      // First part is empty, 2nd is title
author := parts[2][4:] // 3rd is " by " plus the author; cut off the 4-character " by "

Output is the same. Try it on the Go Playground.

3) With a "tricky" splitting

If we cut off the first double quote, we can split on the separator

`" by `

That yields exactly 2 parts: the title and the author. Because the first double quote is already gone, the separator can only occur at the end of the title (per your rules, the title cannot contain double quotes):

s := `"Algorithms 1" by Robert Sedgewick`

parts := strings.Split(s[1:], `" by `)
title := parts[0]  // First part is exactly the title
author := parts[1] // 2nd part is exactly the author

Try it on the Go Playground.
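
If you are on Go 1.18 or later, strings.Cut() expresses the same idea without indexing into a slice of parts, and it reports whether the separator was found. This is only a sketch; the helper name splitTitleAuthor is an assumption for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTitleAuthor is a hypothetical helper. It drops the leading double
// quote and cuts the rest around the separator `" by `.
// ok is false if the input does not have the expected shape.
func splitTitleAuthor(s string) (title, author string, ok bool) {
	if !strings.HasPrefix(s, `"`) {
		return "", "", false
	}
	// strings.Cut (Go 1.18+) splits around the first occurrence of the separator.
	title, author, ok = strings.Cut(s[1:], `" by `)
	if !ok {
		return "", "", false
	}
	return title, author, true
}

func main() {
	title, author, ok := splitTitleAuthor(`"Algorithms 1" by Robert Sedgewick`)
	if !ok {
		fmt.Println("input does not match the expected format")
		return
	}
	fmt.Println("Title:", title)
	fmt.Println("Author:", author)
}
```

Unlike strings.Split(), this cuts only at the first separator, so an author that happened to contain `" by ` again would stay in one piece.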

4) With regexp

If after all the above solutions you still want to use regexp, here's how you could do it:

Use parentheses to define the submatches you want to extract. You want 2 parts: the title between the quotes and the author that follows the word "by". You can use Regexp.FindStringSubmatch() to get the matching parts. Note that the first element of the returned slice is the text matched by the whole expression (here, the complete input), so the relevant parts are the subsequent elements:

s := `"Algorithms 1" by Robert Sedgewick`

r := regexp.MustCompile(`"([^"]*)" by (.*)`)
parts := r.FindStringSubmatch(s)
title := parts[1]  // First part is always the complete match, 2nd part is the title
author := parts[2] // 3rd part is exactly the author

Try it on the Go Playground.
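
One thing to watch for: FindStringSubmatch() returns nil when the input does not match, so indexing parts[1] would panic. A minimal sketch with that check, and with the pattern anchored so the whole line has to match, could look like this. The names bookRe and parseWithRegexp are hypothetical, and the input is assumed to be a single line without a trailing newline.

```go
package main

import (
	"fmt"
	"regexp"
)

// bookRe is anchored with ^ and $ so that the entire input must match.
var bookRe = regexp.MustCompile(`^"([^"]*)" by (.*)$`)

// parseWithRegexp is a hypothetical helper. ok is false if s does not match.
func parseWithRegexp(s string) (title, author string, ok bool) {
	parts := bookRe.FindStringSubmatch(s)
	if parts == nil { // No match: avoid indexing a nil slice
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	title, author, ok := parseWithRegexp(`"Algorithms 1" by Robert Sedgewick`)
	if !ok {
		fmt.Println("input does not match the expected format")
		return
	}
	fmt.Println("Title:", title)
	fmt.Println("Author:", author)
}
```

Compiling the pattern once at package level avoids recompiling it on every call.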

Answer 2

Score: 4

You should use groups (parentheses) to get out the information you want:

"([\w\s]*)"\sby\s([\w\s]+)\.

This returns two groups:

  1. [1-13] Algorithms 1
  2. [18-34] Robert Sedgewick

Most regex libraries have a method that returns all matches in a text. Each match in the result then contains the groups.

In Go, I think it is FindAllStringSubmatch
(https://github.com/StefanSchroeder/Golang-Regex-Tutorial/blob/master/01-chapter2.markdown)

Test it out here:
https://regex101.com/r/cT2sC5/1
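
As a rough sketch of how this pattern could be used from Go: Regexp.FindAllStringSubmatch() with n = -1 returns one slice per match, where element 0 is the whole match and elements 1 and 2 are the two groups. The helper name findBooks and the sample text (including the second entry) are made up for illustration. Note that this pattern requires a trailing period after the author.

```go
package main

import (
	"fmt"
	"regexp"
)

// entryRe is the pattern from this answer; it requires a period after the author.
var entryRe = regexp.MustCompile(`"([\w\s]*)"\sby\s([\w\s]+)\.`)

// findBooks is a hypothetical helper that returns every match in text.
// For each match m: m[0] is the whole match, m[1] the title, m[2] the author.
func findBooks(text string) [][]string {
	return entryRe.FindAllStringSubmatch(text, -1)
}

func main() {
	// Sample text invented for this example.
	text := `"Algorithms 1" by Robert Sedgewick. "Algorithms 2" by Robert Sedgewick.`
	for _, m := range findBooks(text) {
		fmt.Println("Title:", m[1])
		fmt.Println("Author:", m[2])
	}
}
```

Because the groups use [\w\s], titles or author names containing punctuation such as apostrophes or hyphens will not match; the character class would need widening for such input.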

huangapple
  • Published on July 16, 2015, 14:31:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/31446796.html