Go equivalent of C's negated scansets
Question
What is the way to mimic the negated scansets that exist in C?
For an example input string: aaaa, bbbb
In Go, using:
fmt.Sscanf(input, "%s, %s", &str1, &str2)
the result is that only str1 is set, to: aaaa,
In C one could use a format string such as "%[^,], %s" to avoid this problem. Is there a way to accomplish this in Go?
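The behaviour described above can be reproduced with a small program. This is only a sketch: scanPair is a hypothetical helper name, not part of the fmt package. In Go, %s reads up to the next whitespace, so the comma ends up in str1, and the literal comma in the format then fails to match the space that follows.

```go
package main

import "fmt"

// scanPair is a hypothetical helper that applies the format string from the
// question and returns everything fmt.Sscanf reports.
func scanPair(input string) (str1, str2 string, n int, err error) {
	// %s consumes "aaaa," (it stops only at whitespace); the literal ","
	// in the format then meets a space in the input, so scanning stops.
	n, err = fmt.Sscanf(input, "%s, %s", &str1, &str2)
	return
}

func main() {
	str1, str2, n, err := scanPair("aaaa, bbbb")
	fmt.Printf("n=%d str1=%q str2=%q err=%v\n", n, str1, str2, err)
}
```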
Answer 1
Score: 2
Go doesn't support this directly like C, partially because you should be reading a line and using something like strings.FieldsFunc. But that's naturally a very simplistic view. For data formatted in a homogeneous manner, you could use bufio.Scanner to essentially do the same thing with any io.Reader. However, if you had to deal with something like this format:
// Name; email@domain
//
// Anything other than ';' is valid for name.
// Anything before '@' is valid for email.
// For domain, only A-Z, a-z, and 0-9, as well as '-' and '.' are valid.
sscanf(input, "%[^;]; %[^@]@%[-." ALNUM "]", name, email, domain);
then you'd run into trouble because you're now dealing with a particular state. In such a case, you might prefer working with bufio.Reader to manually parse things. There's also the option of implementing fmt.Scanner. Here's some sample code to give you an idea of how easy it can be to implement fmt.Scanner:
import (
	"errors"
	"fmt"
)

// Scanset acts as a filter when scanning strings.
// The zero value of a Scanset will discard all non-whitespace characters.
type Scanset struct {
	ps        *string
	delimFunc func(rune) bool
}

// NewScanset creates a new Scanset to filter delimiter characters.
// Once f(delimChar) returns false, scanning will end.
// If s is nil, characters for which f(delimChar) returns true are discarded.
// If f is nil, !unicode.IsSpace(delimChar) is used
// (i.e. read until unicode.IsSpace(delimChar) returns true).
func NewScanset(s *string, f func(r rune) bool) *Scanset {
	return &Scanset{
		ps:        s,
		delimFunc: f,
	}
}

// Scan implements the fmt.Scanner interface for the Scanset type.
func (s *Scanset) Scan(state fmt.ScanState, verb rune) error {
	if verb != 'v' && verb != 's' {
		return errors.New("scansets only work with %v and %s verbs")
	}
	// Token reads runes for as long as delimFunc returns true.
	tok, err := state.Token(false, s.delimFunc)
	if err != nil {
		return err
	}
	if s.ps != nil {
		// string(tok) copies the bytes, which matters because the slice
		// returned by Token may be overwritten by later scanning calls.
		*s.ps = string(tok)
	}
	return nil
}
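As a rough usage sketch, the Scanset type can be plugged into fmt.Sscanf to handle the "Name; email@domain" format from the C example. The type is repeated here only so the program compiles on its own. parseContact and isDomainRune are hypothetical names introduced for illustration, and the ASCII ranges in isDomainRune are an assumption about what ALNUM expands to.

```go
package main

import (
	"errors"
	"fmt"
)

// Scanset is repeated from the answer so this sketch is self-contained.
type Scanset struct {
	ps        *string
	delimFunc func(rune) bool
}

func NewScanset(s *string, f func(r rune) bool) *Scanset {
	return &Scanset{ps: s, delimFunc: f}
}

func (s *Scanset) Scan(state fmt.ScanState, verb rune) error {
	if verb != 'v' && verb != 's' {
		return errors.New("scansets only work with %v and %s verbs")
	}
	tok, err := state.Token(false, s.delimFunc)
	if err != nil {
		return err
	}
	if s.ps != nil {
		*s.ps = string(tok)
	}
	return nil
}

// isDomainRune is a hypothetical stand-in for the C scanset %[-.ALNUM],
// assuming ALNUM means ASCII letters and digits.
func isDomainRune(r rune) bool {
	return r == '-' || r == '.' ||
		(r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
}

// parseContact is a hypothetical helper that scans "Name; email@domain".
func parseContact(input string) (name, email, domain string, err error) {
	notSemicolon := func(r rune) bool { return r != ';' } // like %[^;]
	notAt := func(r rune) bool { return r != '@' }        // like %[^@]
	_, err = fmt.Sscanf(input, "%v; %v@%v",
		NewScanset(&name, notSemicolon),
		NewScanset(&email, notAt),
		NewScanset(&domain, isDomainRune))
	return
}

func main() {
	name, email, domain, err := parseContact("John Doe; jdoe@example.com")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Printf("name=%q email=%q domain=%q\n", name, email, domain)
}
```

Each predicate returns true for the runes to keep, so a negated scanset becomes a simple "not equal" test. The literal ';' and '@' in the format string consume the delimiters between the fields.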
It's not C's scansets, but it's close enough. As mentioned, you should be validating your data anyway, even with formatted input, because formatting lacks context (and adding it while dealing with formatting violates the KISS principle and worsens the readability of your code).
For example, a short regex like [A-Za-z]([A-Za-z0-9-]?.)[A-Za-z0-9] isn't enough to validate a domain name, and a simplistic scanset would simply be the equivalent of [A-Za-z0-9.-]. The scanset, however, would be enough to scan the string from a file or whatever other reader you might be using, but it wouldn't be enough to validate the string alone. For that, a regex or even a proper library would be a much better option.
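For the simple, homogeneous case mentioned at the start of this answer, a line-oriented approach might look like the sketch below. It reads lines with bufio.Scanner and splits each one with strings.FieldsFunc. splitFields and readRecords are hypothetical helper names, and treating both commas and whitespace as separators is an assumption that fits the "aaaa, bbbb" input from the question.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"unicode"
)

// splitFields is a hypothetical helper: it treats commas and whitespace as
// separators, so "aaaa, bbbb" yields the two fields "aaaa" and "bbbb".
func splitFields(line string) []string {
	return strings.FieldsFunc(line, func(r rune) bool {
		return r == ',' || unicode.IsSpace(r)
	})
}

// readRecords is a hypothetical helper that reads r line by line and splits
// every non-empty line into fields.
func readRecords(r io.Reader) ([][]string, error) {
	var records [][]string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if fields := splitFields(sc.Text()); len(fields) > 0 {
			records = append(records, fields)
		}
	}
	return records, sc.Err()
}

func main() {
	records, err := readRecords(strings.NewReader("aaaa, bbbb\ncccc, dddd\n"))
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for _, fields := range records {
		fmt.Println(fields)
	}
}
```

This does not validate anything and it would split a name that contains spaces, which is why the stateful format above needs a different approach.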
Answer 2
Score: 1
You could always go for regular expressions:
re := regexp.MustCompile(`(\w+), (\w+)`)
input := "aaaa, bbbb"
fmt.Printf("%#v\n", re.FindStringSubmatch(input))
// Prints []string{"aaaa, bbbb", "aaaa", "bbbb"}
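Note that \w+ only matches letters, digits and underscores, which is narrower than C's %[^,]. A variant that is closer to the C format "%[^,], %s" could use a negated character class instead. This is a sketch only: pairRe and splitPair are hypothetical names, and anchoring the pattern at the start of the input is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// pairRe mirrors "%[^,], %s": everything up to the first comma, the comma,
// optional whitespace, then a run of non-whitespace characters.
var pairRe = regexp.MustCompile(`^([^,]+),\s*(\S+)`)

// splitPair is a hypothetical helper returning the two captured fields.
// ok is false when the input does not match the pattern.
func splitPair(input string) (first, second string, ok bool) {
	m := pairRe.FindStringSubmatch(input)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	first, second, ok := splitPair("aaaa, bbbb")
	fmt.Printf("first=%q second=%q ok=%v\n", first, second, ok)
}
```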