Go equivalent of C's negated scansets

Question

What is the way to mimic the negated scansets that exist in C?

For an example input string: aaaa, bbbb

In Go, using:

fmt.Sscanf(input, "%s, %s", &str1, &str2)

only str1 is set, and its value is: aaaa,

In C, one could use a format string such as "%[^,], %s" to avoid this problem. Is there a way to accomplish this in Go?

Answer 1

Score: 2

Go doesn't support this directly the way C does, partly because the usual approach is to read a line and split it with something like strings.FieldsFunc. That's a simplistic view, of course. For data formatted in a homogeneous manner, you could use bufio.Scanner to do essentially the same thing with any io.Reader. However, if you had to deal with something like this format:

// Name; email@domain
//
// Anything other than ';' is valid for name.
// Anything before '@' is valid for email.
// For domain, only A-Z, a-z, and 0-9, as well as '-' and '.' are valid.
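// line is an assumed input buffer; ALNUM is assumed to be a macro that
// expands to a string literal listing the alphanumeric characters.
sscanf(line, "%[^;]; %[^@]@%[-." ALNUM "]", name, email, domain);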

then you'd run into trouble, because each field now follows its own rules and the parser has to keep track of which field it is in. In such a case, you might prefer working with bufio.Reader and parsing things manually. There's also the option of implementing fmt.Scanner. Here's some sample code to give you an idea of how easy it can be to implement fmt.Scanner:

// Scanset acts as a filter when scanning strings.
// The zero value of a Scanset scans a run of non-whitespace characters
// and discards it.
type Scanset struct {
	ps        *string
	delimFunc func(rune) bool
}

// NewScanset creates a Scanset that stores the scanned token in *s.
// Scanning consumes runes for as long as f returns true and ends at the
// first rune for which f returns false.
// If s is nil, the scanned token is discarded.
// If f is nil, !unicode.IsSpace is used
// (i.e. read until unicode.IsSpace returns true).
func NewScanset(s *string, f func(r rune) bool) *Scanset {
	return &Scanset{
		ps:        s,
		delimFunc: f,
	}
}

// Scan implements the fmt.Scanner interface for the Scanset type.
func (s *Scanset) Scan(state fmt.ScanState, verb rune) error {
	if verb != 'v' && verb != 's' {
		return errors.New("scansets only work with %v and %s verbs")
	}
	tok, err := state.Token(false, s.delimFunc)
	if err != nil {
		return err
	}
	if s.ps != nil {
		*s.ps = string(tok)
	}
	return nil
}

Playground example
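
The Scanset type above can be exercised against the Name; email@domain format roughly as follows. This is a minimal sketch, not the code from the linked playground: the helpers not, isDomainRune and parseContact, as well as the sample input, are assumptions introduced purely for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Scanset is the type from the answer: it captures the run of runes
// accepted by delimFunc into *ps.
type Scanset struct {
	ps        *string
	delimFunc func(rune) bool
}

func NewScanset(s *string, f func(r rune) bool) *Scanset {
	return &Scanset{ps: s, delimFunc: f}
}

func (s *Scanset) Scan(state fmt.ScanState, verb rune) error {
	if verb != 'v' && verb != 's' {
		return errors.New("scansets only work with %v and %s verbs")
	}
	tok, err := state.Token(false, s.delimFunc)
	if err != nil {
		return err
	}
	if s.ps != nil {
		*s.ps = string(tok) // copy: tok is only valid until the next scan
	}
	return nil
}

// not (hypothetical helper) accepts every rune except c, like C's %[^c].
func not(c rune) func(rune) bool {
	return func(r rune) bool { return r != c }
}

// isDomainRune (hypothetical helper) accepts ASCII letters, digits,
// '-' and '.', like the third scanset in the C example.
func isDomainRune(r rune) bool {
	switch {
	case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
		return true
	}
	return r == '-' || r == '.'
}

// parseContact (hypothetical helper) parses one "Name; email@domain" line.
func parseContact(line string) (name, email, domain string, err error) {
	_, err = fmt.Sscanf(line, "%v; %v@%v",
		NewScanset(&name, not(';')),
		NewScanset(&email, not('@')),
		NewScanset(&domain, isDomainRune))
	return name, email, domain, err
}

func main() {
	name, email, domain, err := parseContact("Jane Doe; jane@example.com")
	fmt.Println(name, email, domain, err)
}
```

Each Scanset plays the role of one %[...] conversion, while the literal "; " and "@" in the format string are matched by fmt itself. The scanned fields are only split, not validated.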

It's not C's scansets, but it's close enough. As mentioned, you should be validating your data anyway, even with formatted input, because formatting lacks context (and adding it while dealing with formatting violates the KISS principle and worsens the readability of your code).

For example, a short regex like [A-Za-z]([A-Za-z0-9-]?.)[A-Za-z0-9] isn't enough to validate a domain name, and a simplistic scanset would simply be the equivalent of [A-Za-z0-9.-]. The scanset, however, would be enough to scan the string from a file or whatever other reader you might be using, but it wouldn't be enough to validate the string alone. For that, a regex or even a proper library would be a much better option.

Answer 2

Score: 1

You could always use a regular expression:

re := regexp.MustCompile(`(\w+), (\w+)`)
input := "aaaa, bbbb"
fmt.Printf("%#v\n", re.FindStringSubmatch(input))
// Prints []string{"aaaa, bbbb", "aaaa", "bbbb"}
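
Note that \w+ only matches word characters, whereas C's %[^,] accepts anything up to the comma. A negated character class is arguably the closer regex counterpart. The following is a sketch under that assumption; the names pairRe and splitPair are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// pairRe captures everything before the first comma, then a
// whitespace-delimited token after it, roughly like "%[^,], %s" in C.
var pairRe = regexp.MustCompile(`^([^,]+),\s*(\S+)`)

// splitPair (hypothetical helper) reports ok=false when input does not
// have the "first, second" shape.
func splitPair(input string) (first, second string, ok bool) {
	m := pairRe.FindStringSubmatch(input)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	first, second, ok := splitPair("hello world, bbbb")
	fmt.Printf("%q %q %v\n", first, second, ok)
}
```

Unlike the \w+ version, this keeps spaces and punctuation in the first field, and the nil check avoids indexing into a failed match.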

huangapple
  • Posted on June 4, 2017, 02:08:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/44347093.html