How to correctly process a string with escapes in Go?

Question

I am creating a program that processes and calculates the sizes of open-source repositories and libraries, and saves the data to a database for further analysis.

  • I have an input string: github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
  • It is parsed to the format: github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1
  • Then I turn that into the path /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm@v0.0.0-20210617225240-d185dfc1b5a1, which is a valid path in my filesystem where I've downloaded that particular Go library.
  • After that, I pass that path to the gocloc program (https://github.com/hhatto/gocloc).
  • Finally, I parse the result.

But the issue is that when I save the string /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm@v0.0.0-20210617225240-d185dfc1b5a1 into a variable, Go actually adds another escape to it, so in memory it's actually /home/username/dev/glass/tmp/pkg/mod/github.com/\\!azure/go-ansiterm@v0.0.0-20210617225240-d185dfc1b5a1. (fmt.Println, for example, removes that.)

The problem is that when I pass that string as an argument to os/exec, which runs gocloc with that path string, it runs the command with two escapes, and that's not a valid path.

Is there any way to work around this? One idea is to just create a shell script for what I want to do.

This is the function that parses github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 to the format github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1. After the result is saved into a variable, the variable has one more escape than it should have.

func parseUrlToVendorDownloadFormat(input string) string {
	// Split the input string on the first space character
	parts := strings.SplitN(input, " ", 2)
	if len(parts) != 2 {
		return ""
	}

	// Split the package name on the '/' character
	packageNameParts := strings.Split(parts[0], "/")

	// Add the '\!' prefix and lowercase each part of the package name
	for i, part := range packageNameParts {
		if hasUppercase(part) {
			packageNameParts[i] = "\\!" + strings.ToLower(part)
		}
	}

	// Join the modified package name parts with '/' characters
	packageName := strings.Join(packageNameParts, "/")

	return strings.ReplaceAll(packageName+"@"+parts[1], `\\!`, `\!`)
}

Afterwards, the string is turned into the path: /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm@v0.0.0-20210617225240-d185dfc1b5a1

which is passed to this function:

// Alternative goCloc - command.
func linesOfCode(dir string) (int, error) {
	// Run the `gocloc` command in the specified directory and get the output
	cmd := exec.Command("gocloc", dir)
	output, err := cmd.Output()
	if err != nil {
		return 0, err
	}

	lines, err := parseTotalLines(string(output))
	if err != nil {
		return 0, err
	}

	return lines, nil
}

Which uses this parse function:

// Parse from the GoCloc response.
func parseTotalLines(input string) (int, error) {
	// Split the input string into lines
	lines := strings.Split(input, "\n")

	// Find the line containing the "TOTAL" row
	var totalLine string
	for _, line := range lines {
		if strings.Contains(line, "TOTAL") {
			totalLine = line
			break
		}
	}

	// If the "TOTAL" line was not found, return an error
	if totalLine == "" {
		return 0, fmt.Errorf("could not find TOTAL line in input")
	}

	// Split the "TOTAL" line into fields
	fields := strings.Fields(totalLine)

	// If the "TOTAL" line doesn't have enough fields, return an error
	if len(fields) < 4 {
		return 0, fmt.Errorf("invalid TOTAL line: not enough fields")
	}

	// Get the fourth field (the code column)
	codeStr := fields[3]

	// Remove any commas from the code column
	codeStr = strings.Replace(codeStr, ",", "", -1)

	// Parse the code column as an integer
	code, err := strconv.Atoi(codeStr)
	if err != nil {
		return 0, err
	}

	return code, nil
}

What I've tried:

  • Using gocloc as a library; I didn't get it to work.
  • Using single quotes instead of escapes; I didn't get it to work, but I think there might be something there.

One way to get around this might be to create a separate shell script, pass the dir to it as an argument, and get rid of the escapes there. I don't know...

If you want to look at all the source code: https://github.com/haapjari/glass. More specifically, it's the file https://github.com/haapjari/glass/blob/main/pkg/plugins/goplg/plugin.go and the function enrichWithLibraryData(), plus the utility functions here: https://github.com/haapjari/glass/blob/main/pkg/plugins/goplg/utils.go (the examples above).

Any ideas? How to proceed? Thanks in advance!

Answer 1

Score: 1

> I have an input string: github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
>
> Parsed to a format: github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.


Your parser seems to have an error. I would expect Azure to become !azure:

github.com/!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1
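
To see where the backslash comes from, here is a minimal sketch of the two string operations in the question's function. The helper names prefixLikeQuestion and replaceLikeQuestion are hypothetical and exist only for this illustration. It suggests that Go is not adding an escape: the string really holds one backslash, and Go-syntax quoting (the %q verb, and typically debuggers) displays that one backslash as two.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixLikeQuestion mirrors the question's parser for one path element.
// The interpreted literal "\\!" is two bytes: one backslash and '!'.
func prefixLikeQuestion(part string) string {
	return "\\!" + strings.ToLower(part)
}

// replaceLikeQuestion mirrors the question's final ReplaceAll. The raw
// literal `\\!` is three bytes (two backslashes and '!'), so it never
// matches a string that holds only one backslash.
func replaceLikeQuestion(s string) string {
	return strings.ReplaceAll(s, `\\!`, `\!`)
}

func main() {
	s := replaceLikeQuestion(prefixLikeQuestion("Azure"))

	fmt.Println(s)        // the actual contents: \!azure
	fmt.Printf("%q\n", s) // Go-syntax quoting: "\\!azure"
	fmt.Println(len(s))   // 7 bytes: one backslash, '!', "azure"
}
```

The backslash is only needed when typing the path into an interactive shell. os/exec does not invoke a shell, so every byte of the argument, including the backslash, reaches gocloc unchanged, and a directory named \!azure does not exist on disk.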


> Go Modules Reference
>
> To avoid ambiguity when serving from case-insensitive file systems, the $module and $version elements are case-encoded by replacing every uppercase letter with an exclamation mark followed by the corresponding lower-case letter. This allows modules example.com/M and example.com/m to both be stored on disk, since the former is encoded as example.com/!m.
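
The rule quoted above can be sketched in a few lines of standard-library Go. This is an illustration under stated assumptions, not the Go toolchain's own code: caseEncode and moduleDir are hypothetical names, the input is assumed to be a "module version" pair separated by whitespace, and only ASCII uppercase letters are handled. The golang.org/x/mod/module package offers EscapePath and EscapeVersion, which also validate their input and are probably the better choice in real code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// caseEncode applies the quoted rule: every ASCII uppercase letter
// becomes '!' followed by the corresponding lowercase letter.
func caseEncode(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r - 'A' + 'a')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// moduleDir is a hypothetical helper that turns a "module version" line
// into a directory below modCache. No backslash is added anywhere.
func moduleDir(modCache, line string) (string, error) {
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return "", fmt.Errorf("expected \"<module> <version>\", got %q", line)
	}
	return filepath.Join(modCache, caseEncode(fields[0])+"@"+caseEncode(fields[1])), nil
}

func main() {
	dir, err := moduleDir("/home/username/dev/glass/tmp/pkg/mod",
		"github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dir)
}
```

The resulting string can be handed straight to exec.Command("gocloc", dir). Note that the encoding works per letter, not per path element, so an element with several uppercase letters gets several exclamation marks.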


huangapple
  • Posted on January 7, 2023 at 04:56:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75036016.html