Regexp replacement character

Question

I created a CSV file in Go, and I have to wrap every column in quotation marks ("). I added them, but now the CSV code adds extra (doubled) quotation marks around the comment column whenever the field contains a comma (,).

My CSV:

comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""

I need the CSV to look like this (no doubled quotation marks in the comment column):

comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"

My Go code:

RegContent := regexp.MustCompile(`",""[A-Za-z0-9]`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `","`)
fmt.Println("PLAY: ", newRegexp)
err = ioutil.WriteFile(path, []byte(newRegexp), 0)
if err != nil {
    fmt.Println("error: ", err)
}

Output

"son likes this video, good job" //(Missing My)
"don't like this video, it may be better" //(Missing I)

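For illustration, here is a minimal sketch of why a character goes missing: the character class [A-Za-z0-9] is part of the match, so ReplaceAllString replaces it along with the quotes. Capturing that character and restoring it with ${1} keeps it. The function name stripOpeningQuote is hypothetical, and this only removes the extra opening quote; the doubled quote at the end of the field is still there.

```go
package main

import (
	"fmt"
	"regexp"
)

// stripOpeningQuote (hypothetical helper) removes the extra quote that opens a
// field while keeping the field's first character, captured as group 1.
func stripOpeningQuote(s string) string {
	re := regexp.MustCompile(`",""([A-Za-z0-9])`)
	// ${1} puts the captured character back after the single opening quote.
	return re.ReplaceAllString(s, `","${1}`)
}

func main() {
	line := `"2020-12-29 23:06:32",""I don't like this video, it may be better""`
	// The leading "I" survives; the trailing doubled quote is not handled here.
	fmt.Println(stripOpeningQuote(line))
}
```
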
Answer 1

Score: 2

You can match the last column, capture everything between the outer quotes, and use a backreference ($1) in the replacement argument to ReplaceAllString to restore that part:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Sample input: the last column is wrapped in an extra pair of quotes.
	CSV_Contents := `
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""
`
	// (?m) makes $ match at the end of every line; group 1 holds the inner quoted field.
	RegContent := regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)
	// Write back the comma and the captured field, dropping the outer quotes.
	result := RegContent.ReplaceAllString(CSV_Contents, `,$1`)
	fmt.Println(result)
}

See the Go demo, output:

comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"

See the regex demo. Details:

  • (?m) - multiline mode on, $ will match end of lines
  • ," - a comma and "
  • ("[^"]*(?:""[^"]*)*") - Group 1 ($1): a ", then zero or more chars other than a ", then zero or more repetitions of "" followed by zero or more non-" chars (so escaped quotes inside the comment column are left intact), and then a closing "
  • "$ - a " at the end of a line.
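
As a sketch of the claim about escaped quotes, the same pattern can be wrapped in a helper and run on a row whose comment contains "" sequences. The name unwrapLastColumn and the row with ID 102 are hypothetical additions for illustration, not part of the original question.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastColumn is the pattern from the answer above: group 1 captures the
// last field together with its single pair of quotes.
var lastColumn = regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)

// unwrapLastColumn (hypothetical helper) drops the extra outer quotes around
// the last column of every line and leaves escaped "" inside it untouched.
func unwrapLastColumn(csv string) string {
	return lastColumn.ReplaceAllString(csv, `,${1}`)
}

func main() {
	// The second row is made-up data with escaped quotes inside the comment.
	in := `"100",""My son likes this video, good job""` + "\n" +
		`"102",""He said ""hi"" to me""`
	fmt.Println(unwrapLastColumn(in))
}
```

With this input, only the outermost pair of quotes around each comment is removed; the ""hi"" inside the second comment stays escaped.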

Answer 2

Score: 1

You can get the described behavior with ReplaceAllStringFunc():

f := func(s string) string {
   return strings.ReplaceAll(s, `""`, `"`)
}
RegContent := regexp.MustCompile(`",""[^,].+""`)
newRegexp := RegContent.ReplaceAllStringFunc(CSV_Contents, f)
fmt.Println("PLAY: ", newRegexp)

https://play.golang.org/p/1NqTyN1hs1J

An alternative with ReplaceAllString():

RegContent := regexp.MustCompile(`,""([^,].+)""`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `,"$1"`)
fmt.Println("PLAY: ", newRegexp)

https://play.golang.org/p/tY8zGWTbLLB
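
To compare the two variants side by side, here is a self-contained sketch that wraps each one in a function and runs both on the same line. The function names collapseWithFunc and unwrapWithGroup are hypothetical; the patterns are the ones shown above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// collapseWithFunc (hypothetical name) applies the ReplaceAllStringFunc
// variant: every "" inside the matched text is collapsed to a single ".
func collapseWithFunc(csv string) string {
	re := regexp.MustCompile(`",""[^,].+""`)
	return re.ReplaceAllStringFunc(csv, func(s string) string {
		return strings.ReplaceAll(s, `""`, `"`)
	})
}

// unwrapWithGroup (hypothetical name) applies the ReplaceAllString variant:
// the field body is captured and rewritten with one pair of quotes.
func unwrapWithGroup(csv string) string {
	re := regexp.MustCompile(`,""([^,].+)""`)
	return re.ReplaceAllString(csv, `,"${1}"`)
}

func main() {
	line := `"2021-06-02 16:20:30",""My son likes this video, good job""`
	fmt.Println(collapseWithFunc(line))
	fmt.Println(unwrapWithGroup(line))
}
```

One difference worth keeping in mind: because the first variant replaces every "" in the match, it would also collapse escaped quotes inside a comment, while the second variant only rewrites the outer pair.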

huangapple
  • Posted on August 8, 2021 at 21:29:04
  • If you repost this article, please keep the link: https://go.coder-hub.com/68701250.html