Regexp replacement character
Question
I created a CSV file in Go and I need every column wrapped in quotation marks ("). I added them, but now the CSV output ends up with extra (doubled) quotation marks in the comment column whenever the value contains a comma (,).
My CSV:
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""
I need the CSV to look like this (no doubled quotation marks in the comment column):
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"
My Go code:
RegContent := regexp.MustCompile(`",""[A-Za-z0-9]`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `","`)
fmt.Println("PLAY: ", newRegexp)
err = ioutil.WriteFile(path, []byte(newRegexp), 0)
if err != nil {
	fmt.Println("error: ", err)
}
Output:
"son likes this video, good job" //(Missing My)
"don't like this video, it may be better" //(Missing I)
Answer 1
Score: 2
You can match the last column while capturing everything between the outer quotes, and then use a backreference in the replacement argument of ReplaceAllString to restore that part:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	CSV_Contents := `
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""
`
	RegContent := regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)
	result := RegContent.ReplaceAllString(CSV_Contents, `,$1`)
	fmt.Println(result)
}
Output of the Go demo:
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"
Regex details:
(?m) - multiline mode on, so $ will match at the end of each line
," - a comma and a "
("[^"]*(?:""[^"]*)*") - Group 1 ($1): a ", then zero or more chars other than ", then zero or more sequences of "" each followed by zero or more non-" chars (if there are escaped quotes inside the comment column, they are left intact), and then a "
"$ - a " at the end of a line.
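To illustrate the remark about escaped quotes, here is a minimal sketch that applies the same pattern to a comment containing an inner "" pair. The helper name stripOuterQuotes and the sample row are made up for this illustration; only the regex and the replacement string come from the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// outerQuotes is the pattern from the answer above: it matches a last
// column that is wrapped in one extra pair of quotes.
var outerQuotes = regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)

// stripOuterQuotes (hypothetical helper name) removes the extra outer
// quotes around the last column and keeps Group 1 unchanged.
func stripOuterQuotes(csv string) string {
	return outerQuotes.ReplaceAllString(csv, `,$1`)
}

func main() {
	// Sample row invented for this sketch: the comment contains an
	// escaped quote pair ("") around the word hi.
	in := `"1",""He said ""hi"", ok""`
	fmt.Println(stripOuterQuotes(in))
	// prints: "1","He said ""hi"", ok"
}
```

Only the outer pair is dropped; the inner "" sequences survive because they are part of Group 1. Since the pattern only looks at the end of each line, it may be worth testing it against rows whose last fields are empty before relying on it for real data.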
Answer 2
Score: 1
You can get the described behavior with ReplaceAllStringFunc():
f := func(s string) string {
	return strings.ReplaceAll(s, `""`, `"`)
}
RegContent := regexp.MustCompile(`",""[^,].+""`)
newRegexp := RegContent.ReplaceAllStringFunc(CSV_Contents, f)
fmt.Println("PLAY: ", newRegexp)
https://play.golang.org/p/1NqTyN1hs1J
An alternative with ReplaceAllString():
RegContent := regexp.MustCompile(`,""([^,].+)""`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `,"$1"`)
fmt.Println("PLAY: ", newRegexp)
https://play.golang.org/p/tY8zGWTbLLB
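Both answers rely on putting matched text back into the result. As a rough sketch of why the pattern in the question lost a character, the snippet below compares it with a capturing variant. The function names dropFirst and keepFirst and the shortened sample row are assumptions made for this illustration, not part of the original posts.

```go
package main

import (
	"fmt"
	"regexp"
)

// consuming is the pattern from the question: the character class is part
// of the match, so the character it matches is replaced as well.
var consuming = regexp.MustCompile(`",""[A-Za-z0-9]`)

// capturing is the same pattern with the character class in a group.
var capturing = regexp.MustCompile(`",""([A-Za-z0-9])`)

// dropFirst (hypothetical name) reproduces the question's replacement.
func dropFirst(line string) string {
	return consuming.ReplaceAllString(line, `","`)
}

// keepFirst (hypothetical name) restores the captured character via ${1}.
func keepFirst(line string) string {
	return capturing.ReplaceAllString(line, `","${1}`)
}

func main() {
	// Shortened sample row based on the data in the question.
	line := `"100",""My son likes this video, good job""`
	fmt.Println(dropFirst(line))
	// prints: "100","y son likes this video, good job""
	fmt.Println(keepFirst(line))
	// prints: "100","My son likes this video, good job""
}
```

The capturing variant keeps the first character but still leaves the doubled quotes at the end of the line, which is why the answers match the whole comment field instead. Writing ${1} rather than $1 avoids ambiguity when the reference is followed directly by a letter, digit, or underscore.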