Regexp replacement character
Question
I created a CSV file in Go and have to put quotation marks (") around every column. I added them, but now the CSV ends up with extra (doubled) quotation marks in the comment column whenever that column contains a comma (,).
My CSV:
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""
I need the CSV to look like this (no doubled quotation marks in the comment column):
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"
My Go code:
RegContent := regexp.MustCompile(`",""[A-Za-z0-9]`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `","`)
fmt.Println("PLAY: ", newRegexp)
err = ioutil.WriteFile(path, []byte(newRegexp), 0)
if err != nil {
	fmt.Println("error: ", err)
}
Output:
"son likes this video, good job" //(Missing My)
"don't like this video, it may be better" //(Missing I)
Answer 1
Score: 2
You can match the last column while capturing everything between the outer quotes, then use a backreference ($1) in the replacement argument of ReplaceAllString to restore that part:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	CSV_Contents := `
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""
`
	RegContent := regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)
	result := RegContent.ReplaceAllString(CSV_Contents, `,$1`)
	fmt.Println(result)
}
See the Go demo, output:
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"
See the regex demo. Details:
(?m) - multiline mode on, so $ will match at the end of each line
," - a comma and a "
("[^"]*(?:""[^"]*)*") - Group 1 ($1): a ", then zero or more chars other than a ", then zero or more sequences of "" followed by zero or more non-" chars (if there are escaped quotes inside the comment column, they will be left intact), and then a "
"$ - a " at the end of a line.
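As a quick check of the note about escaped quotes, here is a minimal sketch that wraps the same pattern in a helper and runs it on a comment containing an escaped "" pair. The helper name unwrapLastColumn and the sample rows are made up for illustration; they are not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastColumn matches a final column wrapped in one extra pair of quotes.
// Group 1 captures the properly quoted field, including any "" pairs inside it.
var lastColumn = regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)

// unwrapLastColumn is a hypothetical helper: it strips the extra outer quotes
// from the last column of every line and leaves all other text untouched.
func unwrapLastColumn(csv string) string {
	return lastColumn.ReplaceAllString(csv, `,$1`)
}

func main() {
	// Sample rows (assumed data): the second comment contains escaped quotes.
	in := `"100","VID17",""My son likes this video, good job""
"102","VID19",""He said ""wow"", twice""`
	fmt.Println(unwrapLastColumn(in))
}
```

Lines whose last column is not double-wrapped do not match the pattern, so they pass through unchanged.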
Answer 2
Score: 1
You can get the described behavior with ReplaceAllStringFunc():
f := func(s string) string {
	return strings.ReplaceAll(s, `""`, `"`)
}
RegContent := regexp.MustCompile(`",""[^,].+""`)
newRegexp := RegContent.ReplaceAllStringFunc(CSV_Contents, f)
fmt.Println("PLAY: ", newRegexp)
https://play.golang.org/p/1NqTyN1hs1J
An alternative with ReplaceAllString():
RegContent := regexp.MustCompile(`,""([^,].+)""`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `,"$1"`)
fmt.Println("PLAY: ", newRegexp)
https://play.golang.org/p/tY8zGWTbLLB
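To see why the pattern in the question loses text while the capture-group variant does not, the following sketch runs both on one sample row. ReplaceAllString replaces everything the pattern matched, so the [A-Za-z0-9] in the question's pattern consumes the first character of the comment unless it is captured and written back with $1. The function names and the shortened sample row are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// questionPattern is the pattern from the question: the character class is
// part of the match, so the character it matches is replaced with the quotes.
var questionPattern = regexp.MustCompile(`",""[A-Za-z0-9]`)

// capturePattern is the ReplaceAllString variant from this answer:
// group 1 holds the comment text so it can be restored with $1.
var capturePattern = regexp.MustCompile(`,""([^,].+)""`)

// withQuestionPattern (hypothetical name) applies the question's replacement.
func withQuestionPattern(row string) string {
	return questionPattern.ReplaceAllString(row, `","`)
}

// withCapture (hypothetical name) applies the capture-group replacement.
func withCapture(row string) string {
	return capturePattern.ReplaceAllString(row, `,"$1"`)
}

func main() {
	row := `"101","VID18",""I don't like this video, it may be better""`
	fmt.Println(withQuestionPattern(row)) // the leading "I" is gone and the trailing "" remains
	fmt.Println(withCapture(row))         // the comment is intact with single outer quotes
}
```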