Way to check for duplicates before writing into a file?
Question
So I wrote a small script that takes text files as input, reads every line and tries to validate it as an email. If a line passes, the script writes it into a new ('clean') file; if it doesn't pass, the script strips the spaces from it and tries to validate it again. If it passes this time, it is written into the new file; if it fails again, the line is ignored.
The thing is, as it stands, my script may write duplicate emails into the output files. How should I go about checking for duplicates already present in the output file before writing?
Here's the relevant code:
```go
// create reading and writing buffers
scanner := bufio.NewScanner(r)
writer := bufio.NewWriter(w)
for scanner.Scan() {
    email := scanner.Text()
    // validate each email
    if !correctEmail.MatchString(email) {
        // if validation didn't pass, strip the spaces from the email
        email = strings.ReplaceAll(email, " ", "")
        // validate the email again after cleaning
        if !correctEmail.MatchString(email) {
            // if validation still didn't pass, ignore this email
            continue
        }
    }
    // validation passed (possibly after cleaning): write the email into the file
    _, err = writer.WriteString(email + "\r\n")
    if err != nil {
        return err
    }
}
// a read error silently ends the Scan loop, so check for it explicitly
if err = scanner.Err(); err != nil {
    return err
}
err = writer.Flush()
if err != nil {
    return err
}
```
Answer 1
Score: 2
Create a type that implements the writer interface, then give it a custom `WriteString` method.
Inside `WriteString`, open the file where you store your emails, iterate over the emails already in it, and save the new email only if it isn't there yet.
Answer 2
Score: 1
You may use a Go built-in map as a set like this:
```go
package main

import (
	"fmt"
)

// emailSet records every email seen so far; only the keys matter.
var emailSet = make(map[string]bool)

// emailExists reports whether email is already in the set.
func emailExists(email string) bool {
	_, ok := emailSet[email]
	return ok
}

// addEmail puts email into the set.
func addEmail(email string) {
	emailSet[email] = true
}

func main() {
	emails := []string{
		"duplicated@golang.org",
		"abc@golang.org",
		"stackoverflow@golang.org",
		"duplicated@golang.org", // <- Duplicated!
	}
	for _, email := range emails {
		if !emailExists(email) {
			fmt.Println(email)
			addEmail(email)
		}
	}
}
```
Here is the output:
```
duplicated@golang.org
abc@golang.org
stackoverflow@golang.org
```
You may try the same code on the Go Playground.