Way to check for duplicates before writing into a file?

Question

So I wrote a small script that takes text files as input, reads every line, and tries to validate it as an email. If a line passes, the script writes it into a new ('clean') file; if it doesn't pass, the script strips the spaces from it and tries to validate it again. If it passes this time, it is written into the new file; if it fails again, the line is ignored.

The thing is, as it stands, my script may write duplicate emails into the output files. How should I go about that and check for duplicates already present in the output file before writing?

Here's the relevant code:

```go
// create reading and writing buffers
scanner := bufio.NewScanner(r)
writer := bufio.NewWriter(w)

for scanner.Scan() {
	email := scanner.Text()

	// validate each email
	if !correctEmail.MatchString(email) {
		// if validation didn't pass, strip the spaces from the email
		email = strings.Replace(email, " ", "", -1)
		// validate the email again after cleaning
		if !correctEmail.MatchString(email) {
			// if validation still didn't pass, ignore this email
			continue
		}
	}

	// validation passed (as-is or after cleaning): write the email into the file
	_, err = writer.WriteString(email + "\r\n")
	if err != nil {
		return err
	}
}

// report any read error that stopped the scan early
if err = scanner.Err(); err != nil {
	return err
}

err = writer.Flush()
if err != nil {
	return err
}
```

Answer 1

Score: 2

Create a type that implements io.Writer, then give it a custom WriteString method.

Inside WriteString, open the file where you store your emails, iterate over the emails already in it, and write the new email only if it is not already there.

Answer 2

Score: 1

You can use Go's built-in map as a set, like this:

```go
package main

import (
	"fmt"
)

// emailSet records every email seen so far; a map used this way acts as a set.
var emailSet = make(map[string]bool)

// emailExists reports whether email has already been added to the set.
func emailExists(email string) bool {
	_, ok := emailSet[email]
	return ok
}

// addEmail adds email to the set.
func addEmail(email string) {
	emailSet[email] = true
}

func main() {
	emails := []string{
		"duplicated@golang.org",
		"abc@golang.org",
		"stackoverflow@golang.org",
		"duplicated@golang.org", // <- Duplicated!
	}
	// Print each email only the first time it is seen.
	for _, email := range emails {
		if !emailExists(email) {
			fmt.Println(email)
			addEmail(email)
		}
	}
}
```

Here is the output:

```
duplicated@golang.org
abc@golang.org
stackoverflow@golang.org
```

You can try the same code on The Go Playground.

huangapple
  • Posted on 6 August 2016 at 22:21:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/38805269.html