For each Word in a File, Find If current Word is Present More than Once

Question

I'm very new to Go and I'm having trouble finding and printing all the lines in a file that contain the same value.

My file is structured like the following:

index text
index text
     .
     .
     .
index text

Where index is ALWAYS 6 digits long and text is ALWAYS 16 digits long.

> I need to find and print all the lines which contain the same text value.

This is what I have tried so far:

func main() {
    // Slice to hold the common texts
    found := make([]string, 6)

    r, _ := os.Open("store.txt")
    scanner := bufio.NewScanner(r)
    // Split the input into words
    scanner.Split(bufio.ScanWords)
    // Loop over all the words in the file
    for scanner.Scan() {
        line := scanner.Text()
        // If the current word is 16 characters long
        if utf8.RuneCountInString(line) == 16 {
            currLine := line
            // Search the rest of the same file for all 16-character texts
            for scanner.Scan() {
                searchLine := scanner.Text()
                if utf8.RuneCountInString(searchLine) == 16 {
                    // If the same text is found, append it to found
                    if currLine == searchLine {
                        found = append(found, currLine)
                    }
                }
            }
        }
    }
    // Print the found slice
    fmt.Println(found)
    // Close the file
    r.Close()
}

Then I would like to use found to print every line that matches the current element, found[i].

The code above only works for the very first step.
For instance, if the first line of my file contains 1234567890123456 (e.g. at index 1), it checks and appends matches for that one text only; it does not repeat the search for the remaining n-1 words.

  • How can I fix the first issue?

  • Do you think collecting the duplicate texts in an array and then printing the matching lines based on it is a bad idea?

Thanks in advance.

Answer 1

Score: 1

The first issue arises because you use the same scanner both to read the file and to check for duplicates. When the inner for loop reaches the end of the file, it stops; the outer for loop then checks whether there is anything left to scan, finds EOF, and exits.
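
To see this effect in isolation, here is a minimal sketch. The helper name countOuterIterations and the inline sample data are made up for illustration; the sketch only counts how many times the outer loop body runs when both loops share one scanner.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countOuterIterations (hypothetical helper) runs a nested scan over one
// shared scanner and reports how many times the outer loop body executed.
func countOuterIterations(input string) int {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(bufio.ScanWords)

	outer := 0
	for scanner.Scan() {
		outer++
		// The inner loop advances the same scanner until the input is used up,
		// so nothing is left for the outer loop afterwards.
		for scanner.Scan() {
		}
	}
	return outer
}

func main() {
	// Sample data standing in for store.txt (assumed format: "index text").
	input := "000001 1111111111111111\n000002 2222222222222222\n000003 1111111111111111\n"
	fmt.Println(countOuterIterations(input)) // prints 1
}
```

However many words the input holds, the outer body runs once, which matches the behaviour described in the question.
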
The easiest way to solve your problem is to keep a slice holding every text you encounter for the first time; when a text is already in the slice, print it as a duplicate. Something like this:

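seen := make([]string, 0) // every text encountered so far
// The scanner must use its default line splitting here, not bufio.ScanWords
for scanner.Scan() {
    line := scanner.Text()
    if len(line) < 6 {
        continue // too short to hold the 6-digit index
    }
    // Everything after the index, minus the separator (needs the strings package)
    text := strings.TrimSpace(line[6:])
    // Do your checks
    // If all your checks pass
    if !contains(seen, text) {
        seen = append(seen, text)
    } else {
        // Print the duplicate
        fmt.Println(line)
    }
}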

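A side note on how the slice is created: the code in the question uses make([]string, 6), which yields a slice that already holds six empty strings, so anything appended lands after them. A length of 0, as above, avoids that. The small sketch below illustrates the difference; appendOne is a hypothetical helper written only for this comparison.

```go
package main

import "fmt"

// appendOne (hypothetical helper) creates a slice with the given initial
// length and appends a single value to it.
func appendOne(initialLen int, value string) []string {
	s := make([]string, initialLen)
	return append(s, value)
}

func main() {
	withSix := appendOne(6, "1234567890123456")
	fmt.Println(len(withSix)) // 7: six empty strings, then the value

	withZero := appendOne(0, "1234567890123456")
	fmt.Println(len(withZero)) // 1: only the value
}
```
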
And here is the contains implementation:

func contains(s []string, e string) bool {
    for _, a := range s {
        if a == e {
            return true
        }
    }
    return false
}

huangapple
  • Published on October 28, 2016, 20:46:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/40305433.html