Why are some words printed twice when working with word frequency

Question


I read words from a file and print the 30 most frequent ones, but some words are printed twice, as you can see in the output.

#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <vector>
using namespace std;

int main(){
  
  fstream fs, output;
  fs.open("/Users/brah79/Downloads/skola/c++/inlämningsuppgifter/labb4/L4_wc/hitchhikersguide.txt");
  output.open("/Users/brah79/Downloads/skola/c++/inlämningsuppgifter/labb4/labb4/output.txt");
  if(!fs.is_open() || !output.is_open()){
    cout << "could not open file" << endl; 
  }

  map <string, int> mp; 
  string word; 
  while(fs >> word){

    for(int i = 0; i < word.length(); i++){
        if(!isalpha(word[i])){
        word.erase(i--, 1);
      }
    }
    if(word.empty()){
        continue; 
    }

  
    mp[word]++; 
  }
  vector<pair<int, string>> v;
  v.reserve(mp.size());

  for (const auto& p : mp){
    v.emplace_back(p.second, p.first);
  }

  sort(v.rbegin(), v.rend()); 

  cout << "Theese are the 30 most frequent words: " << endl; 
  for(int i = 0; i < 30; i++){
      cout << v[i].second << " : " << v[i].first << " times" << endl;
  }


  output << "Theese are the 30 most frequent words: " << endl; 
  for(int i = 0; i < 30; i++){
      cout << v[i].second << " : " << v[i].first << " times" << endl;
  }
 

  return 0; 
}

output:

the : 2230 times !!!

of : 1254 times

to : 1177 times

a : 1121 times

and : 1109 times

said : 680 times

it : 665 times

was : 605 times

in : 590 times

he : 546 times

that : 520 times

you : 495 times

I : 428 times

on : 349 times

Arthur : 332 times

his : 324 times

Ford : 314 times

The : 307 times !!!

at : 306 times

for : 284 times

is : 281 times

with : 273 times

had : 252 times

He : 242 times

this : 220 times

as : 207 times

Zaphod : 206 times

be : 188 times

all : 186 times

him : 182 times

"the" is printed twice. Also, "could not open file" is printed at the top even though the file was opened and its content is stored in the map.

Answer 1

Score: 2


Because you've written your program in a case-sensitive manner.

In particular, "The" and "the" are treated as different keys in the map, so each gets its own count. For example, "the" appears 2230 times while "The" appears 307 times.

huangapple
  • Posted on March 10, 2023
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75687170.html