Optimizing reading binary file data into a buffer in C++


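Question

I wrote a small command-line tool that has to loop over and iterate through a huge file server. The logic is really simple, but it takes too much time. I found that the bottleneck is reading binary files into a buffer. I want to keep the implementation simple, because it is C++ and other people have to understand the code too.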

std::ifstream input(foundFile.c_str(), std::ios::binary);
std::vector<unsigned char> buffer(std::istreambuf_iterator<char>(input), {});

In the end, I guess I will have to refactor to chunked reading. But in general, why is reading a binary file this way so slow?

Complete source: https://gitlab.com/Onnebrink/cltools/-/blob/main/src/dupfind/dupfind.cpp

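Answer 1

Score: 0

Now it is much faster. I guess I have to play around a bit with bufferSize. Perhaps 4096 bytes is too small, but I don't have a good overview of the typical sizes of the files that will be found. Perhaps I should make it more adaptive, depending on the size of the file found.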

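#include <fstream>
#include <string>
#include <vector>

// Hashes the file's contents in fixed-size chunks instead of loading it whole.
unsigned long long calcHash(const std::string &foundFile) {
  const std::streamsize bufferSize = 4096;
  unsigned long long hashValue = 0xeba29ce484222325ULL;
  const unsigned long long magicPrime = 0xad3760fd485d7f11ULL;
  std::ifstream inFile(foundFile.c_str(), std::ios::binary);
  std::vector<char> buffer(bufferSize);
  // Loop on the result of read() rather than on eof(): if the file cannot be
  // opened, eof() never becomes true and a while (!eof()) loop would not end.
  while (inFile.read(buffer.data(), bufferSize) || inFile.gcount() > 0) {
    const std::streamsize bytesRead = inFile.gcount();
    for (std::streamsize i = 0; i < bytesRead; i++) {
      // Cast to unsigned char so bytes >= 0x80 are not sign-extended.
      hashValue ^= static_cast<unsigned char>(buffer[i]);
      hashValue *= magicPrime;
    }
  }
  return hashValue;
}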
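
The idea of making the buffer size depend on the file size could be sketched roughly like this. It is only an illustration under assumptions: the names pickBufferSize and calcHashAdaptive and the 4 KiB / 1 MiB bounds are hypothetical and not part of the original code, bytes are treated as unsigned, and std::filesystem::file_size needs C++17. Whether a larger buffer actually helps would have to be measured on the real file server.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper (not in the original post): choose a read-buffer size
// from the file size, clamped to the range [minSize, maxSize].
std::size_t pickBufferSize(std::uintmax_t fileSize,
                           std::size_t minSize = 4096,
                           std::size_t maxSize = 1 << 20) {
  if (fileSize <= minSize) return minSize;
  if (fileSize >= maxSize) return maxSize;
  return static_cast<std::size_t>(fileSize);
}

// Same multiply-xor scheme and constants as calcHash, but the buffer is
// sized per file. Assumption: each byte is hashed as an unsigned value.
unsigned long long calcHashAdaptive(const std::string &foundFile) {
  std::error_code ec;
  const std::uintmax_t fileSize = std::filesystem::file_size(foundFile, ec);
  // If the size cannot be determined, fall back to the minimum buffer size.
  const std::size_t bufferSize = pickBufferSize(ec ? 0 : fileSize);

  unsigned long long hashValue = 0xeba29ce484222325ULL;
  const unsigned long long magicPrime = 0xad3760fd485d7f11ULL;
  std::ifstream inFile(foundFile, std::ios::binary);
  std::vector<char> buffer(bufferSize);

  // Keep reading until a read delivers no bytes at all.
  while (inFile.read(buffer.data(),
                     static_cast<std::streamsize>(buffer.size())) ||
         inFile.gcount() > 0) {
    const std::streamsize bytesRead = inFile.gcount();
    for (std::streamsize i = 0; i < bytesRead; i++) {
      hashValue ^= static_cast<unsigned char>(buffer[i]);
      hashValue *= magicPrime;
    }
  }
  return hashValue;
}
```

With these bounds, a file up to 1 MiB is read in a single call, and larger files are read in 1 MiB chunks, so memory use stays bounded even for very large files.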

huangapple
  • Posted on 2023-02-09 00:07:11
  • Source link: https://go.coder-hub.com/75388564.html