Question

I wrote a small command-line tool that has to iterate over a huge file server.
The logic is really simple, but it takes too much time, and I found that the
bottleneck is reading binary files into a buffer. I want to keep the
implementation simple, because it's C++ and others have to understand the code too.

std::ifstream input(foundFile.c_str(), std::ios::binary);
std::vector<unsigned char> buffer(std::istreambuf_iterator<char>(input), {});

In the end, I guess I will have to refactor to chunked reading. But in general,
why is reading a binary file this way so slow?

Complete source:
https://gitlab.com/Onnebrink/cltools/-/blob/main/src/dupfind/dupfind.cpp
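
A likely reason for the slowness: std::istreambuf_iterator is an input iterator, so the vector constructor cannot know the file size in advance. It pulls bytes one at a time through the stream buffer and has to grow (reallocate) the vector repeatedly. A minimal sketch of an alternative that sizes the buffer once and fills it with a single read() call is shown below. The helper name readWholeFile is hypothetical and not part of the linked source, and the sketch assumes the file fits in memory.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original source): read a whole file
// into memory with one allocation and one read() call.
std::vector<unsigned char> readWholeFile(const std::string& path) {
    // Open at the end so tellg() reports the file size.
    std::ifstream input(path, std::ios::binary | std::ios::ate);
    if (!input) return {};                       // could not open

    const std::streamoff size = input.tellg();
    if (size <= 0) return {};                    // empty file or error

    input.seekg(0, std::ios::beg);
    std::vector<unsigned char> buffer(static_cast<std::size_t>(size));
    input.read(reinterpret_cast<char*>(buffer.data()),
               static_cast<std::streamsize>(size));

    // Keep only the bytes that were actually read.
    buffer.resize(static_cast<std::size_t>(input.gcount()));
    return buffer;
}
```

For very large files this still holds the complete file in memory, so chunked reading remains the better fit when the data is only consumed once, for example for hashing.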

Answer 1

Score: 0

Now it's much faster. I guess I have to play a bit with bufferSize; perhaps
4096 bytes is too little. But I don't have a good overview of the typical sizes of the files that will be found. Perhaps I should make it more adaptive, depending on the size of the file found.

unsigned long long calcHash(const string &foundFile) {
  const int bufferSize = 4096;
  unsigned long long hashValue = 0xeba29ce484222325ULL;
  const unsigned long long magicPrime = 0xad3760fd485d7f11ULL;
  ifstream inFile(foundFile.c_str(), std::ios::binary);
  vector<char> buffer(bufferSize);
  // Loop on the stream state rather than on eof(): if the file cannot be
  // opened, eof() never becomes true and a while (!eof()) loop never ends.
  while (inFile) {
    inFile.read(buffer.data(), bufferSize);
    // gcount() is also valid for the final, shorter read.
    const streamsize bytesRead = inFile.gcount();
    for (streamsize i = 0; i < bytesRead; i++) {
      hashValue ^= buffer[i];
      hashValue *= magicPrime;
    }
  }
  return hashValue;
}
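
To experiment with the buffer size as suggested above, the chunk size can be made a parameter. The following is only a sketch under assumptions: the name calcHashChunked and the 64 KiB default are illustrative choices, not measured recommendations. It keeps the same constants and the same per-byte update as the function above (including XOR-ing the plain char value), so the result should not depend on the chunk size.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical variant of calcHash with a configurable chunk size.
// The 64 KiB default is an assumption to be tuned by measurement.
std::uint64_t calcHashChunked(const std::string& path,
                              std::size_t bufferSize = 64 * 1024) {
    std::uint64_t hashValue = 0xeba29ce484222325ULL;
    const std::uint64_t magicPrime = 0xad3760fd485d7f11ULL;

    if (bufferSize == 0) bufferSize = 1;         // avoid a zero-length read loop
    std::ifstream inFile(path, std::ios::binary);
    std::vector<char> buffer(bufferSize);

    // If the file could not be opened, the loop body never runs and the
    // initial hash value is returned.
    while (inFile) {
        inFile.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize bytesRead = inFile.gcount();
        for (std::streamsize i = 0; i < bytesRead; ++i) {
            hashValue ^= buffer[i];              // same update as the function above
            hashValue *= magicPrime;
        }
    }
    return hashValue;
}
```

Calling it with different sizes, for example calcHashChunked(foundFile, 4096) and calcHashChunked(foundFile, 1 << 20), and timing each run on representative files is probably the simplest way to decide whether a larger or adaptive buffer is worth the extra complexity.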
