Optimize reading binary file data into a buffer in C++
Question
I wrote a small command-line tool that has to walk a huge file server. The logic is really simple, but it takes far too much time, and I found that the bottleneck is reading binary files into a buffer. I want to keep the implementation simple, because it is C++ and other people have to understand the code too.
std::ifstream input(foundFile.c_str(), std::ios::binary);
std::vector<unsigned char> buffer(std::istreambuf_iterator<char>(input), {});
In the end I guess I will have to refactor to chunked reading. But in general, why is reading a binary file this way so slow?
Complete source: https://gitlab.com/Onnebrink/cltools/-/blob/main/src/dupfind/dupfind.cpp
Answer 1
Score: 0
Now it is much faster. I guess I have to experiment a bit with bufferSize; perhaps 4096 bytes is too small. But I don't have a good overview of the typical sizes of the files it will find. Perhaps I should make it more adaptive, depending on the size of each file found.
// Requires <fstream>, <string> and <vector>; relies on `using namespace std;`.
unsigned long long calcHash(const string &foundFile) {
    const int bufferSize = 4096;
    unsigned long long hashValue = 0xeba29ce484222325ULL;
    const unsigned long long magicPrime = 0xad3760fd485d7f11ULL;
    ifstream inFile(foundFile.c_str(), std::ios::binary);
    vector<char> buffer(bufferSize);
    // Test the read itself rather than eof(): if the file cannot be opened,
    // eof() never becomes true and a `while (!inFile.eof())` loop spins forever.
    while (inFile.read(buffer.data(), bufferSize) || inFile.gcount() > 0) {
        const streamsize bytesRead = inFile.gcount();  // short count on the final chunk
        for (streamsize i = 0; i < bytesRead; i++) {
            hashValue ^= buffer[i];
            hashValue *= magicPrime;
        }
    }
    return hashValue;
}
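The idea of making the buffer size depend on the size of the file found could look roughly like the sketch below. It is only an illustration under assumptions: the helper chooseBufferSize, the function name calcHashAdaptive, and the 4 KiB and 1 MiB bounds are all hypothetical and would need to be tuned by measurement on the real file server. It also assumes C++17 for std::filesystem and std::clamp.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: derive a read-buffer size from the file size,
// clamped to [4 KiB, 1 MiB]. Both bounds are assumptions, not measured values.
std::size_t chooseBufferSize(std::uintmax_t fileSize) {
    const std::uintmax_t minSize = 4096;
    const std::uintmax_t maxSize = 1024 * 1024;
    return static_cast<std::size_t>(std::clamp(fileSize, minSize, maxSize));
}

// Same hash update as calcHash, but with a buffer sized per file.
unsigned long long calcHashAdaptive(const std::string &foundFile) {
    unsigned long long hashValue = 0xeba29ce484222325ULL;
    const unsigned long long magicPrime = 0xad3760fd485d7f11ULL;

    // Non-throwing overload: on error, ec is set and we use the minimum buffer.
    std::error_code ec;
    std::uintmax_t fileSize = std::filesystem::file_size(foundFile, ec);
    if (ec) {
        fileSize = 0;
    }

    std::ifstream inFile(foundFile, std::ios::binary);
    std::vector<char> buffer(chooseBufferSize(fileSize));
    const std::streamsize chunk = static_cast<std::streamsize>(buffer.size());

    // Loop while a read delivers data; a failed open yields no iterations.
    while (inFile.read(buffer.data(), chunk) || inFile.gcount() > 0) {
        const std::streamsize bytesRead = inFile.gcount();
        for (std::streamsize i = 0; i < bytesRead; i++) {
            hashValue ^= buffer[i];
            hashValue *= magicPrime;
        }
    }
    return hashValue;
}
```

With these bounds, files smaller than 1 MiB are read in a single call, while larger files are processed in 1 MiB chunks so memory use stays bounded. Whether that is actually faster than a fixed 4096-byte buffer depends on the storage and network behind the file server, so it is worth timing both.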