Why does `File::read_to_end` get slower the larger the buffer capacity?

Question
NOTICE: As of 2023-04-23, a fix for this landed on rust-lang/rust:master. You'll soon be able to use `File::read_to_end` without these worries.
I was working on a pretty specific problem that required me to read hundreds of thousands of files ranging from a few bytes to a few hundred megabytes. As the bulk of the operation consisted of enumerating files and moving data from disk, I resorted to reusing `Vec` buffers for the file reading in hopes of avoiding some of that memory management.
That's when I hit the unexpected: `file.read_to_end(&mut buffer)?` gets progressively slower the larger the buffer's capacity is. It's a lot slower to read a 300MB file first followed by a thousand 1KB files than the other way around (as long as we don't truncate the buffer).
Confusingly, if I wrap the file in a `Take` or use `read_exact()`, no slowdown happens.
Does anyone know what that's about? Is it possible it (re)initializes the whole buffer every time it's called? Is this some Windows-specific quirk? What (Windows-based) profiling tools would you recommend when tackling something like this?
Here's a simple reproduction that demonstrates a huge (50x+ on this machine) performance difference between those methods, disk speeds disregarded:
```rust
use std::io::Read;
use std::fs::File;

// with a smaller buffer, there's basically no difference between the methods...
// const BUFFER_SIZE: usize = 2 * 1024;
// ...but the larger the Vec, the bigger the discrepancy.
// for simplicity's sake, let's assume this is a hard upper limit.
const BUFFER_SIZE: usize = 300 * 1024 * 1024;

fn naive() {
    let mut buffer = Vec::with_capacity(BUFFER_SIZE);
    for _ in 0..100 {
        let mut file = File::open("some_1kb_file.txt").expect("opening file");
        let metadata = file.metadata().expect("reading metadata");
        let len = metadata.len();
        assert!(len <= BUFFER_SIZE as u64);
        buffer.clear();
        file.read_to_end(&mut buffer).expect("reading file");
        // do "stuff" with buffer
        let check = buffer.iter().fold(0usize, |acc, x| acc.wrapping_add(*x as usize));
        println!("length: {len}, check: {check}");
    }
}

fn take() {
    let mut buffer = Vec::with_capacity(BUFFER_SIZE);
    for _ in 0..100 {
        let file = File::open("some_1kb_file.txt").expect("opening file");
        let metadata = file.metadata().expect("reading metadata");
        let len = metadata.len();
        assert!(len <= BUFFER_SIZE as u64);
        buffer.clear();
        file.take(len).read_to_end(&mut buffer).expect("reading file");
        // this also behaves like the straight `read_to_end` with a significant slowdown:
        // file.take(BUFFER_SIZE as u64).read_to_end(&mut buffer).expect("reading file");
        // do "stuff" with buffer
        let check = buffer.iter().fold(0usize, |acc, x| acc.wrapping_add(*x as usize));
        println!("length: {len}, check: {check}");
    }
}

fn exact() {
    let mut buffer = vec![0u8; BUFFER_SIZE];
    for _ in 0..100 {
        let mut file = File::open("some_1kb_file.txt").expect("opening file");
        let metadata = file.metadata().expect("reading metadata");
        let len = metadata.len() as usize;
        assert!(len <= BUFFER_SIZE);
        // SAFETY: initialized by `vec!` and within capacity by `assert!`
        unsafe { buffer.set_len(len); }
        file.read_exact(&mut buffer[0..len]).expect("reading file");
        // do "stuff" with buffer
        let check = buffer.iter().fold(0usize, |acc, x| acc.wrapping_add(*x as usize));
        println!("length: {len}, check: {check}");
    }
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() < 2 {
        println!("usage: {} <method>", args[0]);
        return;
    }
    match args[1].as_str() {
        "naive" => naive(),
        "take" => take(),
        "exact" => exact(),
        _ => println!("Unknown method: {}", args[1]),
    }
}
```
Tried a few combinations of `--release` mode, LTO, and even `+crt-static`, with no significant difference.
Answer 1 (score: 8)
I tried using `take` with progressively higher numbers:

```rust
// Run with different values of `take` from 10_000_000 to 300_000_000
file.take(take)
    .read_to_end(&mut buffer)
    .expect("reading file");
```
And the runtime scaled with it almost exactly linearly.
Using `cargo flamegraph` gives a clear picture: `NtReadFile` takes 95% of the time. It only takes 10% in the `exact` version. In other words, your Rust code is not at fault.
The Windows docs don't suggest anything with respect to the length of the buffer, but from reading the Rust standard library, it does appear that `NtReadFile` is given the entire spare capacity of the `Vec`, and it's apparent from the benchmark that `NtReadFile` is doing something with every byte in the buffer.
I believe the `exact` method would be best here. `std::fs::read` also queries the length of the file before reading, although it always has a buffer of the right size since it creates the `Vec`. It also still uses `read_to_end` so that it still returns the whole file even if the length changed in between. If you want to reuse the `Vec`, you would need to do this some other way.
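If you do want to keep reusing the buffer, one option (building on the `take` variant from the question) is to cap every read at the file's length, so `read_to_end` never exposes the buffer's full spare capacity to the OS in one call. A minimal sketch; `read_file_into` is a hypothetical helper name:

```rust
use std::fs::File;
use std::io::{self, Read};

// Hypothetical helper: read a whole file into a reused buffer.
// `take(len)` caps the read at the file's length, so the kernel never
// sees the buffer's full (possibly huge) spare capacity at once.
fn read_file_into(path: &str, buffer: &mut Vec<u8>) -> io::Result<u64> {
    let file = File::open(path)?;
    let len = file.metadata()?.len();
    buffer.clear();
    file.take(len).read_to_end(buffer)?;
    Ok(len)
}

fn main() -> io::Result<()> {
    // Placeholder file, created just so the sketch is runnable.
    std::fs::write("demo.txt", b"hello")?;
    let mut buffer = Vec::with_capacity(1024 * 1024);
    let len = read_file_into("demo.txt", &mut buffer)?;
    println!("read {} bytes", len);
    std::fs::remove_file("demo.txt")?;
    Ok(())
}
```

Note that a file growing between the `metadata()` call and the read would be silently truncated to `len` bytes, which is the trade-off against plain `read_to_end`.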
Make sure that whatever you choose is faster than recreating the `Vec` every time; I tried that briefly and got nearly the same performance as `exact`. There are performance benefits to freeing unused memory, so whether reuse makes your program faster will depend on the situation.
You could also separate the code paths for short and long files.
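A sketch of what that split might look like; the 1 MiB threshold and the helper name `read_with_split` are assumptions, not anything from the original post:

```rust
use std::fs::File;
use std::io::{self, Read};

// Hypothetical threshold: anything under 1 MiB takes the "small file" path.
const SMALL_FILE_LIMIT: u64 = 1024 * 1024;

// Hypothetical helper: pick a read strategy based on file size.
fn read_with_split(path: &str, buffer: &mut Vec<u8>) -> io::Result<usize> {
    let mut file = File::open(path)?;
    let len = file.metadata()?.len() as usize;
    if (len as u64) < SMALL_FILE_LIMIT {
        // Small file: reuse the buffer, capping the read with `take`.
        buffer.clear();
        file.take(len as u64).read_to_end(buffer)?;
    } else {
        // Large file: size the buffer exactly, then `read_exact` into it,
        // so no spare capacity is handed to the OS.
        buffer.resize(len, 0);
        file.read_exact(&mut buffer[..len])?;
    }
    Ok(len)
}

fn main() -> io::Result<()> {
    // Placeholder file, created just so the sketch is runnable.
    std::fs::write("split_demo.txt", b"hello world")?;
    let mut buffer = Vec::new();
    let n = read_with_split("split_demo.txt", &mut buffer)?;
    println!("read {} bytes", n);
    std::fs::remove_file("split_demo.txt")?;
    Ok(())
}
```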
Finally, ensure you need the entire file. If you can do your processing a chunk at a time with a `BufReader`, using `fill_buf` and `consume`, you can avoid this problem entirely.
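For instance, the checksum from the reproduction could be computed a chunk at a time without ever holding the whole file in memory; a sketch (the file name is a placeholder):

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

// Stream the file through BufReader's internal buffer instead of loading
// it whole: `fill_buf` exposes the next chunk, `consume` marks it processed.
fn checksum(path: &str) -> std::io::Result<usize> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut check = 0usize;
    loop {
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break; // EOF
        }
        for &b in chunk {
            check = check.wrapping_add(b as usize);
        }
        let n = chunk.len();
        reader.consume(n);
    }
    Ok(check)
}

fn main() -> std::io::Result<()> {
    // Placeholder file, created just so the sketch is runnable.
    std::fs::write("chunk_demo.txt", b"abc")?;
    println!("check: {}", checksum("chunk_demo.txt")?);
    std::fs::remove_file("chunk_demo.txt")?;
    Ok(())
}
```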