How do I skip the filesystem cache when reading a file in Golang?
Question
Assume that the contents of the file Foo.txt are as follows.

    Foo Bar Bar Foo

Consider the following short program.

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        fd, err := syscall.Open("Foo.txt", syscall.O_RDONLY, 0)
        if err != nil {
            fmt.Println("Failed on open: ", err)
            return // fd is not valid, so there is nothing to read or close
        }
        defer syscall.Close(fd)

        data := make([]byte, 100)
        _, err = syscall.Read(fd, data)
        if err != nil {
            fmt.Println("Failed on read: ", err)
        }
    }

When we run the program above, we get no errors, which is correct behavior.
Now, I modify the syscall.Open line to be the following.

    fd, err := syscall.Open("Foo.txt", syscall.O_RDONLY|syscall.O_SYNC|syscall.O_DIRECT, 0)

When I run the program again, I get the following (undesirable) output.

    Failed on read: invalid argument

How can I correctly pass the flags syscall.O_SYNC and syscall.O_DIRECT, as specified by the open man page, to skip the filesystem cache?
Note that I am using the syscall file interface directly instead of the os file interface because I could not find a way to pass those flags into the functions provided by os, but I am open to solutions that use os, provided that they correctly disable the filesystem cache on reads.
Note also that I am running on Ubuntu 14.04 with ext4 as my filesystem.
Update: I tried to use @Nick Craig-Wood's package in the code below.

    package main

    import (
        "fmt"
        "io"
        "os"

        "github.com/ncw/directio"
    )

    func main() {
        in, err := directio.OpenFile("Foo.txt", os.O_RDONLY, 0666)
        if err != nil {
            fmt.Println("Error on open: ", err)
            return
        }
        defer in.Close()

        // Read one aligned block of directio.BlockSize bytes.
        block := directio.AlignedBlock(directio.BlockSize)
        _, err = io.ReadFull(in, block)
        if err != nil {
            fmt.Println("Error on read: ", err)
        }
    }

The output is the following.

    Error on read: unexpected EOF

Answer 1
Score: 6
You may enjoy my directio package, which I made for exactly this purpose.
From the site:
> This is library for the Go language to enable use of Direct IO under all supported OSes of Go (except openbsd and plan9).
> Direct IO does IO to and from disk without buffering data in the OS. It is useful when you are reading or writing lots of data you don't want to fill the OS cache up with.
See the package docs here:
http://go.pkgdoc.org/github.com/ncw/directio
Answer 2
Score: 2
From the open man page, under NOTES:
> The O_DIRECT flag may impose alignment restrictions on the length and address of user-space buffers and the file offset of I/Os. In Linux alignment restrictions vary by file system and kernel version and might be absent entirely.
So you could have alignment issues, of either the memory or the file offset, or your buffer size could be "wrong". What the alignments and sizes should be is not obvious. The man page continues:
> However there is currently no file system-independent interface for an application to discover these restrictions for a given file or file system.
And even Linus weighs in, in his usual understated manner:
> "The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances." —Linus
Good luck!
p.s. Stab in the dark: why not read 512 bytes?
Answer 3
Score: 0
You can try fadvise and madvise, but there is no guarantee that they will work. Both are more likely to have an effect with larger files or buffers, because:
> Partial pages are deliberately preserved on the expectation that it is better to preserve needed memory than to discard unneeded memory.
See the Linux source code for which advice values have an effect and which do not. POSIX_FADV_NOREUSE, for example, doesn't do anything.
http://lxr.free-electrons.com/source/mm/fadvise.c#L62
http://lxr.free-electrons.com/source/mm/madvise.c

    package main

    import (
        "fmt"
        "os"
        "syscall"

        "golang.org/x/sys/unix"
    )

    func main() {
        advise := false
        if len(os.Args) > 1 && os.Args[1] == "-x" {
            fmt.Println("setting file advise")
            advise = true
        }

        data := make([]byte, 100)
        handler, err := os.Open("Foo.txt")
        if err != nil {
            fmt.Println("Failed on open: ", err)
            os.Exit(1)
        }
        defer handler.Close()

        if advise {
            // Advisory only: ask the kernel to drop cached pages for the whole file.
            unix.Fadvise(int(handler.Fd()), 0, 0, unix.FADV_DONTNEED)
        }

        read, err := handler.Read(data)
        if err != nil {
            fmt.Println("Failed on read: ", err)
            os.Exit(1)
        }

        if advise {
            // Advisory only. madvise expects a page-aligned address, so this
            // call may fail with EINVAL on an ordinary slice; the error is ignored.
            syscall.Madvise(data, syscall.MADV_DONTNEED)
        }

        fmt.Printf("read %v bytes\n", read)
    }

    /usr/bin/time -v ./direct -x
        Command being timed: "./direct -x"
        User time (seconds): 0.00
        System time (seconds): 0.00
        Percent of CPU this job got: 0%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1832
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 2
        Minor (reclaiming a frame) page faults: 149
        Voluntary context switches: 2
        Involuntary context switches: 2
        Swaps: 0
        File system inputs: 200
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0