FUSE - avoid calculating size in GetAttr
Question
I'm implementing a FUSE file system for a remote service.
When the user opens a file, I make a network call to get the file's contents. It appears that the file's size must be reported through GetAttr in order for open to work.
To know the file's size, I have to issue a network call, and since GetAttr is called for every entry when doing ls, I'm concerned about this design (if a user runs ls in a directory with many items, it will have to fetch all the files, even if the user doesn't want to open any of them).
How can I work around this problem? My thoughts were:
- Use a lower-level method for reading that doesn't rely on the reported size? I thought using Read instead of Open could help, but I couldn't get it to work without a size.
- If I could distinguish GetAttr calls that originated from Open from other calls (including ls), I could issue the network calls only when needed.
I use Go and go-fuse, but I think it shouldn't matter because this is a general FUSE question.
Also, the FUSE documentation is very minimal (missing, actually). It would be nice if someone familiar with the matter could explain the call flow for ls, cd and cat: which FUSE functions are called, and in which order.
For example, why are there both Open and Read?
Update:
I've been browsing SSHFS, which is considered the canonical example of a FUSE filesystem, and it seems that it also gets the file over the network on getattr: https://github.com/libfuse/sshfs/blob/master/sshfs.c#L3167
What do you think?
Answer 1
Score: 1
The problem you are seeing occurs because the kernel is buffering your reads, and when it does so, it uses the size of the inode to calculate exactly how many bytes it has to copy to userspace (https://elixir.bootlin.com/linux/v4.19.7/source/mm/filemap.c#L2137). So there are a couple of workarounds:
- Return a huge st_size from GetAttr.
- When you open the file, set the direct_io flag so that the page cache is not used.
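To make the difference between the two workarounds concrete, here is a small self-contained Go model. It is only a sketch of the behaviour described above, not kernel or go-fuse code: cachedRead and directRead are hypothetical names, and the clamping logic is a deliberate simplification of what the buffered read path does with the inode size.

```go
package main

import "fmt"

// cachedRead roughly models a read served through the page cache:
// no bytes are copied beyond the size that GetAttr reported.
// (Hypothetical helper, a simplification of the real kernel logic.)
func cachedRead(content []byte, reportedSize, off int64, want int) []byte {
	if off >= reportedSize {
		return nil // the kernel believes this offset is already EOF
	}
	end := off + int64(want)
	if end > reportedSize {
		end = reportedSize
	}
	if end > int64(len(content)) {
		end = int64(len(content)) // short read at the real end of the data
	}
	if off >= end {
		return nil
	}
	return content[off:end]
}

// directRead models a read with direct_io set: the reported size is
// ignored and only the filesystem's own reply decides where the file ends.
func directRead(content []byte, off int64, want int) []byte {
	if off >= int64(len(content)) {
		return nil
	}
	end := off + int64(want)
	if end > int64(len(content)) {
		end = int64(len(content))
	}
	return content[off:end]
}

func main() {
	content := []byte("hello from the remote service")
	fmt.Printf("size 0, cached:    %q\n", cachedRead(content, 0, 0, 4096))
	fmt.Printf("huge size, cached: %q\n", cachedRead(content, 1<<40, 0, 4096))
	fmt.Printf("size 0, direct_io: %q\n", directRead(content, 0, 4096))
}
```

In this model, a reported size of 0 makes the cached read return nothing, which matches the symptom in the question, while either a huge reported size or direct_io lets the full content through.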
Answer 2
Score: 0
I don't know go-fuse's API. The information below is based on libfuse's API.
SSHFS's GetAttr is implemented in the function sshfs_getattr, which looks like it sends a network request to get the file size information.
When you run cd, it runs the .access callback to check that the directory exists.
When you run ls, it first calls the .readdir callback to get the directory info, then calls .getattr to get info for the files in that directory.
When you run cat, it first calls .getattr to get info for the file and for the path, then calls .open => .read => .release.
FUSE lacks documentation, so it's best to write an example first and then add some printf calls in those callbacks to see what is happening.
- In .open, you can create private data and store it in fuse_file_info::fh. This fuse_file_info::fh can then be used in later .read callbacks.
- You can set all size info to zero in the .getattr callback. Then, in .open, set fuse_file_info::direct_io to 1. In .read, first read the data from the network; if you reach the end of the file, return 0 from .read.
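The flow described in the two points above can be sketched as a plain Go model. This is not go-fuse or libfuse code: the remote, handle, getattrSize, open, read and cat names are all hypothetical stand-ins, and the model assumes direct_io is in effect so that a 0-byte reply from read is what ends the file.

```go
package main

import "fmt"

// remote stands in for the remote service; fetches counts network calls.
type remote struct {
	files   map[string][]byte
	fetches int
}

func (r *remote) fetch(name string) []byte {
	r.fetches++
	return r.files[name]
}

// handle is the private per-open data, the role fuse_file_info::fh plays.
type handle struct{ data []byte }

// getattrSize reports size 0 and never touches the network.
func getattrSize(name string) int64 { return 0 }

// open fetches the content once and stores it in the handle.
func open(r *remote, name string) *handle {
	return &handle{data: r.fetch(name)}
}

// read copies up to len(dest) bytes starting at off; 0 signals end of file.
func (h *handle) read(dest []byte, off int64) int {
	if off >= int64(len(h.data)) {
		return 0
	}
	return copy(dest, h.data[off:])
}

// cat replays the sequence getattr, open, read until 0, then release.
func cat(r *remote, name string) string {
	_ = getattrSize(name)
	h := open(r, name)
	var out []byte
	buf := make([]byte, 8)
	var off int64
	for {
		n := h.read(buf, off)
		if n == 0 {
			break
		}
		out = append(out, buf[:n]...)
		off += int64(n)
	}
	h.data = nil // release: drop the private data
	return string(out)
}

func main() {
	r := &remote{files: map[string][]byte{"a.txt": []byte("remote file content")}}
	for i := 0; i < 3; i++ {
		getattrSize("a.txt") // what ls would trigger for each entry
	}
	fmt.Println("fetches after ls:", r.fetches)
	fmt.Printf("cat: %q, fetches: %d\n", cat(r, "a.txt"), r.fetches)
}
```

The point of the model is that listing entries costs no network calls, and the single fetch happens only when a file is actually opened.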
This doc helped me a lot when I wrote my filesystem.