FUSE - avoid calculating size in GetAttr

Question

I'm implementing a FUSE file system for a remote service.
When the user opens a file, I make a network call to get the file's contents. It appears that the file's size must be reported through GetAttr for open to work.
To know the file's size, I have to issue a network call. Since GetAttr is called for every entry when running ls, I'm concerned about this design: if a user runs ls in a directory with many items, every file has to be fetched, even if the user never opens any of them.

How can I work around this problem? My thoughts were:

  • Use a lower-level method for reading that doesn't rely on the reported size? I thought using Read instead of Open could help, but I couldn't get it to work without a size.
  • If I could distinguish GetAttr calls that originated from Open from other calls (including ls), I could issue the network calls only when needed.

I use Go and go-fuse, but I think it shouldn't matter because it's a general FUSE question.

Also, the FUSE documentation is very minimal (in fact, mostly missing). It would be nice if someone familiar with the matter could explain the call flow for ls, cd and cat: which FUSE functions are called, and in which order.
For example, why are there both Open and Read?

Update:
I've been browsing SSHFS, which is considered the canonical example of a FUSE filesystem, and it seems that it also fetches the file over the network in getattr: https://github.com/libfuse/sshfs/blob/master/sshfs.c#L3167
What do you think?

Answer 1

Score: 1

The problem you are seeing occurs because the kernel is buffering your reads, and when it does so, it uses the size of the inode to calculate exactly how many bytes it has to copy to userspace (https://elixir.bootlin.com/linux/v4.19.7/source/mm/filemap.c#L2137). So there are two possible workarounds:

  1. Return a huge st_size from GetAttr.

  2. When you open the file, set the direct_io flag so that the page cache is not used.
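To make the mechanism concrete, here is a minimal Go model of the two read paths. It is a simplification written for illustration, not kernel or go-fuse code: `kernelRead` and `fsRead` are hypothetical names, and details such as readahead, page granularity and how the kernel reacts to short reads are ignored. The buffered path clamps every request to the size reported by GetAttr, so a reported size of 0 yields no data; reporting a huge size, or bypassing the clamp with direct_io, lets the filesystem's own Read decide where the file ends.

```go
package main

import "fmt"

// kernelRead is a simplified, hypothetical model of how a read(2) request
// reaches a FUSE filesystem. It is not kernel code: readahead, page
// granularity and short-read handling are ignored.
//
// Buffered path (directIO == false): the kernel trusts the size reported by
// GetAttr and never hands userspace bytes beyond it.
// direct_io path: the request is forwarded to the filesystem's Read as is.
func kernelRead(reportedSize int64, directIO bool, content []byte, off int64, n int) []byte {
	if !directIO {
		if off >= reportedSize {
			return nil // nothing past the reported end of file
		}
		if remaining := reportedSize - off; int64(n) > remaining {
			n = int(remaining)
		}
	}
	return fsRead(content, off, n)
}

// fsRead stands in for the filesystem's own Read callback, which knows the
// real content (for example, after fetching it from the remote service).
func fsRead(content []byte, off int64, n int) []byte {
	if off >= int64(len(content)) {
		return nil
	}
	end := off + int64(n)
	if end > int64(len(content)) {
		end = int64(len(content))
	}
	return content[off:end]
}

func main() {
	remote := []byte("hello from the remote service")

	// GetAttr reported size 0 and the page cache is used: no data.
	fmt.Printf("%q\n", kernelRead(0, false, remote, 0, 4096)) // prints ""
	// Workaround 1: GetAttr reports a huge st_size.
	fmt.Printf("%q\n", kernelRead(1<<62, false, remote, 0, 4096))
	// Workaround 2: the size stays 0, but the file was opened with direct_io.
	fmt.Printf("%q\n", kernelRead(0, true, remote, 0, 4096))
}
```

Both workarounds print the full remote content in this model; the difference is that workaround 1 makes every tool that calls stat see a bogus size, while workaround 2 gives up kernel-side caching for that file.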

Answer 2

Score: 0

I don't know go-fuse's API; the information below is based on libfuse's API.

SSHFS implements GetAttr in the function sshfs_getattr, and it does look like it sends a network request to get the file size.

When you run cd, the .access callback is called to check that the directory exists.

When you run ls, the .readdir callback is called first to get the directory entries, then .getattr is called to get information about each file in that directory.

When you run cat, .getattr is called first to get information about the file and its path, then .open => .read => .release.

FUSE lacks documentation, so you'd better first write an example filesystem, then add some printf calls in those callbacks to see what is called and when.

  1. In .open, you can create some private data and store it in fuse_file_info::fh. This fuse_file_info::fh can then be used in later .read callbacks.
  2. You can set all size information to zero in the .getattr callback. Then, in .open, set fuse_file_info::direct_io to 1. In .read, first read the data from the network; when you reach the end of the file, return 0 from .read.
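The two suggestions above can be sketched together as a stdlib-only Go model (not real libfuse or go-fuse code). `remoteFS`, `openFile`, `openResult` and the `fetch` field are hypothetical names: the handle table plays the role of fuse_file_info::fh, and `openResult.DirectIO` stands for setting fuse_file_info::direct_io. Getattr reports size 0 without any network traffic, the first Read downloads the content, and a zero-byte Read marks the end of file.

```go
package main

import (
	"fmt"
	"sync"
)

// openFile is the per-open private data that, in libfuse, would live behind
// fuse_file_info::fh. Here fh is simply a key into a handle table.
type openFile struct {
	path    string
	data    []byte
	fetched bool
}

// openResult models the two outputs of .open discussed above: the file
// handle and whether direct_io is requested (hypothetical type).
type openResult struct {
	FH       uint64
	DirectIO bool
}

// remoteFS is a hypothetical filesystem; fetch stands in for the network
// call to the remote service.
type remoteFS struct {
	mu      sync.Mutex
	fetch   func(path string) ([]byte, error)
	handles map[uint64]*openFile
	nextFH  uint64
}

// Getattr reports a size of 0 and never touches the network, so ls stays cheap.
func (fs *remoteFS) Getattr(path string) int64 { return 0 }

// Open allocates private data under a new fh and requests direct_io, so the
// kernel does not clamp reads to the reported size of 0.
func (fs *remoteFS) Open(path string) openResult {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	fs.nextFH++
	fs.handles[fs.nextFH] = &openFile{path: path}
	return openResult{FH: fs.nextFH, DirectIO: true}
}

// Read downloads the content on first use, then serves [off, off+n).
// Returning zero bytes tells the caller it has reached end of file.
// The mutex is held for the whole call, which keeps the sketch simple
// but serializes concurrent reads.
func (fs *remoteFS) Read(fh uint64, off int64, n int) ([]byte, error) {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	f, ok := fs.handles[fh]
	if !ok {
		return nil, fmt.Errorf("unknown file handle %d", fh)
	}
	if !f.fetched {
		data, err := fs.fetch(f.path)
		if err != nil {
			return nil, err
		}
		f.data, f.fetched = data, true
	}
	if off >= int64(len(f.data)) {
		return nil, nil // end of file
	}
	end := off + int64(n)
	if end > int64(len(f.data)) {
		end = int64(len(f.data))
	}
	return f.data[off:end], nil
}

// Release drops the private data once the file is closed.
func (fs *remoteFS) Release(fh uint64) {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	delete(fs.handles, fh)
}

func main() {
	demo := &remoteFS{
		fetch:   func(path string) ([]byte, error) { return []byte("remote contents of " + path), nil },
		handles: map[uint64]*openFile{},
	}
	// Replays cat: getattr, open, read until a 0-byte read, release.
	fmt.Println("size from getattr:", demo.Getattr("/f"))
	h := demo.Open("/f")
	var whole []byte
	for pos := int64(0); ; {
		chunk, err := demo.Read(h.FH, pos, 8)
		if err != nil {
			panic(err)
		}
		if len(chunk) == 0 {
			break
		}
		whole = append(whole, chunk...)
		pos += int64(len(chunk))
	}
	demo.Release(h.FH)
	fmt.Printf("direct_io=%v, read %q\n", h.DirectIO, whole)
}
```

Deferring the download to the first Read means ls and stat never hit the network. In go-fuse v2, as far as I know, the direct_io request corresponds to returning the fuse.FOPEN_DIRECT_IO flag from Open; check the go-fuse documentation for your version before relying on it.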

This doc helped me a lot when I wrote my filesystem.

huangapple
  • Posted on September 18, 2017 at 03:20:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/46267972.html