`write` serialization in POSIX

Question

I'm trying to understand this requirement in POSIX-2017:

> Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position.

  1. Does "occur" refer to read being called, read returning successfully, or something else?

  2. If, while one process is calling read, another process calls write twice on the same file, are there any circumstances where the read will reflect some or all of the second write, but not all of the first?

  |----------------read-----------------|
      |--write1--|       |--write2--|

  3. How is this handled by implementations (e.g. ext4)? Is this something worth worrying about?

Answer 1

Score: 0

To answer your first question:

"Occur" refers to the whole read, from the point of the call to the point of the value being returned. All of it has to happen after the previous write, and before the next write. The same page says so:

> After a write() to a regular file has successfully returned:
>
> * Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
>
> * Any subsequent successful write() to the same byte position in the file shall overwrite that file data.

POSIX makes no guarantee whatsoever about any sort of interleaving, because implementing additional guarantees is quite difficult.

Regarding the second question:

Again, refer to the above quote. If a process called write() and write() returned successfully, any subsequent read by any process would reflect the written data.

So the answer is "yes, if the first write() failed".
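A minimal Python sketch of the guarantee quoted above, assuming a POSIX system where `os.fork` is available. The helper name `read_after_write` and the pipe handshake are illustrative choices, not part of POSIX or of this answer: the child writes to the file and only then sends a byte through a pipe, so a parent that has received that byte can prove its read occurs after the write returned.

```python
import os
import tempfile


def read_after_write(path: str, payload: bytes) -> bytes:
    """Child writes `payload`; parent reads it back once the write is known to be done."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    notify_r, notify_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write the data, then signal completion through the pipe.
        status = 1
        try:
            os.close(notify_r)
            if os.pwrite(fd, payload, 0) == len(payload):
                os.write(notify_w, b"x")
                status = 0
        finally:
            os._exit(status)
    # Parent: close our copy of the write end so EOF is seen if the child dies.
    os.close(notify_w)
    try:
        token = os.read(notify_r, 1)  # blocks until the child signals or exits
        if token != b"x":
            raise RuntimeError("writer did not complete its write")
        # This read provably occurs after the child's pwrite() returned.
        return os.pread(fd, len(payload), 0)
    finally:
        os.close(notify_r)
        os.close(fd)
        os.waitpid(pid, 0)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(read_after_write(os.path.join(tmp, "demo.bin"), b"hello"))
```

Without the handshake (that is, when the read merely overlaps the write in time), nothing in the quoted text says which data the read returns.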

Implementation:

ext4, and almost every other filesystem, uses a page cache. The page cache is an in-memory representation of the file's data (or a relevant part thereof). Any synchronization that needs to be done is done using this representation. In that respect, reading from and writing to the file is like reading from and writing to shared memory.

The page cache, as the name suggests, is built with pages. In most implementations, a page is a 4k region of memory, and reads and writes happen on a page basis.

This means that e.g. ext4 will serialize reads & writes on the same 4k region of the file, but a 12k write may not be atomic.
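To make the 4k versus 12k distinction concrete, here is a small sketch that counts how many pages a write touches. `pages_spanned` is a hypothetical helper introduced only for illustration; the page size is queried with `os.sysconf("SC_PAGE_SIZE")` rather than assumed to be 4k.

```python
import os


def pages_spanned(offset: int, length: int, page_size: int) -> int:
    """Number of page-sized regions touched by `length` bytes starting at `offset`."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")  # commonly 4096, but not always
    print("page size:", page_size)
    print("pages touched by a 12k write at offset 0:",
          pages_spanned(0, 12 * 1024, page_size))
```

Note that even a 4k write touches two pages when it does not start on a page boundary, so a write is only confined to a single page if both its offset and its length cooperate.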

AFAICT, ext4 does not allow multiple concurrent writes on the same page, or concurrent reads & writes on the same page, but this is not guaranteed anywhere.

edit: The filesystem (on-disk) block size might be smaller than a page, in which case some I/O may be done at a block-size granularity, but that is even less reliable in terms of atomicity.
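If an application does need a multi-page write to look all-or-nothing to cooperating readers, one option is advisory byte-range locking. This is a suggestion added here as an assumption, not something the quoted specification text requires. The sketch uses Python's `fcntl.lockf`, which wraps POSIX `fcntl()` record locks; `locked_pwrite` and `locked_pread` are hypothetical helper names. Advisory locks only help against processes that also take the lock.

```python
import fcntl
import os
import tempfile


def locked_pwrite(fd: int, data: bytes, offset: int) -> int:
    """Write `data` at `offset` while holding an exclusive advisory lock on that range."""
    if not data:
        return 0  # a zero length would mean "lock to end of file", so skip it
    fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
    try:
        written = 0
        while written < len(data):  # handle short writes
            n = os.pwrite(fd, data[written:], offset + written)
            if n == 0:
                raise OSError("pwrite made no progress")
            written += n
        return written
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)


def locked_pread(fd: int, length: int, offset: int) -> bytes:
    """Read up to `length` bytes at `offset` while holding a shared advisory lock."""
    if length <= 0:
        return b""
    fcntl.lockf(fd, fcntl.LOCK_SH, length, offset, os.SEEK_SET)
    try:
        return os.pread(fd, length, offset)  # may return fewer bytes near EOF
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        locked_pwrite(fd, b"x" * (12 * 1024), 0)
        print(len(locked_pread(fd, 12 * 1024, 0)))
    finally:
        os.close(fd)
        os.unlink(path)
```

The file descriptor must be open for writing to take the exclusive lock and for reading to take the shared one. Because these locks are per process, this sketch coordinates separate processes, not threads within one process.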

  • Posted on January 6, 2020 at 19:01:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59610904.html