Do `seekg()` and `seekp()` operate on characters or bytes?


Question

The book uses the word "character" when describing positions in a file. In C++, a stream position counts units of the stream's character type (`char_type`), which is not necessarily a byte and not necessarily a user-visible character.

For a narrow stream such as `std::fstream` or `std::ofstream`, the character type is `char`, and `sizeof(char)` is 1 by definition, so each position step is one byte. That is why `fs.seekp(1);` moves the write position to the second byte of the file, which matches the observation in the example. Text versus binary mode only affects details such as line-ending translation: in text mode, only positions previously returned by `tellg()`/`tellp()` are guaranteed to be meaningful, while with `std::ios::binary` positions behave as plain byte offsets in practice.

A narrow stream does not decode multibyte encodings such as UTF-8, so a character that takes three bytes in UTF-8 occupies three positions. A wide stream such as `std::wfstream` reads and writes `wchar_t` units, and its positions count those units; the bytes on disk are produced by the `codecvt` facet of the stream's locale. For such streams, seeking by a nonzero relative offset only works when that encoding has a fixed width.

So the book's "character" means one `char`, which for `fstream` is the same as one byte. The `pos_type` in the `seekp()` signature is `std::streampos`, an instantiation of the `std::fpos` template that stores an offset together with conversion state rather than being a plain integer.


Page 393 of 'Programming: Principles and Practice' introduces seekg() and seekp() as follows:

> However, if you must, you can use positioning to select a specific place in a file for reading or writing. Basically, every file that is open for reading has a "read/get position" and every file that is open for writing has a "write/put position":
>
> [diagram]
>
> ```cpp
> fstream fs {name}; // open for input and output
> if (!fs) error("can't open ", name);
>
> fs.seekg(5); // move reading position to 5 (the 6th character)
> char ch;
> fs >> ch; // read and increment reading position
> cout << "character[5] is " << ch << " {" << int(ch) << "}\n";
>
> fs.seekp(1); // move writing position to 1
> fs << 'y'; // write and increment writing position
> ```

In the code snippet, "position" is expressed in terms of characters, e.g. position 5 is referred as the "6th character". This confused me because up until this point, we've thought of a file as a sequence of bytes, so I would have expected position to be expressed in terms of bytes (in the example above, I thought 5 was the position of the 6th byte of the file).

So, I tried to test it out by writing to position 1 of a file containing a single wide character:

wide.txt

test.cpp

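```cpp
#include "../std_lib_facilities.h"

int main() {
    fstream fs {"wide.txt"};  // open the existing file for input and output

    fs.seekp(1);  // move the write position to 1
    fs << 'y';    // overwrite whatever is stored at position 1
    fs.close();

    return 0;
}
```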

After running this code, wide.txt looks like this:

�y�

It seems that the character 'y' was written to the 2nd byte of the file, not to the 2nd character, which would imply that position refers to a byte, not a character. So, why does the code snippet in the book say "character"?

I also noticed that the function signature is `basic_ostream& seekp( pos_type pos );` (see cppreference), but I can't find an explanation of whether `pos_type` refers to a character or a byte.

The reference on cplusplus.com also seems to define position in terms of characters (emphasis added):

> Sets the position where the next character is to be inserted into the output stream.

As does the following comment on a Reddit thread (emphasis added):

> A streampos IS NOT an integer, it's not some byte position in a stream. It represents a character position in the stream, and the type holds some stream state information for the purpose of code conversion and character position.

But this seems to contradict what I see in my example, where fs.seekp(1) seems to overwrite byte 1 (the 2nd byte).

Answer 1

Score: 3


`seekg()` and `seekp()` operate on code units, which are often referred to as "characters" in C++, although the latter term is pretty overloaded and may mean other things. They definitely don't operate on bytes, which is easy to see if you consider that for wide streams the buffer elements have type `wchar_t`.

Quoting https://en.cppreference.com/w/cpp/io/basic_streambuf:

> The controlled character sequence (buffer) is an array of CharT which, at all times, represents a subsequence, or a "window" into the associated character sequence.

Answer 2

Score: 0

seekg()seekp()seek* 函数的上下文中操作的是字节而不是字符。在编码方面,像UTF8这样的编码,移动一个 seek* 字符需要进一步的澄清。seek* 字符不考虑码点,因此移动一个 seek* 字符可能会将读/写指针定位在一个4字节长的Unicode字符内部,从那里读取或写入可能会导致无效的码点或一些意外的图形。

英文:

> Do seekg() and seekp() operate on characters or bytes?

In the context of the `seek*` functions (for `char`-based streams such as `std::fstream`), they are the same thing, and this kind of "character" has only a very loose relation to visible (or invisible) characters such as graphemes, grapheme-like units or symbols.

For encodings like UTF-8, stepping "a character" needs clarification. A `seek*` character does not take code points into account, so stepping one `seek*` character may position the read/write pointer somewhere inside a 4-octet Unicode character, and reading or writing from there may result in an invalid code point or some unexpected grapheme.

Answer 3

Score: 0


> Do seekg() and seekp() operate on characters or bytes?

Both?

> why does the code snippet in the book say "character"?

`char` is a character. It represents one byte. In this context, they are the same.

There is also `wfstream`. It operates on wide characters: each one is a single `wchar_t`, which usually occupies more than one byte.

> whether pos_type refers to a character or a byte.

It's all an abstraction. It depends on what the stream refers to. In the case of `fstream`, a character is one `char`, which is one byte. In the case of `wfstream`, a character has `sizeof(wchar_t)` bytes (in memory; what reaches the file depends on the locale's conversion). In the case of my imaginary `typedef basic_ios<__uint128_t> my_super_stream_with_16_bytes_characters;`, a character has 16 bytes.

> contradict what I see in my example, where fs.seekp(1) seems to overwrite byte 1

No; it's just that `fs` in this case refers to a stream where one character represents one byte (on your operating system and your implementation).

huangapple
  • Posted on May 13, 2023 12:58:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76241138.html