英文:
Why do I need to use `flush=True` if printing when stdout isn't a TTY in Python but not in Java or Go?
问题
给定以下小程序:
#!/usr/bin/env python3
import time
for i in range(15):
print(f'{i}: sleeping')
time.sleep(1)
当我直接将stdout
连接到终端并运行它时,输出几乎立即显示:
./sync_test.py
然而,如果我将程序的stdout
连接到一个管道(另一端是cat
监听并打印到终端),直到程序结束我才能看到输出:
./sync_test.py | cat
唯一能让输出在程序退出之前打印出来的方法是在所有的打印语句中添加flush=True
。
我在其他几种语言(Go、Java和C)中尝试了这个问题,结果各不相同。在Go和Java中,当stdout
未连接到TTY时,程序会立即打印输出。但是在C中,我遇到了与Python相同的行为。
Go程序(test_go.go):
package main
import (
"fmt"
"time"
)
func main() {
for i := 0; i < 15; i++ {
fmt.Printf("%d: sleeping\n", i)
time.Sleep(1 * time.Second)
}
}
Java程序(test_java.java):
public class test_java {
public static void main(String[] args) throws Exception{
for(int i=0; i<15; i++){
System.out.println(i + ": sleeping");
Thread.sleep(1000);
}
}
}
C程序(test_c.c):
#include <stdio.h>
#include <unistd.h>
int main() {
for(int i=0; i < 15; i ++) {
printf("Hello, World!\n");
sleep(1);
}
return 0;
}
(所有程序都在命令末尾加上了 | cat
)
我了解到stdout
是有缓冲的。对于不同语言之间行为差异的解释是什么?无论stdout
是否连接到TTY,Java和Go是否隐式地刷新缓冲区,而C和Python则没有?
英文:
Given the following small program:
#!/usr/bin/env python3
import time
for i in range(15):
print(f'{i}: sleeping')
time.sleep(1)
When I run it with stdout
attached to the terminal directly, I get the output pretty much immediately:
./sync_test.py
However, if I run my program with stdout
attached to a pipe (which on the other end cat
is listening and printing to the terminal), I don't get the output until the program ends:
./sync_test.py | cat
The only way I can get the output to be printed before the program quits is if I add flush=True
to all of my print statements.
I tried this out in a few other languages (Go, Java, and C) and got mixed results. In Go and Java, the programs print immediately when stdout
is not attached to a TTY. But with C, I experience the same behavior as I do with Python.
The Go program (test_go.go):
package main
import (
"fmt"
"time"
)
func main() {
for i := 0; i < 15; i++ {
fmt.Printf("%d: sleeping\n", i)
time.Sleep(1 * time.Second)
}
}
The Java program (test_java.java):
public class test_java {
public static void main(String[] args) throws Exception{
for(int i=0; i<15; i++){
System.out.println(i + ": sleeping");
Thread.sleep(1000);
}
}
}
The C program (test_c.c):
#include <stdio.h>
#include <unistd.h>
int main() {
for(int i=0; i < 15; i ++) {
printf("Hello, World!\n");
sleep(1);
}
return 0;
}
(all programs were run with | cat
at the end of the command)
I understand that stdout
is buffered. What is the explanation for the differences in behavior across languages? And whether stdout
is connected to a TTY or not? Are Java and Go flushing the buffers implicitly whereas C and Python are not?
答案1
得分: 1
“为什么Python会这样做”的答案是“为了与C标准大致保持一致”。默认情况下,与tty连接的情况下,stdout
是行缓冲的,否则是块缓冲的。在Python 2时代,sys.stdout
(以及由常规open
返回的文件对象,但不包括io.open
返回的对象)实际上是基于C的stdio.h
调用构建的,因此这种行为直接继承自C;在Python 3中,stdio.h
从图像中删除了(Python 3使用自己的缓冲包装器进行I/O,刷新时直接调用底层的操作系统I/O例程,完全不涉及C风格的FILE*
),但为了最小化现有代码的兼容性问题,他们保留了这种行为。
C(以及其他一些语言)在连接到tty时将其设置为行缓冲,是为了让用户立即看到完整的行。否则,它是块缓冲的,因为假设没有人会实时读取行,所以通过在每次批量写入系统调用之前缓冲更多数据来减少系统调用是有益的。
现在的系统调用并不是那么昂贵,所以现代语言默认情况下即使在没有连接到tty的情况下也采用行缓冲是完全合理的;它可能运行得更慢,但在整体方案中,几千个周期并不算太多。Go和Java的行为并没有“错”,Python和C的行为也没有“错”。这种行为是一致的,并且对于常见情况而言效果良好,而且可以很容易地添加刷新操作(或以其他方式干扰默认缓冲,例如使用Python的-u
命令行开关或C中的setvbuf
函数),而且更改默认行为可能会改变现有代码的行为,因此没有强烈的动机去回头修改默认行为。
英文:
The answer to "Why does Python do this?" is "To be roughly consistent with the C standard", where, by default, being connected to a tty makes stdout
line buffered, and block buffered otherwise. Back in the Python 2 days, sys.stdout
(and the file objects returned by regular open
, but not by io.open
) was actually built on top of C's stdio.h
calls, so this behavior was inherited from C directly; in Python 3, stdio.h
was removed from the picture (Python 3 uses its own buffering wrappers for I/O, which when flushed, call the underlying OS I/O routines directly, with no C-style FILE*
s involved at all), but they preserved the behavior to minimize compatibility issues for existing code.
The reason C (and some other languages) make it line-buffered when connected to a tty is to allow the user to see complete lines immediately. Otherwise it's block-buffered because it's assumed no one is reading the lines as they appear anyway, so reducing system calls (by buffering more before each bulk write system call) is beneficial.
System calls aren't that expensive nowadays, so it's perfectly reasonable for a modern language to default to line buffering even when not connected to a tty; it'll run slower, but a few thousand cycles here and there isn't that much in the grand scheme of things. Go and Java aren't wrong to behave the way they do, and Python and C aren't wrong to go the other way. The behavior is consistent, and it works well enough for common cases, and it's trivial to add flushes (or otherwise mess with the default buffering, e.g. with the -u
command line switch for Python, or the setvbuf
function in C), and changing it might change the behavior of existing code, so there's no strong incentive to go back and alter the default behavior.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论