How to deal with log output which contains progress bar?

Question

Context

This question is not about any particular programming language, but about how stdout behaves when we write to a terminal versus when we write to a file. Anyway, to demonstrate, I have to pick a language, and I chose Python for the problem part.

I've stolen the code below from this answer:

Save this code as progress.py:

def progressBar(iterable, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█', printEnd = "\r"):
    total = len(iterable)
    # Progress Bar Printing Function
    def printProgressBar (iteration):
        percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
        filledLength = int(length * iteration // total)
        bar = fill * filledLength + '-' * (length - filledLength)
        print(f'\r{prefix} |{bar}| {percent}% {suffix}', end = printEnd)
    # Initial Call
    printProgressBar(0)
    # Update Progress Bar
    for i, item in enumerate(iterable):
        yield item
        printProgressBar(i + 1)
    # Print New Line on Complete
    print()


import time

# A List of Items
items = list(range(0, 57))

# A Nicer, Single-Call Usage
for item in progressBar(items, prefix = 'Progress:', suffix = 'Complete', length = 50):
    # Do stuff...
    time.sleep(0.1)

When you run this program using python3 progress.py, you'll see a progress bar filling from left to right.

I have attached a screenshot:

[Screenshot: How to deal with log output which contains progress bar?]

If you are following along on your own, you'll notice that the progress updates on the same line, i.e. it does not write a new line for each step.

Now try to route the output of the script to a file. Run python3 progress.py > stdout.log.

If you cat stdout.log at this point, the terminal will interpret it correctly and show only the last output, which is the 100.0% complete bar.

Now if you open the output file itself (for example in an editor), you'll find something else. Here I've pasted the content:

^MProgress: |--------------------------------------------------| 0.0% Complete^MProgress: |--------------------------------------------------| 1.8% Complete^MProgress: |█-------------------------------------------------| 3.5% Complete^MProgress: |██------------------------------------------------| 5.3% Complete^MProgress: |███-----------------------------------------------| 7.0% Complete^MProgress: |████----------------------------------------------| 8.8% Complete^MProgress: |█████---------------------------------------------| 10.5% Complete^MProgress: |██████--------------------------------------------| 12.3% Complete^MProgress: |███████-------------------------------------------| 14.0% Complete^MProgress: |███████-------------------------------------------| 15.8% Complete^MProgress: |████████------------------------------------------| 17.5% Complete^MProgress: |█████████-----------------------------------------| 19.3% Complete^MProgress: |██████████----------------------------------------| 21.1% Complete^MProgress: |███████████---------------------------------------| 22.8% Complete^MProgress: |████████████--------------------------------------| 24.6% Complete^MProgress: |█████████████-------------------------------------| 26.3% Complete^MProgress: |██████████████------------------------------------| 28.1% Complete^MProgress: |██████████████------------------------------------| 29.8% Complete^MProgress: |███████████████-----------------------------------| 31.6% Complete^MProgress: |████████████████----------------------------------| 33.3% Complete^MP

The actual problem

I am using the Docker SDK and the Go programming language to get output from a container and post it to Gist. The output of docker logs contains one such progress bar.

Here is a link to one such log I posted to Gist: https://gist.github.com/avimanyu786/040243ee1f9a260677080a69ffb88d59

I understand that on a terminal, the terminal interprets the control character and rewrites the line. When we write the output to a file, the whole thing shows up, as we see in the gist.
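
To make the difference concrete, here is a minimal Go sketch of what a terminal does with a carriage return. The helper renderCR is hypothetical and only models \r (the cursor moves back to column 0 and later characters overwrite earlier ones); it ignores every other control character and escape sequence.

```go
package main

import "fmt"

// renderCR is a hypothetical helper that mimics how a terminal displays one
// line containing carriage returns: \r moves the cursor back to column 0 and
// the characters that follow overwrite what was already there.
func renderCR(line string) string {
	var screen []rune // what is currently visible on the line
	col := 0          // cursor position
	for _, r := range line {
		if r == '\r' {
			col = 0 // carriage return: back to the start, nothing is erased
			continue
		}
		if col < len(screen) {
			screen[col] = r // overwrite the character under the cursor
		} else {
			screen = append(screen, r)
		}
		col++
	}
	return string(screen)
}

func main() {
	raw := "\rProgress: |--| 0.0%\rProgress: |█-| 50.0%\rProgress: |██| 100.0%"
	fmt.Printf("%q\n", raw)    // the bytes a file or Gist stores
	fmt.Println(renderCR(raw)) // the single line a terminal ends up showing
}
```

Note that a real terminal does not erase the rest of the line on \r, so a shorter update leaves the tail of the previous one visible. Progress bars avoid this by keeping every update at least as long as the one before it.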

My solution in theory

If we take our gist output as an example, we see that line 10 actually takes up more than one visual line (logically it is still one line).

We also know that each of those visual lines ends with a control character, which Gist renders as a square block.

Before sending the output to the gist, I want to:

  1. Receive logs as bytes.Buffer (I can convert it to bytes or string if needed).
  2. Iterate over all the lines.
  3. If there is any control character in the line, delete everything from the start of the line through the last control character on that logical line.

This will show only the last update of that line.
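
The three steps above could be sketched in Go roughly as follows. The names lastUpdate and cleanLog are hypothetical, and the sketch assumes \r is the only control character that matters and that logical lines are separated by \n.

```go
package main

import (
	"bytes"
	"fmt"
)

// lastUpdate is a hypothetical helper: it returns the part of one logical
// line that follows its last carriage return, or the whole line when there
// is none. Trailing \r bytes (as in a \r\n line ending) are dropped first so
// that they do not count as an update.
func lastUpdate(line []byte) []byte {
	line = bytes.TrimRight(line, "\r")
	if i := bytes.LastIndexByte(line, '\r'); i >= 0 {
		return line[i+1:]
	}
	return line
}

// cleanLog applies lastUpdate to every \n-separated line of the log.
func cleanLog(log []byte) []byte {
	lines := bytes.Split(log, []byte("\n"))
	for i, line := range lines {
		lines[i] = lastUpdate(line)
	}
	return bytes.Join(lines, []byte("\n"))
}

func main() {
	var buf bytes.Buffer // step 1: the logs arrive as a bytes.Buffer
	buf.WriteString("pulling image\n\r 0.0%\r 50.0%\r100.0%\ndone\n")
	fmt.Printf("%q\n", cleanLog(buf.Bytes()))
}
```

Working on bytes avoids a conversion when the logs already sit in a bytes.Buffer; the same logic works on strings with strings.LastIndexByte.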

I'm not sure how to do this. Will regex work here? I have not dealt with control characters before. How do I delete everything from the start of the line to the last control character?
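
On the regex question: a regular expression can express this. The following is a sketch using Go's regexp package, not a tested production pattern. The names overwritten and stripOverwritten are hypothetical, and the pattern assumes \r is the only control character involved.

```go
package main

import (
	"fmt"
	"regexp"
)

// overwritten matches, on each line, everything up to and including the last
// \r that still has visible text after it. [^\n]* is greedy, so it reaches as
// far right as it can; the capture group keeps the text that follows that \r.
var overwritten = regexp.MustCompile(`(?m)^[^\n]*\r([^\r\n]+)`)

// stripOverwritten is a hypothetical helper that keeps only the last update
// of every line in s.
func stripOverwritten(s string) string {
	return overwritten.ReplaceAllString(s, "$1")
}

func main() {
	log := "start\n\r 0%\r 50%\r100%\nend\n"
	fmt.Printf("%q\n", stripOverwritten(log))
}
```

Because the capture group requires at least one character after the \r, a \r that merely ends a line (as in \r\n) is left in place instead of wiping the line.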

Answer 1

Score: 0

I found an answer to my question.

So Docker returns the log output as an io.ReadCloser, and that output can be written to a bytes.Buffer:

var stdout bytes.Buffer
var stderr bytes.Buffer

containerLog := GetLogs(containerID)
stdcopy.StdCopy(&stdout, &stderr, containerLog)

Here is the code for GetLogs anyway:

// GetLogs returns the container's logs as an io.ReadCloser. It is the caller's
// duty to run stdcopy.StdCopy on it. Any other method might render unknown
// Unicode characters, because the log output carries both stdout and stderr:
// each chunk starts with a header that says whether it is stderr or stdout.
func GetLogs(contName string) (logOutput io.ReadCloser) {
	options := types.ContainerLogsOptions{ShowStdout: true, ShowStderr: true}

	out, err := dc.ContainerLogs(ctx, contName, options)
	if err != nil {
		panic(err)
	}

	return out
}

Before sending it to GitHub's API, which accepts a string, we can get rid of all the content before the last \r in each line:

// cleanFlushInfo takes the bytes.Buffer holding the docker logs output and, for
// each line that contains a \r, keeps only the text after the last \r. The
// cleaned lines are joined into a new string.
func cleanFlushInfo(bytesBuffer *bytes.Buffer) string {
	scanner := bufio.NewScanner(bytesBuffer)
	finalString := ""

	for scanner.Scan() {
		line := scanner.Text()
		chunks := strings.Split(line, "\r")
		lastChunk := chunks[len(chunks)-1] // fetch the last update of the line
		finalString += lastChunk + "\n"
	}

	return finalString
}
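
A possible variant of the function above, offered as a sketch and not as a drop-in replacement: it swaps the repeated string concatenation for a strings.Builder, raises the scanner's per-line limit, and reports scanner errors. With the default limit of 64 KiB per line (bufio.MaxScanTokenSize), a long run of \r-separated progress updates can stop the scan early and silently truncate the output. The names cleanFlushInfoBuilder and cleanString, and the 1 MiB limit, are assumptions made for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// cleanFlushInfoBuilder is a hypothetical variant of cleanFlushInfo. It reads
// the log line by line and keeps only the text after the last \r of each line.
func cleanFlushInfoBuilder(r io.Reader) (string, error) {
	scanner := bufio.NewScanner(r)
	// Allow lines of up to 1 MiB (an arbitrary assumption) instead of the
	// default 64 KiB; a longer line makes Scan stop with bufio.ErrTooLong.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)

	var out strings.Builder
	for scanner.Scan() {
		line := scanner.Text()
		if i := strings.LastIndexByte(line, '\r'); i >= 0 {
			line = line[i+1:] // keep only the last update of the line
		}
		out.WriteString(line)
		out.WriteByte('\n')
	}
	return out.String(), scanner.Err()
}

// cleanString is a convenience wrapper for logs already held as a string.
func cleanString(log string) (string, error) {
	return cleanFlushInfoBuilder(strings.NewReader(log))
}

func main() {
	cleaned, err := cleanString("step 1\n\r10%\r55%\r100%\nstep 2\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(cleaned)
}
```

Since a *bytes.Buffer is an io.Reader, the stdout buffer filled by stdcopy.StdCopy can be passed to cleanFlushInfoBuilder directly.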

Why is it not efficient? As time passes, the logs will grow longer, and so will the work the program has to do to remove the unwanted information.

One way to overcome this problem is to fetch only the last N minutes of output from the container. Those chunks can then be listed on Gist, either as many time-chunk-based files or by overwriting the same file (Gist will still retain the older information).
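
For the time-chunk variant, one way to derive a stable file name per window is sketched below. The naming scheme, the function chunkFileName, and the ten-minute window are all hypothetical; nothing here is prescribed by the Gist API or the Docker SDK.

```go
package main

import (
	"fmt"
	"time"
)

// exampleTime and window are arbitrary values used only for the demonstration.
var exampleTime = time.Date(2022, time.November, 9, 21, 21, 50, 0, time.UTC)

const window = 10 * time.Minute

// chunkFileName is a hypothetical naming scheme: it maps a timestamp to the
// name of the file that holds the logs of the window containing it. Every
// timestamp inside the same window yields the same name, so a re-upload
// overwrites that window's file instead of creating a new one.
func chunkFileName(t time.Time, window time.Duration) string {
	start := t.UTC().Truncate(window) // round down to the start of the window
	return "logs-" + start.Format("20060102T150405Z") + ".txt"
}

func main() {
	fmt.Println(chunkFileName(exampleTime, window))
	fmt.Println(chunkFileName(exampleTime.Add(window), window)) // next window
}
```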

huangapple
  • Published on November 9, 2022, 21:21:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74375547.html