Posted: 2020-01-03 15:19:53

Scaling images before doing conversion or vice versa?

Question

I wonder which of the methods below preserves more image detail:

Downscaling BGRA images and then converting them to NV12/YV12.
Converting BGRA images to NV12/YV12 and then downscaling them.

Thanks for your recommendations.

Updated 2020-02-04:

To make my question clearer, I want to describe it in a little more detail.

The images come from a video stream like this:

Video stream

-> 1. decoded to YV12.
-> 2. converted to BGRA.
-> 3. text stamped.
-> 4. scaled down (or converted to YV12/NV12).
-> 5. converted to YV12/NV12 (or scaled down).
-> 6. H.264 encoder.
-> 7. video stream.

The whole sequence of tasks takes 300 to 500 ms.
The issue I have is that text stamped over the images does not look clear after conversion and scaling. I wonder which order is better for items 4 and 5: 4 then 5, or 5 then 4.

Answer 1

Score: 1

Noting that the RGB data is very likely to be non-linear (e.g. in an sRGB format), ideally you need to:

Convert from the non-linear "R'G'B'" data to linear RGB (note that this needs higher bit precision per channel; see the transfer function specification on Wikipedia).
Apply your downscaling filter.
Convert the linear result back to non-linear R'G'B' (i.e. sRGB).
Convert this to YCbCr/NV12.

Ideally you should always do filtering/blending/shading in linear space. To give you an intuitive justification for this, the average of black (0) and white (255) in linear colour space will be ~128, but in sRGB this mid grey is represented as roughly 186 to 188 (186 with a plain 2.2 gamma, 188 with the exact sRGB curve). If you thus do your maths in sRGB space, your result will look unnaturally dark/murky.
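The black/white example can be checked with a few lines of Python. This is only a sketch: it assumes 8-bit sRGB code values and uses the piecewise sRGB transfer function, and the helper names (`srgb_to_linear`, `linear_to_srgb`, `average_linear`, `average_naive`) are made up for illustration.

```python
def srgb_to_linear(c8):
    """Decode an 8-bit sRGB code value (0..255) to linear light in 0.0..1.0."""
    c = c8 / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(lin):
    """Encode linear light (0.0..1.0) back to an 8-bit sRGB code value."""
    lin = min(max(lin, 0.0), 1.0)
    if lin <= 0.0031308:
        c = lin * 12.92
    else:
        c = 1.055 * lin ** (1 / 2.4) - 0.055
    return round(c * 255.0)


def average_naive(a8, b8):
    """Average two code values directly in the non-linear sRGB domain."""
    return round((a8 + b8) / 2)


def average_linear(a8, b8):
    """Decode to linear light, average there, then re-encode to sRGB."""
    return linear_to_srgb((srgb_to_linear(a8) + srgb_to_linear(b8)) / 2)


black, white = 0, 255
print(average_naive(black, white))   # averaging the code values
print(average_linear(black, white))  # averaging the light itself
```

The two results differ by about 60 code values. A downscaling filter is a weighted average of neighbouring pixels, so the same gap shows up along every high-contrast edge, which is exactly where stamped text lives.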

(If you are in a hurry, you can sometimes get away with just squaring to convert from sRGB to linear, and sqrt() to convert back, as a kludge/hack.)
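A minimal sketch of that shortcut, assuming 8-bit values and treating sRGB as a plain gamma of 2.0; the function names are hypothetical:

```python
import math


def srgb_to_linear_approx(c8):
    """Approximate sRGB decoding by squaring the normalised value."""
    c = c8 / 255.0
    return c * c


def linear_to_srgb_approx(lin):
    """Approximate sRGB encoding with a square root."""
    lin = min(max(lin, 0.0), 1.0)
    return round(math.sqrt(lin) * 255.0)


# Average black and white in the approximated linear domain.
mid = linear_to_srgb_approx(
    (srgb_to_linear_approx(0) + srgb_to_linear_approx(255)) / 2
)
print(mid)
```

The result lands a few code values below what the exact curve gives, but far closer to it than averaging the raw code values, and it avoids a `pow()` call per sample.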

Answer 2

Score: 1

To avoid two stages of spatial interpolation, the following order is recommended:

Convert RGBA to YUV444 (YCbCr) without resizing.
Resize the Y channel to your destination resolution.
Resize the U (Cb) and V (Cr) channels to half the destination resolution in each axis.
The result is YUV420 at the resolution of the output image.
Pack the data as NV12 (NV12 is YUV420 with a specific data ordering).
If efficiency is a concern, the resize and NV12 packing can be done in a single pass.
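The steps above can be sketched in plain Python. Several things here are assumptions made for illustration and are not from the answer: full-range BT.601 coefficients (a real video pipeline may use limited-range BT.601 or BT.709), a box filter standing in for the resize filter, an integer downscale factor, and the function names themselves.

```python
def bgra_to_ycbcr444(pixels):
    """pixels: rows of (b, g, r, a) tuples. Returns full-resolution Y, Cb, Cr planes."""
    y_plane, cb_plane, cr_plane = [], [], []
    for row in pixels:
        y_row, cb_row, cr_row = [], [], []
        for b, g, r, _a in row:
            # Full-range BT.601 (assumed for this sketch).
            y_row.append(0.299 * r + 0.587 * g + 0.114 * b)
            cb_row.append(128 - 0.168736 * r - 0.331264 * g + 0.5 * b)
            cr_row.append(128 + 0.5 * r - 0.418688 * g - 0.081312 * b)
        y_plane.append(y_row)
        cb_plane.append(cb_row)
        cr_plane.append(cr_row)
    return y_plane, cb_plane, cr_plane


def box_downscale(plane, factor):
    """Average each factor x factor block (box filter plus decimation)."""
    h, w = len(plane), len(plane[0])
    out = []
    for oy in range(h // factor):
        row = []
        for ox in range(w // factor):
            total = 0.0
            for dy in range(factor):
                for dx in range(factor):
                    total += plane[oy * factor + dy][ox * factor + dx]
            row.append(total / (factor * factor))
        out.append(row)
    return out


def to_byte(value):
    """Round and clamp a sample to 0..255."""
    return min(255, max(0, round(value)))


def bgra_to_nv12_scaled(pixels, factor):
    """Downscale by an integer factor and pack as NV12, resampling chroma only once."""
    h, w = len(pixels), len(pixels[0])
    if w % (2 * factor) or h % (2 * factor):
        raise ValueError("width and height must be multiples of 2 * factor")
    y, cb, cr = bgra_to_ycbcr444(pixels)
    y_small = box_downscale(y, factor)        # Y at the destination resolution
    cb_small = box_downscale(cb, 2 * factor)  # chroma straight to half of that
    cr_small = box_downscale(cr, 2 * factor)
    out = bytearray()
    for row in y_small:
        out.extend(to_byte(v) for v in row)
    for cb_row, cr_row in zip(cb_small, cr_small):
        for u, v in zip(cb_row, cr_row):      # NV12: interleaved U, V pairs
            out.append(to_byte(u))
            out.append(to_byte(v))
    return bytes(out)


frame = [[(255, 255, 255, 255)] * 8 for _ in range(8)]  # 8x8 white BGRA frame
nv12 = bgra_to_nv12_scaled(frame, 2)                     # 4x4 NV12 output
print(len(nv12))
```

The point of the structure is that `cb` and `cr` go from full resolution to the final 4:2:0 resolution in one filtering step, instead of being filtered once with the RGB resize and again during subsampling.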

If you don't convert to YUV444 first, the U and V channels are going to be interpolated twice:

First, when downscaling RGBA.
Second, when U and V are downscaled by half during conversion to the 420 format.

When downscaling an image, it is recommended to blur it first (this blur is sometimes referred to as an "anti-aliasing" filter).
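A one-dimensional toy example shows why the blur matters. It assumes a row of one-pixel stripes and uses a simple box average as the low-pass filter; both function names are invented for this sketch.

```python
def decimate(row, factor):
    """Keep every factor-th sample with no prefiltering (prone to aliasing)."""
    return row[::factor]


def box_blur_then_decimate(row, factor):
    """Average each group of `factor` samples: a box low-pass filter plus decimation."""
    return [
        sum(row[i:i + factor]) / factor
        for i in range(0, len(row) - factor + 1, factor)
    ]


stripes = [0, 255] * 4  # alternating black/white one-pixel stripes
print(decimate(stripes, 2))                # every white stripe is dropped
print(box_blur_then_decimate(stripes, 2))  # stripes merge into an even grey
```

Without the prefilter the output depends entirely on which samples happen to be kept, which is what makes thin text strokes break up or vanish after downscaling.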

Remark: since the eye is less sensitive to chroma resolution, you are probably not going to see any visible difference (unless the image has fine-resolution graphics like colored text).

Remarks:

Simon's answer is more accurate in terms of color accuracy.
In most cases you are not going to see the difference.
The gamma information is lost when converting to NV12.

Update: regarding "Text stamped over the images after converted and scaled looks not so clear":

If getting clear text is the main issue, the following stages are suggested:

Downscale BGRA.
Stamp the text (using a smaller font).
Convert to NV12.

Downsampling an image with stamped text is going to result in unclear text.

A better solution is to stamp the text with a smaller font after downscaling.

Modern fonts use vector graphics rather than raster graphics, so stamping text with a smaller font gives a better result than downscaling an image that already has text stamped on it.

NV12 format is YUV420: the U and V channels are downscaled by a factor of 2 in each axis, so the text quality will be lower compared to RGB or YUV444 format.
Encoding an image with text is also going to damage the text.
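To make the chroma loss concrete, here is a small sketch of the NV12 memory layout. It assumes a tightly packed buffer (row stride equal to the width, no padding), which real decoders and encoders often do not guarantee; `nv12_layout` and `chroma_index` are hypothetical helper names.

```python
def nv12_layout(width, height):
    """Plane sizes and offsets for a tightly packed NV12 frame."""
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height")
    y_size = width * height                      # one Y byte per pixel
    uv_size = (width // 2) * (height // 2) * 2   # one U,V pair per 2x2 block
    return {
        "y_offset": 0,
        "y_size": y_size,
        "uv_offset": y_size,
        "uv_size": uv_size,
        "total": y_size + uv_size,
    }


def chroma_index(width, height, x, y):
    """Byte offset of the U sample used by pixel (x, y); V follows at +1."""
    return width * height + (y // 2) * width + (x // 2) * 2


print(nv12_layout(1920, 1080)["total"])
# Pixels (0, 0) and (1, 1) share a single chroma pair:
print(chroma_index(4, 4, 0, 0), chroma_index(4, 4, 1, 1))
```

Every 2x2 block of pixels shares one U,V pair, so a one-pixel-wide colored stroke cannot keep its own color. Only the Y plane carries full-resolution detail.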

For subtitles, the solution is to attach the subtitles as a separate stream and add the text after decoding the video.
