Scaling images before doing conversion or vice versa?

Question

I wonder which of the two methods below preserves more image detail:

  1. Downscaling BGRA images and then converting them to NV12/YV12.
  2. Converting BGRA images to NV12/YV12 and then downscaling them.

Thanks for your recommendations.

Updated 2020-02-04:

To make my question clearer, here is some more detail.

The images come from a video stream, like this:

Video Stream

  1. -> decoded to YV12.

  2. -> converted to BGRA.

  3. -> text stamped.

  4. -> scaled down (or converted to YV12/NV12).

  5. -> converted to YV12/NV12 (or scaled down).

  6. -> H264 encoder.

  7. -> video stream.

The whole sequence of tasks takes 300 to 500 ms.
The issue I have is that the text stamped over the images does not look clear after conversion and scaling.
I wonder which order is better for items 4 and 5: 4 then 5, or 5 then 4?

Answer 1

Score: 1

Noting that the RGB data is very likely to be non-linear (e.g. in an sRGB format), ideally you need to:

  1. Convert the non-linear "R'G'B'" data to linear RGB (note that this needs higher bit precision per channel; see the transfer function specification on Wikipedia).
  2. Apply your downscaling filter.
  3. Convert the linear result back to non-linear R'G'B' (i.e. sRGB).
  4. Convert this to YCbCr/NV12.

Ideally you should always do filtering/blending/shading in linear space. To give you an intuitive justification for this: the average of black (0) and white (255) in linear colour space is ~128, but in sRGB this mid grey is encoded as roughly 188 (or about 186 under a plain gamma-2.2 approximation). If you do your maths directly on the sRGB values, the average of black and white comes out as 128 instead, so your result will look unnaturally dark/murky.

(If you are in a hurry, you can sometimes get away with just squaring (and sqrt()) as a kludge/hack to convert from sRGB to linear (and vice versa).)
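
The black/white example above can be checked with a few lines of Python. This is only a sketch: it assumes 8-bit channel values and the standard piecewise sRGB transfer function, and the function names are made up for the illustration.

```python
def srgb_to_linear(c8):
    """Decode an 8-bit sRGB channel value to linear light in [0, 1]."""
    c = c8 / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(lin):
    """Encode linear light in [0, 1] as an 8-bit sRGB channel value."""
    if lin <= 0.0031308:
        c = lin * 12.92
    else:
        c = 1.055 * lin ** (1 / 2.4) - 0.055
    return int(c * 255.0 + 0.5)


def average_naive(a8, b8):
    """Average two sRGB values directly on the encoded numbers."""
    return int((a8 + b8) / 2 + 0.5)


def average_linear(a8, b8):
    """Average two sRGB values in linear light, then re-encode."""
    return linear_to_srgb((srgb_to_linear(a8) + srgb_to_linear(b8)) / 2)


def average_sqrt_hack(a8, b8):
    """Same idea with the square / sqrt shortcut (i.e. a gamma of 2.0)."""
    lin = ((a8 / 255.0) ** 2 + (b8 / 255.0) ** 2) / 2
    return int(lin ** 0.5 * 255.0 + 0.5)


print(average_naive(0, 255))      # 128
print(average_linear(0, 255))     # 188
print(average_sqrt_hack(0, 255))  # 180
```

The shortcut lands at 180 rather than 188, which is usually close enough when speed matters more than exact colour.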

Answer 2

Score: 1

To avoid two stages of spatial interpolation, the following order is recommended:

  1. Convert RGBA to YUV444 (YCbCr) without resizing.
  2. Resize the Y channel to your destination resolution.
  3. Resize the U (Cb) and V (Cr) channels to half the destination resolution in each axis.
    The result is YUV420 at the resolution of the output image.
  4. Pack the data as NV12 (NV12 is YUV420 with a specific data ordering).
    It is possible to do the resize and the NV12 packing in a single pass (if efficiency is a concern).
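
The four steps above could be sketched in plain Python as follows. Several things here are assumptions made for the illustration and are not part of the answer: the image is a list of rows of (b, g, r, a) tuples, the colour matrix is full-range BT.601, the resize is a simple box average by an integer factor, and the width and height are multiples of twice that factor. All function names are hypothetical.

```python
def bgra_to_yuv444(image):
    """Convert rows of (b, g, r, a) tuples to full-resolution Y, U, V planes."""
    y_plane, u_plane, v_plane = [], [], []
    for row in image:
        y_row, u_row, v_row = [], [], []
        for b, g, r, _alpha in row:
            # Full-range BT.601 weights: an assumption, not taken from the answer.
            y = 0.299 * r + 0.587 * g + 0.114 * b
            y_row.append(y)
            u_row.append(128.0 + 0.564 * (b - y))  # Cb
            v_row.append(128.0 + 0.713 * (r - y))  # Cr
        y_plane.append(y_row)
        u_plane.append(u_row)
        v_plane.append(v_row)
    return y_plane, u_plane, v_plane


def box_downscale(plane, factor):
    """Shrink a plane by an integer factor, averaging each factor x factor block."""
    height, width = len(plane), len(plane[0])
    out = []
    for top in range(0, height, factor):
        out_row = []
        for left in range(0, width, factor):
            block = [plane[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def to_byte(value):
    """Round to the nearest integer and clamp to 0..255."""
    return max(0, min(255, int(value + 0.5)))


def bgra_to_scaled_nv12(image, factor):
    """Downscale by `factor` and return NV12 bytes, resampling chroma only once."""
    y444, u444, v444 = bgra_to_yuv444(image)
    y_out = box_downscale(y444, factor)      # destination resolution
    u_out = box_downscale(u444, 2 * factor)  # half the destination resolution
    v_out = box_downscale(v444, 2 * factor)
    data = bytearray()
    for row in y_out:                        # NV12: the full Y plane first...
        data.extend(to_byte(v) for v in row)
    for u_row, v_row in zip(u_out, v_out):   # ...then interleaved U, V samples
        for u, v in zip(u_row, v_row):
            data.append(to_byte(u))
            data.append(to_byte(v))
    return bytes(data)
```

For an 8x8 input and `factor=2`, the result holds a 4x4 Y plane followed by a 2x2 grid of interleaved U/V pairs, 24 bytes in total. A production pipeline would more likely use an optimized library with a better resampling kernel; the point here is only that each chroma sample is produced by a single averaging step from the full-resolution data.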

If you don't convert to YUV444 first, the U and V channels are going to be interpolated twice:

  • First interpolation when downscaling RGBA.
  • Second interpolation when U and V are downscaled by half during conversion to the 420 format.

When downscaling an image, it is recommended to blur it first (this is sometimes referred to as an "anti-aliasing" filter).
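
A tiny one-dimensional illustration of why the blur matters, assuming a row of one-pixel black/white stripes and a downscale factor of 2 (the helper names are made up for this sketch):

```python
def decimate(row, factor):
    """Keep every `factor`-th sample and discard the rest (no pre-filter)."""
    return row[::factor]


def blur_then_decimate(row, factor):
    """Average each group of `factor` samples, i.e. box-blur and then decimate."""
    return [sum(row[i:i + factor]) / factor
            for i in range(0, len(row) - factor + 1, factor)]


stripes = [0, 255] * 4                 # one-pixel black/white stripes
print(decimate(stripes, 2))            # [0, 0, 0, 0]
print(blur_then_decimate(stripes, 2))  # [127.5, 127.5, 127.5, 127.5]
```

Without the filter the stripes alias to solid black; with it they become a uniform grey, which is the best a half-resolution image can represent. Thin text strokes behave the same way.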

Remark: since the eye is less sensitive to chroma resolution, you are probably not going to see any visible difference (unless the image has fine-resolution graphics such as colored text).

Remarks:

  • Simon's answer is more accurate in terms of color accuracy.
    In most cases you are not going to see the difference.
  • The gamma information is lost when converting to NV12.

Update: Regarding "Text stamped over the images after converted and scaled looks not so clear":

If getting clear text is the main issue, the following stages are suggested:

  1. Downscale BGRA.
  2. Stamp the text (using a smaller font).
  3. Convert to NV12.

Downsampling an image with stamped text is going to result in unclear text.

A better solution is to stamp the text with a smaller font, after downscaling.

Modern fonts use vector graphics rather than raster graphics, so stamping text with a smaller font gives a better result than downscaling an image with stamped text.
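
When the stamping step is moved after the downscale, the font size and text position have to be scaled along with the frame. A minimal sketch of that arithmetic, assuming sizes are (width, height) in pixels and the font size is a pixel height; the function name and the example numbers are hypothetical:

```python
def stamp_params_after_downscale(font_px, position, src_size, dst_size):
    """Map a font size and text origin from the source frame to the downscaled frame."""
    scale_x = dst_size[0] / src_size[0]
    scale_y = dst_size[1] / src_size[1]
    x, y = position
    # Use the vertical scale for the font height; keep at least 1 px.
    new_font_px = max(1, round(font_px * scale_y))
    return new_font_px, (round(x * scale_x), round(y * scale_y))


print(stamp_params_after_downscale(48, (40, 1000), (1920, 1080), (640, 360)))
# (16, (13, 333))
```

The text is then rasterized once at 16 px by the font renderer instead of being drawn at 48 px and resampled.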

NV12 is a YUV420 format: the U and V channels are downscaled by a factor of 2 in each axis, so the text quality will be lower than with RGB or YUV444.
Encoding an image that contains text is also going to damage the text.

For subtitles, the solution is to attach the subtitles as a separate stream and add the text after decoding the video.

huangapple
  • Posted on 2020-01-03 15:19:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59574621.html