

Scaling images before doing conversion or vice versa?






Which of the following methods preserves more image detail?

  1. Down scaling BGRA images and then converting them to NV12/YV12.
  2. Converting BGRA images to NV12/YV12 images and then down scaling them.

Thanks for your recommendation.

Updated 2020-02-04:

To make my question clearer, here is a little more detail.

The images come from a video stream like this:

Video Stream

  1. -> decoded to YV12.

  2. -> converted to BGRA.

  3. -> text stamped.

  4. -> scaled down (or converted to YV12/NV12).

  5. -> converted to YV12/NV12 (or scaled down).

  6. -> H264 encoder.

  7. -> video stream.

    The whole sequence of tasks takes 300 to 500 ms.
    The issue is that the text stamped on the images does not look clear after conversion and scaling.
    I wonder which order is better for items 4 and 5: 4 then 5, or 5 then 4.


Score: 1






Since the RGB data is very likely to be non-linear (e.g. in an sRGB format), ideally you should:

  1. Convert the non-linear "R'G'B'" data to linear RGB (note that this needs higher bit precision per channel; see the transfer function spec on Wikipedia).
  2. Apply your downscaling filter.
  3. Convert the linear result back to non-linear R'G'B' (i.e. sRGB).
  4. Convert this to YCbCr/NV12.

Ideally, you should always do filtering, blending and shading in linear space. As an intuitive justification: the average of black (0) and white (255) in linear colour space is ~128, but in sRGB that mid grey is encoded as about 188 (186 if you approximate sRGB with a pure 2.2 gamma). If you average the sRGB values directly you get ~128 instead, so results computed in sRGB space look unnaturally dark/murky.
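To make the black/white example concrete, here is a minimal pure-Python sketch using the standard piecewise sRGB transfer functions. The function names are made up for this illustration, and it works on single 8-bit values rather than whole images. With the exact sRGB curve the linear-light average encodes to 188; a pure 2.2 power law gives the often-quoted 186.

```python
def srgb_to_linear(v8):
    """Decode an 8-bit sRGB value (0..255) to linear light in 0.0..1.0."""
    c = v8 / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(lin):
    """Encode linear light (0.0..1.0) back to an 8-bit sRGB value."""
    if lin <= 0.0031308:
        c = lin * 12.92
    else:
        c = 1.055 * lin ** (1 / 2.4) - 0.055
    return round(c * 255.0)


black, white = 0, 255

# Averaging the encoded values directly (what a naive filter does).
naive = (black + white) // 2

# Averaging in linear light, then re-encoding.
in_linear = linear_to_srgb((srgb_to_linear(black) + srgb_to_linear(white)) / 2)

print(naive, in_linear)  # prints: 127 188
```

Note that the intermediate linear values are floats here, which sidesteps the precision issue mentioned in step 1; with integer buffers you would want more than 8 bits per channel.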

(If you are in a hurry, you can sometimes get away with squaring as a kludge to convert from sRGB to linear, and sqrt() to convert back.)
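A rough sketch of that square/sqrt shortcut, applied to a 2x horizontal downscale of one row of grey values. This treats sRGB as a plain gamma of 2.0, which is only an approximation, and the helper names are hypothetical:

```python
import math


def approx_decode(v8):
    """Rough sRGB -> linear: square the normalized value (gamma 2.0 assumption)."""
    c = v8 / 255.0
    return c * c


def approx_encode(lin):
    """Rough linear -> sRGB: square root, scaled back to 0..255."""
    return round(math.sqrt(lin) * 255.0)


def halve_row(row):
    """Average adjacent pixel pairs in approximate linear space."""
    return [approx_encode((approx_decode(a) + approx_decode(b)) / 2)
            for a, b in zip(row[0::2], row[1::2])]


print(halve_row([0, 255, 0, 255]))
```

For alternating black and white pixels this gives 180 per output pixel, which is much closer to the exact sRGB result than the 127 you get from averaging the encoded values.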


Score: 1








To avoid two passes of spatial interpolation, the following order is recommended:

  1. Convert RGBA to YUV444 (YCbCr) without resizing.
  2. Resize Y channel to your destination resolution.
  3. Resize U (Cb) and V (Cr) channels to half resolution in each axis.
    The result format is YUV420 in the resolution of the output image.
  4. Pack the data as NV12 (NV12 is YUV420 in specific data ordering).
    It is possible to do the resize and NV12 packing in a single pass (if efficiency is a concern).
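As a rough illustration of the order above, here is a minimal pure-Python sketch. It assumes BT.601 full-range coefficients (your encoder may expect a different matrix or limited range), an integer downscale factor, a simple box filter, and image dimensions divisible by twice the factor. All function names are made up for this example; real code would use an optimized library.

```python
def bgra_to_yuv444(pixels):
    """pixels: rows of (B, G, R, A) tuples. Returns full-resolution Y, U, V planes.
    Coefficients are BT.601 full range (an assumption for this sketch)."""
    y = [[0.299 * r + 0.587 * g + 0.114 * b for b, g, r, _a in row] for row in pixels]
    u = [[128 - 0.168736 * r - 0.331264 * g + 0.5 * b for b, g, r, _a in row] for row in pixels]
    v = [[128 + 0.5 * r - 0.418688 * g - 0.081312 * b for b, g, r, _a in row] for row in pixels]
    return y, u, v


def box_downscale(plane, factor):
    """Average each factor x factor block (a simple box filter)."""
    h, w = len(plane) // factor, len(plane[0]) // factor
    out = []
    for j in range(h):
        row = []
        for i in range(w):
            block = [plane[j * factor + dy][i * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def to_byte(x):
    """Round and clamp to 0..255."""
    return max(0, min(255, round(x)))


def bgra_to_nv12_downscaled(pixels, factor):
    y, u, v = bgra_to_yuv444(pixels)          # step 1: no resizing yet
    y_small = box_downscale(y, factor)        # step 2: Y to target resolution
    u_small = box_downscale(u, factor * 2)    # step 3: chroma to half of target,
    v_small = box_downscale(v, factor * 2)    #         in a single interpolation
    out = bytearray()                         # step 4: NV12 = Y plane + interleaved UV
    for row in y_small:
        out.extend(to_byte(p) for p in row)
    for u_row, v_row in zip(u_small, v_small):
        for cu, cv in zip(u_row, v_row):
            out.append(to_byte(cu))
            out.append(to_byte(cv))
    return bytes(out)
```

For example, a 4x4 input with `factor=2` produces a 2x2 Y plane followed by a single U/V pair, 6 bytes in total.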

If you skip the conversion to YUV444, the U and V channels are interpolated twice:

  • First interpolation when downscaling RGBA.
  • Second interpolation when U and V are downscaled by half when converting to 420 format.

When downscaling an image, it is recommended to blur it first (this is sometimes referred to as an "anti-aliasing" filter).

Remark: since the eye is less sensitive to chroma resolution, you will probably not see any difference (unless the image contains fine colored detail such as colored text).
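A small one-dimensional sketch of why the blur matters, assuming a [1, 2, 1]/4 kernel as the low-pass filter (any reasonable low-pass filter would do; the names are illustrative only). Dropping every other pixel of a fine stripe pattern without filtering keeps only the black pixels, while blurring first preserves the average brightness:

```python
def decimate(row, factor=2):
    """Keep every factor-th sample, with no filtering."""
    return row[::factor]


def blur_121(row):
    """[1, 2, 1] / 4 low-pass kernel; edge samples are repeated."""
    n = len(row)
    return [(row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4
            for i in range(n)]


stripes = [0, 255] * 4                    # one-pixel black/white stripes

aliased = decimate(stripes)               # every kept sample is black
filtered = decimate(blur_121(stripes))    # interior samples are mid grey

print(aliased)
print(filtered)
```

The first output value of `filtered` is darker than the rest only because of the edge handling chosen here.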


  • Simon's answer gives more accurate color.
    In most cases you are not going to see the difference.
  • The gamma information is lost when converting to NV12.

Update: Regarding "Text stamped over the images after converted and scaled looks not so clear":

If getting clear text is the main issue, the following order is suggested:

  1. Downscale BGRA.
  2. Stamp text (using smaller font).
  3. Convert to NV12.

Downsampling an image that already has text stamped on it results in blurry text.

A better solution is to stamp the text with a smaller font after downscaling.

Modern fonts use vector graphics rather than raster graphics, so rendering the text at a smaller font size gives a better result than downscaling an image with the text already stamped.

NV12 is a YUV420 format: the U and V channels are downscaled by a factor of 2 in each axis, so text quality will be lower than in RGB or YUV444.
Encoding the image (H264 in your pipeline) also degrades the text.
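To put numbers on the chroma loss, here is a small sketch of the NV12 plane sizes for a given frame size, assuming even width and height and 8 bits per sample (the function name is made up for this example):

```python
def nv12_plane_sizes(width, height):
    """Return (Y plane bytes, interleaved UV plane bytes) for an NV12 frame.
    Assumes even dimensions and 8 bits per sample."""
    y_size = width * height
    # U and V are each (width/2) x (height/2), stored interleaved in one plane.
    uv_size = (width // 2) * (height // 2) * 2
    return y_size, uv_size


y_size, uv_size = nv12_plane_sizes(1920, 1080)
print(y_size, uv_size, y_size + uv_size)
```

For a 1920x1080 frame, each chroma channel has 960x540 samples, a quarter of the luma sample count, which is why thin colored strokes in text suffer the most.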

For subtitles, the usual solution is to carry the subtitles in a separate stream and add the text after decoding the video.

  • Posted on 2020-01-03 15:19:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59574621.html



