Drawing with Alpha Blending: is it normal that it's so slow, or am I doing something wrong?


So I'm trying to do alpha blending and display a picture.
I'm using 64-bit GCC 5.0.0 in Code::Blocks (don't know the exact version, but it's from 2016).

I'm not using the GPU; everything happens on a single core of a 2.4 GHz CPU.
Rendering shares that CPU core with input and program logic, but those don't take much.

The function below takes the X and Y coordinates of where to draw, and the SurfaceData to draw.
There is also an optional qAlpha value for making the whole picture semi-transparent.

Ignore my dumb "out of bounds protection" (the first 4 if statements), unless you have a suggestion for it.

My question is about the speed of this function; it's the first time I'm doing alpha blending.
My program renders at a resolution of 1360 x 768.

Without drawing anything on screen (only clearing and displaying) I get around 4000 FPS.
After drawing the UI and some debug text I get around 1300 FPS.
As soon as I draw the FIRST alpha-blended picture, I lose ~500 FPS.

The picture I'm using is 283x600 px.

When I try to draw 100 pictures (one on top of another), my FPS drops to ~20.

I would like to know if I'm doing something wrong; I was really expecting this to be a lot faster.

	void qDrawPictureAlpha( int _x, int _y, qlSurfaceData &_Data, double _qAlpha = 1.0 ){
		int w = _Data.RenderXSize,
			h = _Data.RenderYSize,
			t = w*h;
		// crude out-of-bounds protection: skip the draw entirely if any part
		// of the picture would fall outside the target surface
		if(_x<0)return;
		if(_x+w>qData.RenderXSize)return;
		if(_y<0)return;
		if(_y+h>qData.RenderYSize)return;
		
		int TargetP = _x+(_y*qData.RenderXSize)-1;
		
		unsigned int SrcPixel;
		unsigned int SrcAlpha;
		unsigned int InvAlpha;
		unsigned int TgtPixel;
		
		unsigned int *Source = &_Data.qPixel[0];
		unsigned int *Target = &qData.qPixel[0];
		Target += TargetP;
		
		w--; t++;
		unsigned int Temp = w;   // pixels left in the current row
		
		for(int c=1;c<t;c++){
			SrcPixel = *Source++;
			TgtPixel = *Target;
			// per-pixel alpha, scaled by the optional whole-picture alpha
			SrcAlpha = ((SrcPixel >> 24) & 0xFF)*_qAlpha;
			InvAlpha = 255 - SrcAlpha;
			
			// blend R, G and B separately: ((Src*A)>>8) + ((Tgt*(255-A))>>8)
			*Target =
				(((((SrcPixel >> 16) & 0xFF) * SrcAlpha) >> 8 ) + ((((TgtPixel >> 16) & 0xFF) * InvAlpha) >> 8 ) << 16 ) |
				(((((SrcPixel >> 8) & 0xFF) * SrcAlpha) >> 8 ) + ((((TgtPixel >> 8) & 0xFF) * InvAlpha) >> 8 ) << 8 ) |
				(((SrcPixel & 0xFF) * SrcAlpha ) >> 8 ) + (((TgtPixel & 0xFF) * InvAlpha) >> 8 );
			
			// advance to the next target pixel, jumping to the next row
			// when the end of the current source row is reached
			if(!Temp){Temp=w;Target+=qData.RenderXSize-w;}else{
				Temp--;Target++;
			}
		}
	}

As you can see in the code, I tried several things; for example, instead of multiplying/dividing by 255 I just shift bits.

I was also thinking about precalculating all possible 65k pixel/alpha blends and putting them in an array, but it didn't really give me much performance.

I just want to know whether I should start thinking about CUDA or something on the GPU to render more, or whether there is something I can do to the current code to render at least 250 pictures on the CPU with alpha blending.

Answer 1

Score: 2

It seems about right, considering that you are not using any tricks, SIMD, or the GPU. 283 pixels wide times 600 pixels high times 100 images times 20 fps means your one core is calculating 339,600,000 pixels per second. I count at least 29 calculations per pixel, which works out to roughly 10 billion calculations per second on 2.4 billion clock cycles per second, i.e. about 4 calculations per cycle; not sure what more you could expect, really. That's assuming the compiler hasn't figured out a way to do some of it with fewer calculations.

If you don't have optimization turned on, turn it on and it should get a lot faster. Based on the numbers I just cited, I'd guess you already have it turned on.
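
For example, with GCC the usual starting point is something along these lines (the exact file names and flags here are my assumption; adapt them to your project):

    g++ -O3 -march=native main.cpp -o renderer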

Your main option is to do less work. Do you really have 100 images on top of each other that change 20 times per second? I don't believe that. You only need to recalculate a pixel's colour when it actually changes, you know.
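
A sketch of that idea, reusing the question's qlSurfaceData type and qDrawPictureAlpha function; CompositeCache, qClearSurface and qBlendOnto are hypothetical names invented for illustration, not part of the poster's code:

    // Pre-blend the whole stack of layers into one off-screen surface and
    // redraw that single surface each frame; rebuild it only when a layer
    // actually changes.
    struct CompositeCache {
        qlSurfaceData Composite;   // cached, pre-blended result of all layers
        bool Dirty = true;         // set this whenever any layer changes
    };

    void qDrawStack( CompositeCache &cache, qlSurfaceData *layers, int count,
                     int x, int y ){
        if( cache.Dirty ){
            qClearSurface( cache.Composite );             // hypothetical helper
            for( int i = 0; i < count; i++ )
                qBlendOnto( cache.Composite, layers[i] ); // hypothetical helper
            cache.Dirty = false;
        }
        // one alpha-blended draw per frame instead of `count` of them
        qDrawPictureAlpha( x, y, cache.Composite );
    }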

Do you really need to do ((Src * SrcAlpha) >> 8) + ((Dst * InvAlpha) >> 8)? Or is it perhaps faster to do Dst + (((Src - Dst) * SrcAlpha) >> 8), with one less multiplication and one less bitshift, and possibly a little bit more rounding error?
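
As a minimal sketch (assuming the same 0xAARRGGBB pixel layout as the question; blend_channel and blend_pixel are made-up names, not part of the poster's code), the per-channel work then looks like this:

    // dst + (((src - dst) * alpha) >> 8): one multiply and one shift per
    // channel.  Done in signed arithmetic because src - dst can be negative;
    // the result stays within 0..255.
    static inline unsigned blend_channel(unsigned dst, unsigned src, unsigned alpha){
        int d = static_cast<int>(dst), s = static_cast<int>(src);
        return static_cast<unsigned>(d + (((s - d) * static_cast<int>(alpha)) >> 8));
    }

    static inline unsigned blend_pixel(unsigned dst, unsigned src, unsigned alpha){
        unsigned r = blend_channel((dst >> 16) & 0xFF, (src >> 16) & 0xFF, alpha);
        unsigned g = blend_channel((dst >> 8)  & 0xFF, (src >> 8)  & 0xFF, alpha);
        unsigned b = blend_channel( dst        & 0xFF,  src        & 0xFF, alpha);
        return (r << 16) | (g << 8) | b;
    }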

Your other main option is to use the GPU, which is designed for this task with parallel processing and special processing units.

There's also SIMD. You can reduce the number of calculations per pixel. You should be able to process the red, green and blue of 8 pixels all at once in a single 256-bit pixel calculation. SIMD programming is fiddly.
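
A rough AVX2 sketch of that idea (an untested illustration, not a drop-in replacement): it assumes 32-bit ARGB pixels as in the question, uses the original ((Src*A)>>8) + ((Dst*(255-A))>>8) form in 16-bit lanes, omits the whole-picture qAlpha factor, and, unlike the original loop, blends the alpha channel as well. Compile with -mavx2.

    #include <immintrin.h>
    #include <cstdint>

    // Blend 8 ARGB8888 source pixels over 8 destination pixels at once.
    static inline void blend8_avx2(uint32_t *dst, const uint32_t *src)
    {
        // In memory each pixel is B,G,R,A (little-endian 0xAARRGGBB).
        __m256i s = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src));
        __m256i d = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(dst));

        // Widen the 8-bit channels to 16-bit lanes: 4 pixels per register.
        __m256i s_lo = _mm256_cvtepu8_epi16(_mm256_castsi256_si128(s));
        __m256i s_hi = _mm256_cvtepu8_epi16(_mm256_extracti128_si256(s, 1));
        __m256i d_lo = _mm256_cvtepu8_epi16(_mm256_castsi256_si128(d));
        __m256i d_hi = _mm256_cvtepu8_epi16(_mm256_extracti128_si256(d, 1));

        // Broadcast each pixel's alpha (bytes 6..7 / 14..15 of each 128-bit
        // lane after widening) across its four 16-bit channel lanes.
        const __m256i ALPHA_SHUF = _mm256_set_epi8(
            15,14,15,14,15,14,15,14,  7,6,7,6,7,6,7,6,
            15,14,15,14,15,14,15,14,  7,6,7,6,7,6,7,6);
        __m256i a_lo  = _mm256_shuffle_epi8(s_lo, ALPHA_SHUF);
        __m256i a_hi  = _mm256_shuffle_epi8(s_hi, ALPHA_SHUF);
        __m256i ia_lo = _mm256_sub_epi16(_mm256_set1_epi16(255), a_lo);
        __m256i ia_hi = _mm256_sub_epi16(_mm256_set1_epi16(255), a_hi);

        // out = ((src * a) >> 8) + ((dst * (255 - a)) >> 8), in 16-bit lanes.
        __m256i out_lo = _mm256_add_epi16(
            _mm256_srli_epi16(_mm256_mullo_epi16(s_lo, a_lo), 8),
            _mm256_srli_epi16(_mm256_mullo_epi16(d_lo, ia_lo), 8));
        __m256i out_hi = _mm256_add_epi16(
            _mm256_srli_epi16(_mm256_mullo_epi16(s_hi, a_hi), 8),
            _mm256_srli_epi16(_mm256_mullo_epi16(d_hi, ia_hi), 8));

        // Narrow back to 8 bits; packus works per 128-bit lane, so one final
        // quadword permute restores the pixel order 0..7.
        __m256i packed = _mm256_packus_epi16(out_lo, out_hi);
        packed = _mm256_permute4x64_epi64(packed, _MM_SHUFFLE(3, 1, 2, 0));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst), packed);
    }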

Posted by huangapple on 2023-05-10 22:40:51
Source: https://go.coder-hub.com/76219756.html