Posted: February 10, 2023 03:20:50

How to correctly calculate the frequency of the device in Timing Analyzer, Intel Quartus

Question

I have 3 modules: a modulo remainder generator, a modulo adder, and a modulo Wallace adder. Their speeds should relate as follows: remainder_modulo > wallace_adder_modulo > adder_modulo. But Timing Analyzer, as far as I understand, gives me the frequency of the device, and that's not what I need. I want to know the real delay so that I can check whether the speeds correlate the way they should. Which specifications should I rely on?

module remainder_modulo
#(parameter n)
(
	input wire [n-1:0] A, 
	input wire [n-1:0] P, 
	output wire [n:0] S,  
	output Po			
);
	wire [n:0] A_factor = {A, 1'b0};      // A shifted left by one bit (2*A)
	wire [n:0] P_extended = {1'b0, P};
	wire [n:0] S_temp;
	multidigitAdder #(.n(n+1)) multAdd(.A(A_factor), .B(P_extended), .Pi(1'b1), .S(S_temp), .Po(Po));
	assign S = Po ? S_temp : A_factor; 
endmodule
module adder_modulo
#(parameter n)
(
	input wire [n-1:0] A,
	input wire [n-1:0] B,
	input wire [n-1:0] P,
	output wire [n-1:0] S,
	output Po 				
);
	wire [n-1:0] S_temp, S_temp_mod;
	multidigitAdder #(.n(n)) multAdd1(.A(A), .B(B), .Pi(1'b0), .S(S_temp));
	multidigitAdder #(.n(n)) multAdd2(.A(S_temp), .B(P), .Pi(1'b1), .S(S_temp_mod), .Po(Po));
	assign S = Po ? S_temp_mod : S_temp;
endmodule
module adder_wallace
#(parameter n)
(
	input wire [n-1:0] A,  
	input wire [n-1:0] B, 
	input wire [n-1:0] P,  
	input Pi,			
	output wire [n-1:0] S, 
	output Po				
);
	wire [n-1:0] S_arr, Po_arr;
	genvar i;
	generate
		for (i = 0; i < n; i = i + 1) begin : MEM
			bitAdder adder(A[i], B[i], P[i], S_arr[i], Po_arr[i]);
		end
	endgenerate
	wire [n:0] multi_B_arr = {Po_arr, Pi};
	wire [n:0] multi_A_arr = {1'b0, S_arr};
	multidigitAdder #(.n(n + 1)) mAdder(.A(multi_A_arr), .B(multi_B_arr), .Pi(1'b0), .S(S), .Po(Po));
endmodule
module adder_modulo_wallace
#(parameter n)
(
	input wire [n-1:0] A,
	input wire [n-1:0] B,
	input wire [n-1:0] P,
	output wire [n-1:0] S,
	output Po			
);
	wire [n-1:0] simpleSum, wallaceSum;
	multidigitAdder #(.n(n)) multAdd1(.A(A), .B(B), .Pi(1'b0), .S(simpleSum));
	adder_wallace #(.n(n)) add(.A(A), .B(B), .P(P), .Pi(1'b1), .S(wallaceSum), .Po(Po));
	assign S = Po ? wallaceSum : simpleSum;
endmodule
module multidigitAdder
#(parameter n)
(
	input wire [n-1:0] A,
	input wire [n-1:0] B,
	input Pi,
	output wire [n-1:0] S,
	output Po
);
	assign {Po, S} = A + B + Pi;
endmodule

remainder_modulo:

Maximum frequency: 165.65 MHz
Start node: cnt[0]
End node: reduce_modulo:reduce|multidigitAdder:multAdd|Add1~8_OTERM9
Slack: 16.642 ns
Data delay: 3.31 ns

wallace_adder_modulo:

Maximum frequency: 136.59 MHz
Start node: cnt[0]
End node: adder_modulo_wallace:addWallaceMod|S[3]~3_OTERM9
Slack: 17.084 ns
Data delay: 2.75 ns

adder_modulo:

Maximum frequency: 165.65 MHz
Start node: cnt[0]
End node: adder_modulo:addMod|multidigitAdder:multAdd2|Add1~6_OTERM9
Slack: 18.076 ns
Data delay: 1.875 ns

Answer 1

Score: 0

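The Maximum Frequency (Fmax) figure is the limiting factor on performance.
The posted code is implemented as combinational logic, and the maximum delay through a given module is 1/Fmax (measured register to register, so it includes the clock-to-output and setup times of the registers around the logic).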

If the modules are implemented as part of a single-clock synchronous system, the maximum clock rate of the system is set by the slowest module, which is wallace_adder_modulo at 136.59 MHz.
The time to obtain a new result from any module in that system is therefore 1/136.59 MHz = 7.3212 ns.
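The arithmetic behind these figures, as a worked sketch using the Fmax values reported in the question (it assumes each reported Fmax is that module's limiting path and ignores how the numbers would shift once the modules are synthesized together):

```latex
\begin{aligned}
T_{\min} &= \frac{1}{F_{\max}} \\[4pt]
\text{remainder\_modulo:}\quad T &= \frac{1}{165.65\ \text{MHz}} \approx 6.037\ \text{ns} \\
\text{wallace\_adder\_modulo:}\quad T &= \frac{1}{136.59\ \text{MHz}} \approx 7.321\ \text{ns} \\
\text{adder\_modulo:}\quad T &= \frac{1}{165.65\ \text{MHz}} \approx 6.037\ \text{ns} \\[4pt]
F_{\text{sys}} &= \min(165.65,\ 136.59,\ 165.65)\ \text{MHz} = 136.59\ \text{MHz},
\qquad T_{\text{sys}} \approx 7.321\ \text{ns}
\end{aligned}
```

By this measure remainder_modulo and adder_modulo come out identical, so the per-module Fmax numbers alone cannot confirm the expected ordering remainder_modulo > wallace_adder_modulo > adder_modulo.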

Consider an assembly line with multiple workstations: the performance-limiting factor of the line is its slowest station.

FPGA timing tools do not report an expected, actual, or average delay, and there is no theoretical delay. The tools report the maximum so that designers can select a maximum clock frequency. If the delay through the logic is greater than the clock period, the design does not work. The assumption in synchronous design is that the logic produces one logical result per clock cycle.
Vivado's timing analyzer has an options menu for choosing which delays to report, including min and max path delays; other vendors' tools are similar.
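The relation between path delay, clock period, and the Slack figures in the question can be written as the simplified setup check below. The symbols t_co (register clock-to-output), t_logic (combinational delay), and t_su (register setup time) are generic names introduced here, and clock skew and clock uncertainty are ignored:

```latex
\begin{aligned}
t_{co} + t_{\text{logic}} + t_{su} &\le T_{clk} = \frac{1}{f_{clk}} \\
\text{slack} &= T_{clk} - \left(t_{co} + t_{\text{logic}} + t_{su}\right) \\
F_{\max} &= \frac{1}{\max\limits_{\text{paths}}\left(t_{co} + t_{\text{logic}} + t_{su}\right)}
\end{aligned}
```

Positive slack means a path meets timing at the constrained clock; Fmax is the clock rate at which the worst path's slack would reach zero.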

A theoretical delay could be estimated manually by mapping the logic to theoretical gate delays; however, FPGAs don't target gates (they target the vendor's macro blocks, such as LUTs and carry chains), so such gate-level models don't exist within the scope of FPGA tools.

Since Vivado also reports min delays, you could take (min + max)/2 as a typical value; however, I would not rely on that number except as a thought experiment.

It looks like you synthesized the modules separately, without a top-level module to bring them together. The hardware implementation and performance numbers will change significantly when they are synthesized together, because the tools can merge and optimize logic across module boundaries. The same will happen when they are combined into a synchronous system with registers/flip-flops.

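If you want to understand the nature of the delays better, open the tool's netlist viewers and look closely at how the logic was mapped to the vendor's hardware (in Quartus, the RTL Viewer shows the synthesized logic and the Technology Map Viewer shows the mapped device primitives).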

There is no need to try to align delays between modules in FPGA design. Put the modules in a synchronous system so that each module is surrounded by registers/flip-flops; the system then behaves as if each module produces a new answer on every clock edge.
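A minimal sketch of what "surrounded by registers" can look like, assuming a single clock. comb_mod_add is a hypothetical stand-in for any of the posted combinational modules (it computes (a + b) mod m, assuming a < m and b < m), and registered_mod_add is a hypothetical wrapper name; neither comes from the question:

```verilog
// Hypothetical combinational block: s = (a + b) mod m, assuming a < m and b < m.
module comb_mod_add #(parameter N = 4) (
    input  wire [N-1:0] a,
    input  wire [N-1:0] b,
    input  wire [N-1:0] m,
    output wire [N-1:0] s
);
    wire [N:0] sum  = a + b;            // N+1 bits so the carry is kept
    wire [N:0] diff = sum - {1'b0, m};  // only selected when sum >= m
    assign s = (sum >= {1'b0, m}) ? diff[N-1:0] : sum[N-1:0];
endmodule

// Hypothetical wrapper: registers on the inputs and on the output, so the
// combinational logic sits on a register-to-register path.
module registered_mod_add #(parameter N = 4) (
    input  wire         clk,
    input  wire [N-1:0] a,
    input  wire [N-1:0] b,
    input  wire [N-1:0] m,
    output reg  [N-1:0] s_q
);
    reg  [N-1:0] a_q, b_q, m_q;
    wire [N-1:0] s_d;

    comb_mod_add #(.N(N)) u_comb (.a(a_q), .b(b_q), .m(m_q), .s(s_d));

    always @(posedge clk) begin
        a_q <= a;     // input registers
        b_q <= b;
        m_q <= m;
        s_q <= s_d;   // output register
    end
endmodule
```

With registers on both sides, the combinational path becomes a register-to-register path, which is the kind of path Timing Analyzer measures when it reports Fmax; inputs captured on one clock edge appear on s_q after the next edge.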