How to correctly calculate the frequency of the device in Timing Analyzer, Intel Quartus

Question

I have three modules: a modulo remainder generator, a modulo adder, and a modulo Wallace adder. Their speeds should be related as follows: remainder_modulo > wallace_adder_modulo > adder_modulo. But Timing Analyzer, as far as I understand, gives me the maximum frequency of the device, and that's not what I need. I want to know the actual time delay of each module so that the speeds correlate the way they should. Which reported figures should I rely on?

module remainder_modulo
#(parameter n)
(
	input wire [n-1:0] A, 
	input wire [n-1:0] P, 
	output wire [n:0] S,  
	output Po			
);
	wire [n:0] A_factor = {A, 1'b0};
	wire [n:0] P_extended = {1'b0, P};
	wire [n:0] S_temp;
	multidigitAdder #(.n(n+1)) multAdd(.A(A_factor), .B(P_extended), .Pi(1'b1), .S(S_temp), .Po(Po));
	assign S = Po ? S_temp : A_factor; 
endmodule

module adder_modulo
#(parameter n)
(
	input wire [n-1:0] A,
	input wire [n-1:0] B,
	input wire [n-1:0] P,
	output wire [n-1:0] S,
	output Po 				
);
	wire [n-1:0] S_temp, S_temp_mod;
	multidigitAdder #(.n(n)) multAdd1(.A(A), .B(B), .Pi(1'b0), .S(S_temp));
	multidigitAdder #(.n(n)) multAdd2(.A(S_temp), .B(P), .Pi(1'b1), .S(S_temp_mod), .Po(Po));
	assign S = Po ? S_temp_mod : S_temp;
endmodule

module adder_wallace
#(parameter n)
(
	input wire [n-1:0] A,  
	input wire [n-1:0] B, 
	input wire [n-1:0] P,  
	input Pi,			
	output wire [n-1:0] S, 
	output Po				
);
	wire [n-1:0] S_arr, Po_arr;
	genvar i;
	generate
		for (i = 0; i < n; i = i + 1) begin : MEM
			bitAdder adder(A[i], B[i], P[i], S_arr[i], Po_arr[i]);
		end
	endgenerate

	wire [n:0] multi_B_arr = {Po_arr, Pi};
	wire [n:0] multi_A_arr = {1'b0, S_arr};
	multidigitAdder #(.n(n + 1)) mAdder(.A(multi_A_arr), .B(multi_B_arr), .Pi(1'b0), .S(S), .Po(Po));
endmodule

module adder_modulo_wallace
#(parameter n)
(
	input wire [n-1:0] A,
	input wire [n-1:0] B,
	input wire [n-1:0] P,
	output wire [n-1:0] S,
	output Po			
);
	wire [n-1:0] simpleSum, wallaceSum;
	multidigitAdder #(.n(n)) multAdd1(.A(A), .B(B), .Pi(0), .S(simpleSum));
	adder_wallace #(.n(n)) add(.A(A), .B(B), .P(P), .Pi(1), .S(wallaceSum), .Po(Po));
	assign S = Po ? wallaceSum : simpleSum;
endmodule

module multidigitAdder
#(parameter n)
(
	input wire [n-1:0] A,
	input wire [n-1:0] B,
	input Pi,
	output wire [n-1:0] S,
	output Po
);
	assign {Po, S} = A + B + Pi;
endmodule

remainder_modulo:

  • Maximum frequency: 165.65 MHz
  • Start node: cnt[0]
  • End node: reduce_modulo:reduce|multidigitAdder:multAdd|Add1~8_OTERM9
  • Slack: 16.642
  • Data delay: 3.31

wallace_adder_modulo:

  • Maximum frequency: 136.59 MHz
  • Start node: cnt[0]
  • End node: adder_modulo_wallace:addWallaceMod|S[3]~3_OTERM9
  • Slack: 17.084
  • Data delay: 2.75

adder_modulo:

  • Maximum frequency: 165.65 MHz
  • Start node: cnt[0]
  • End node: adder_modulo:addMod|multidigitAdder:multAdd2|Add1~6_OTERM9
  • Slack: 18.076
  • Data delay: 1.875

Answer 1

Score: 0

The Maximum Frequency (Fmax) figure is the limiting factor on performance.
The posted code will be implemented as combinational logic whose maximum delay is 1/Fmax for the given module.

If the modules are implemented as part of a single-clock synchronous system, the maximum clock rate of the system is set by the slowest module, which is wallace_adder_modulo at 136.59 MHz.
The delay to obtain a new sample from any module in that system is then 1/136.59 MHz ≈ 7.3212 ns.

Consider an assembly line of workers consisting of multiple workstations; the performance limiting factor of the line is the slowest station.

There is no expected, actual, or average delay reported by FPGA timing tools. There is no theoretical delay. The tools report the maximum so that designers can select a maximum clock frequency. If the delay through the logic is longer than the clock period, the design does not work. The assumption in synchronous design is that the logic produces one logical result per clock cycle.
Here is the options menu for reporting delays in Vivado's timing analyzer. Other vendors will be similar.
[Image: Vivado timing analyzer options menu for reporting delays]

A theoretical delay could be postulated by hand by mapping the logic to theoretical gate delays; however, FPGAs don't target gates (they target the vendor's macro blocks), so those models don't exist within the scope of FPGA tools.

Since Vivado provides min delays, you could take (min + max)/2 as a typical value; however, I would not rely on that number in any way other than as a thought experiment.

It looks like you synthesized the modules separately, without any top-level module to bring them together. The hardware implementation and the performance numbers will change significantly when they are synthesized together, because the tools merge and optimize the combined logic. The same will happen when they are combined into a synchronous system with registers/flip-flops.

If you want to understand the nature of the delays better, open the tool's RTL view and take a close look at how the logic got mapped to the vendor's hardware.

There is no need to attempt to align delays between modules in FPGA design. Put the modules in a synchronous system so that each module is surrounded by registers/flip-flops, and the system acts as if each module produces a new answer every clock edge.
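
A minimal sketch of such a synchronous wrapper, assuming the modules from the question are compiled in the same project. The module name timing_top, the clk port, the _r and _c signal names, and the default width n = 8 are illustrative assumptions and do not come from the original post.

```verilog
// Hypothetical top level: timing_top, clk, the *_r / *_c names and n = 8
// are illustrative assumptions, not part of the original post.
module timing_top
#(parameter n = 8)
(
	input wire clk,
	input wire [n-1:0] A,
	input wire [n-1:0] B,
	input wire [n-1:0] P,
	output reg [n:0] S_rem,    // registered remainder_modulo result
	output reg [n-1:0] S_add,  // registered adder_modulo result
	output reg [n-1:0] S_wal   // registered adder_modulo_wallace result
);
	// Input registers: every timing path starts at one of these
	reg [n-1:0] A_r, B_r, P_r;

	// Combinational results of the modules from the question
	wire [n:0] rem_c;
	wire [n-1:0] add_c, wal_c;

	// Po outputs are left unconnected here; they are still used inside each module
	remainder_modulo #(.n(n)) u_rem(.A(A_r), .P(P_r), .S(rem_c), .Po());
	adder_modulo #(.n(n)) u_add(.A(A_r), .B(B_r), .P(P_r), .S(add_c), .Po());
	adder_modulo_wallace #(.n(n)) u_wal(.A(A_r), .B(B_r), .P(P_r), .S(wal_c), .Po());

	// Inputs and outputs are registered on the same clock edge, so each
	// module sits on its own register-to-register path
	always @(posedge clk) begin
		A_r <= A;
		B_r <= B;
		P_r <= P;
		S_rem <= rem_c;
		S_add <= add_c;
		S_wal <= wal_c;
	end
endmodule
```

With a clock constraint applied to clk, the worst path ending at S_rem, S_add, and S_wal can be compared within a single compilation, which should give a fairer comparison than figures taken from separate builds.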

huangapple
  • Posted on February 10, 2023, 03:20:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75403450.html