Calculate MCU load (or free) time during operation

Question

I have a Cortex-M0+ chip (an STM32) and I want to calculate its load (or free) time. The M0+ doesn't have the DWT->CYCCNT cycle counter, so using that isn't an option.

Here's my idea:

Using a scheduler I have, I take a counter and increment it by 1 in my idle loop.

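#include <stdint.h>

// Idle-loop passes since the last sample; volatile because a timer job reads it.
static volatile uint32_t counter = 0;

static void sched_idle(void){
    counter += 1;
}

static void sched_run(void){
    if( Jobs_timer_ready(jobs) ){
        // do timed jobs
    }else{
        sched_idle();
    }
}

int main(void){
    while(1){
        sched_run();
    }
}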

I have jobs running on a 50us timer, so I can collect the count every 100ms accurately. With a 64 MHz chip, that would give me 64,000,000 instructions/sec, or 64 instructions/usec, assuming one instruction per clock cycle.

If I take the number of instructions the counter uses and remove that from the total instructions per 100ms, I should have a concept of my load time (or free time). I'm slow at math, but that should be 6,400,000 instructions per 100ms. I haven't actually looked at how many instructions the increment takes, but let's be generous and say it takes 7 instructions to increment the counter, just to illustrate the process.

So, let's say the counter variable has ended up with 12,475 after 100ms. Our formula should be [CPU Free %] = Free Time/Max Time = COUNT*COUNT_INSTRUC/MAX_INSTRUC.

This comes out to 12,475 * 7 / 6,400,000 = 87,325 / 6,400,000 = 0.013644 (x 100) = 1.36% free (and this is where my math looks very wrong).
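The arithmetic above can be checked with a few lines of C. This is only a sketch under the question's own assumptions: the helper name `idle_per_10000` and its parameters are hypothetical, one clock cycle is treated as one instruction, and every pass through the idle loop is assumed to cost the same fixed number of cycles. The result is returned in hundredths of a percent so that no floating point is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: idle share of one sampling window, in hundredths of a
 * percent (136 means 1.36 %). Assumes every idle-loop pass costs a fixed,
 * known number of cycles, which is the question's simplification. */
static uint32_t idle_per_10000(uint32_t count,
                               uint32_t cycles_per_pass,
                               uint32_t cycles_per_window)
{
    assert(cycles_per_window != 0);
    /* Widen to 64 bits so the intermediate products stay in range for
     * realistic counts and window sizes. */
    uint64_t idle_cycles = (uint64_t)count * cycles_per_pass;
    return (uint32_t)(idle_cycles * 10000u / cycles_per_window);
}
```

Calling `idle_per_10000(12475, 7, 6400000)` returns 136, i.e. 1.36%, so the arithmetic itself holds. The questionable input is more likely the 7: in the loop shown, each pass also pays for the call to `sched_run()` and the `Jobs_timer_ready(jobs)` check, so the real cost per pass is higher than the increment alone.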

My goal is to have a mostly-accurate load percentage that can be calculated in the field. Especially if I hand it off to someone else, or need to check how it's performing. I can't always reproduce field conditions on a bench.

My basic questions are these:

  1. How do I determine load or free?
  2. Can I calculate load/free like a task manager (overall)?
  3. Do I need a scheduler for it or just a timer?

Answer 1

Score: 2

I would recommend setting up a timer to count in 1 us steps (or whatever resolution you need). Then just read the counter value before and after the work to get the duration.
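As a rough illustration of the 1 us tick, the prescaler value can be derived from the timer's input clock. The helper name `prescaler_for_tick` is hypothetical, and the sketch assumes the timer is clocked at the same 64 MHz as the core, which depends on how the clock tree is configured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: prescaler register value that divides timer_clk_hz
 * down to tick_hz. STM32 general-purpose timers count at
 * f_timer / (PSC + 1), so the register holds the divider minus one. */
static uint16_t prescaler_for_tick(uint32_t timer_clk_hz, uint32_t tick_hz)
{
    assert(tick_hz != 0 && timer_clk_hz >= tick_hz);
    uint32_t divider = timer_clk_hz / tick_hz;
    assert(divider <= 65536u);   /* PSC is a 16-bit register */
    return (uint16_t)(divider - 1u);
}
```

For a 64 MHz timer clock and a 1 MHz tick, `prescaler_for_tick(64000000, 1000000)` returns 63, which would be written to `TIM2->PSC` before the counter is enabled.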

Given your simplified program, it looks like you just have a while loop and a flag which tells you when some work needs to be done. So you could do something like this:

#include <stdint.h>
// TIM2 comes from the device header; Jobs_timer_ready and jobs from your scheduler.

uint32_t busy_time = 0;
uint32_t idle_time = 0;
uint32_t idle_start = 0;

void sched_run(void)
{
    if (Jobs_timer_ready(jobs)) {
        // When the job starts, calculate the duration of the idle period.
        idle_time += TIM2->CNT - idle_start;

        // Measure the work duration.
        uint32_t job_start = TIM2->CNT;
        // do timed jobs
        busy_time += TIM2->CNT - job_start;

        // Restart idle period.
        idle_start = TIM2->CNT;
    }
}

int main(void)
{
    // Initialize the idle start time once, before the loop. Resetting it on
    // every pass would discard the idle time accumulated so far.
    idle_start = TIM2->CNT;
    while (1) {
        sched_run();
    }
}

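The load percentage would be busy_time * 100 / (busy_time + idle_time). With integer variables, multiply before dividing: (busy_time / (busy_time + idle_time)) * 100 truncates to 0 whenever the CPU is less than 100% busy.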
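The percentage calculation can be sketched in integer arithmetic as follows. The helper name `load_percent` is hypothetical. It widens to 64 bits because multiplying a 32-bit microsecond count by 100 would overflow after about 43 seconds of accumulated busy time.

```c
#include <stdint.h>

/* Hypothetical helper: load in whole percent from accumulated busy and idle
 * ticks. Multiplies before dividing so the result is not truncated to 0. */
static uint32_t load_percent(uint32_t busy_ticks, uint32_t idle_ticks)
{
    uint64_t total = (uint64_t)busy_ticks + idle_ticks;
    if (total == 0) {
        return 0;   /* nothing measured yet */
    }
    return (uint32_t)((uint64_t)busy_ticks * 100u / total);
}
```

One plausible way to use it is to call `load_percent(busy_time, idle_time)` once per reporting window (for example the 100ms window from the question) and then zero both accumulators, so each reading reflects only the most recent window.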

Answer 2

Score: 0

Counting cycles isn't as easy as it seems. Reading a variable from RAM, modifying it and writing it back has a non-deterministic duration. A RAM read is typically 2 cycles, but can be 3, depending on many things, including how congested the AXIM bus is (other MCU peripherals are also attached to it). Writing is a whole other story: there are bufferable writes, non-bufferable writes, etc. There is also caching, which changes things depending on where the executed code is, where the data it's modifying is, and the cache policies for data and instructions. Finally, there is the question of what exactly your compiler generates. So this problem should be approached from a different angle.

I agree with @Armandas that the best solution is a hardware timer. You don't even have to set it up for a microsecond tick or anything (but you totally can). You can choose when to reset the counter. Even if it runs at the CPU clock or close to it, a 32-bit overflow will take a long time, but it still must be handled. I would reset the timer counter when I make the idle/busy calculation, which seems like a reasonable moment to do that. If your program can actually overflow the timer at runtime, you of course need a modified solution that accounts for it. Obviously, if your timer has a 16-bit prescaler and counter, you will have to adjust for that. A microsecond tick seems like a reasonable compromise for your application after all.
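One common way to handle the overflow mentioned above is to lean on unsigned arithmetic, sketched here with the hypothetical helpers `elapsed32` and `elapsed16`. The sketch assumes a free-running up-counter that counts through its full range (auto-reload at the maximum value) and that each measured interval is shorter than one counter period. For scale: a 32-bit counter wraps after about 67 seconds at 64 MHz and about 71.6 minutes at 1 MHz, while a 16-bit counter at 1 MHz wraps every 65.536 ms, which is shorter than the 100ms window in the question.

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running 32-bit up-counter.
 * Unsigned subtraction is modulo 2^32, so a single wrap between the two
 * reads is handled without any special case. */
static uint32_t elapsed32(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Same idea for a 16-bit counter: truncate the difference to 16 bits. */
static uint16_t elapsed16(uint16_t start, uint16_t now)
{
    return (uint16_t)(now - start);
}
```

With this approach the counter never needs to be reset: `elapsed16(0xFFF0, 0x0010)` yields 0x20 even though the counter wrapped in between. Intervals longer than one counter period would still need an overflow count, for example from the timer's update interrupt.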

Alternative things to consider: DTCM, a small tightly coupled RAM, has strictly one-cycle read/write access and is by definition neither cacheable nor bufferable. So with tightly coupled memory and tight control over exactly which instructions the compiler generates and the CPU executes, you can do something more deterministic with a variable counter. However, if that code is ported to an M7, there may be timing-related issues because of the M7's dual-issue pipeline (put very simply, it can execute two instructions in parallel; see the Architecture Reference Manual for more). Just bear this in mind. It becomes a little more architecture-dependent, which may or may not be an issue for you.

At the end of the day, I vote to stick with a hardware timer. Making it work with a variable is a huge headache: you really need to get down to the architecture level to make it work properly, and even then there could always be something you forgot or didn't think about. It seems like a massive overcomplication for the task at hand. The hardware timer is the boss.

huangapple
  • Posted on January 6, 2023, 14:09:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75027535.html