Is there a significant performance penalty for calling a subroutine vs inline code?

Question

Old-school question: I don't know how efficiently Perl runs.

I have a group of if/elsif/else statements that process lines of data for the different data types that come in the report.
I find the code easier to read if I use subroutine calls instead of variable-sized chunks of inline code.

Some older languages had significant penalties for handling calls, so it was faster to include short routines inline rather than call them.
I do not need to pass variables with the call. The line being read has all the data, and the significant items are put in variables to be processed later.

Probably not a critical bit of knowledge to have, but I try to make my programs both efficient and readable.
I have already ordered the chain of ifs so that the most common cases are checked first.

I have written the code inline. I have no way to actually measure how fast it runs. Since it processes things every 15 minutes, I don't want to tie up the files any more than necessary.

Answer 1

Score: 6

Perl sub calls have a lot more overhead than function calls in most programming languages, but it's still a relatively small amount of overhead compared to things like file I/O, network accesses, etc. Unless you're calling a function many thousands of times, there's going to be little advantage to inlining it.

One thing you mentioned stands out to me:

> I do not need to pass variables with the call.

This actually gives you a significant optimization possibility.

Instead of calling your function which handles the line as foo(), call it as &foo (with an ampersand and no parentheses). When a function is called that way, Perl treats the call as passing no new arguments, so it avoids setting up a new @_ array as part of the call. This saves a bit of overhead.

(A side-effect is that the function will now be able to see its caller's @_ array!)
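
To make the difference between the two call forms concrete, here is a minimal sketch. The sub names show_args and caller_sub are made up for illustration; only the foo() versus &foo call syntax comes from the answer.

```perl
use strict;
use warnings;

my @seen;    # records how many arguments the callee found in @_

sub show_args { push @seen, scalar @_ }

sub caller_sub {
    show_args();    # normal call: the callee gets a fresh, empty @_
    &show_args;     # ampersand, no parentheses: the callee sees caller_sub's @_
}

caller_sub('a', 'b', 'c');
print "@seen\n";    # prints "0 3"
```

The second call reports three arguments even though none were passed explicitly, which is exactly the side effect mentioned above. Whether the saved @_ setup is measurable in a given program is something only a benchmark can tell.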

Answer 2

Score: 4

This is a very general question without concrete code, which can only be answered in general terms.

Yes, function calls are relatively slow, but this only shows when you scale massively, for example with O(n^2) or worse complexity and/or recursive calls.

Normally improving the algorithm with standard mechanisms pays off far better than micro optimizing the calls.

For instance, in your example, using a "dispatch table" $call{$type}->() instead of if/elsif/else chains sounds better: O(1) vs. O(n).
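
A minimal sketch of such a dispatch table, assuming the data type of each line can be reduced to a hash key. The type names and handlers here are hypothetical; the real ones depend on the report.

```perl
use strict;
use warnings;

my %count;    # tallies per record type, filled in by the handlers

# Hypothetical record types mapped to their handlers.
my %call = (
    header => sub { $count{header}++ },
    detail => sub { $count{detail}++ },
    total  => sub { $count{total}++ },
);

for my $type (qw(header detail detail total bogus)) {
    my $handler = $call{$type};          # one hash lookup instead of a chain of tests
    if ($handler) { $handler->() }
    else          { $count{unknown}++ }  # plays the role of the final else
}

print join(', ', map { "$_=$count{$_}" } sort keys %count), "\n";
# prints "detail=2, header=1, total=1, unknown=1"
```

If the branches are selected by regex matches rather than by a single key, the chain may have to stay, and ordering it by frequency (as the question already does) is the cheaper fix.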

"Memoizing" also comes to mind to avoid repeated calls via cached results.

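As a sketch of memoizing, the core Memoize module can wrap a sub so that repeated calls with the same argument are answered from a cache. The function slow_lookup is a made-up stand-in for an expensive, side-effect-free computation.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;               # counts how often the real body runs

sub slow_lookup {            # hypothetical expensive, pure function
    my ($key) = @_;
    $calls++;
    return length($key) * 2;
}

memoize('slow_lookup');      # replaces slow_lookup with a caching wrapper

my $sum = 0;
$sum += slow_lookup('abc') for 1 .. 5;
print "$calls $sum\n";       # prints "1 30"
```

This only pays off when the same inputs recur and the function has no side effects; the caching wrapper is itself a sub call, so for trivial functions it can cost more than it saves.
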
So the first thing you should do when in doubt is benchmark.

For instance, one major overhead besides the call itself and the connected stack/frame mechanics is copying from @_ and declaring/initializing variables, especially arrays.

So using common "closure vars" could prove to be much faster.
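
One way to read "closure vars" is that the handlers share lexical variables declared outside them, so nothing is copied through @_. The sketch below assumes a comma-separated line and a made-up handler named parse_detail.

```perl
use strict;
use warnings;

my $line;                      # shared lexical: the handler closes over it
my %field;                     # results are left here for later processing

sub parse_detail {             # hypothetical handler; takes no arguments
    @field{qw(id amount)} = split /,/, $line;
}

for my $input ('17,42.50', '18,7.25') {
    $line = $input;
    parse_detail();            # nothing is copied into or out of @_
    print "$field{id} => $field{amount}\n";
}
```

The trade-off is that the sub now depends on state defined elsewhere, which makes it harder to reuse and test, so it is worth confirming with a benchmark that the saving is real.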

FWIW, contrary to your assumption, inlining and readability are not opposed to each other.

You can always use eval to construct complex code from smaller snippets. Text manipulation is Perl's forte after all.

There are also more comfortable "macro" mechanisms available on CPAN.

=== update

Another technique I used in the past to avoid trivial functions is goto LABEL statements.

This fits especially well in your scenario with if/elsif/else chains, where the usual point of return for all snippets is an END_LABEL: behind the chain.

The advantage over inlining is that repeated snippets won't lead to bloated code, and this without noticeable performance penalty.
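
A minimal sketch of that pattern, with hypothetical record types a, b and c and a made-up SHARED label for a snippet that two branches have in common:

```perl
use strict;
use warnings;

my @log;
my $processed = 0;

for my $type (qw(a b c)) {        # hypothetical record types
    if    ($type eq 'a') { goto SHARED }
    elsif ($type eq 'b') { push @log, 'b-only'; goto SHARED }
    else                 { push @log, 'other';  goto END_LABEL }

  SHARED:                         # written once, reached from two branches
    push @log, "shared($type)";

  END_LABEL:                      # common point of return behind the chain
    $processed++;
}

print "@log ($processed lines)\n";
# prints "shared(a) b-only shared(b) other (3 lines)"
```

The shared snippet is not duplicated and no sub call is made, at the price of control flow that is harder to follow than a plain call.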

Again, only use these techniques after benchmarks prove them necessary!

Answer 3

Score: 3

Perl's function calls are costly, comparatively. A good discussion of that has already been offered, but ultimately you can benchmark.

Given that no specifics are given, here's a very basic example; adjust and expand as suitable. I only test a sub with arguments because I can't easily conjure realistic code, for which we wonder about the sub call overhead, without any arguments.†

use warnings;
use strict;
use feature 'say';

use Benchmark qw(cmpthese);

my $count = shift // 10e6;   # iterations; test code too simple for fewer

sub has_caps {                               # very simple test code
    return $_[0] =~ /[A-Z]/;
}

my $str = join '', 'a'..'z', 'A', 'a'..'z';  # test string

#sub w_sub_noargs { }                        # a realistic example?

sub inline {                                 # "see" string from higher scope
    my $flag = $str =~ /[A-Z]/;
}

sub w_sub_args {
    my $flag = has_caps($str);
}

die "Unequal! (", w_sub_args(), ' vs ', inline(), ")"
    if w_sub_args() ne inline();

cmpthese( $count, {
    w_sub_args => sub { w_sub_args() },
    inline     => sub { inline() },
});

This prints

                Rate w_sub_args     inline
w_sub_args 2518892/s         --       -41%
inline     4273504/s        70%         --

on a quiet server, and a very similar ratio on an older desktop.

There is a difference, but this is a lot of runs: 10 million. (The code is so light that for fewer calls, even 1e6, I get the warning "too few iterations for a reliable count".)
So the difference per call is minuscule, ~0.16 μs (1/2518892 - 1/4273504), and it takes a lot of calls even for a small saving in runtime (~6 million calls for a one-second difference).
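
The arithmetic behind those two figures can be checked directly from the rates in the table. This is only a worked calculation on the numbers reported above, not a new measurement.

```perl
use strict;
use warnings;

# Rates (calls per second) copied from the benchmark output above.
my ($rate_sub, $rate_inline) = (2_518_892, 4_273_504);

my $diff = 1 / $rate_sub - 1 / $rate_inline;    # extra seconds per call

printf "%.3f us per call, %.1f million calls to save one second\n",
    $diff * 1e6, 1 / $diff / 1e6;
# prints "0.163 us per call, 6.1 million calls to save one second"
```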

Note that simple code here emphasizes the overhead (even though starting the regex engine isn't all that cheap). With more work in the sub, the relative difference is smaller. On the other hand, I never unpack the arguments above but use $_[0] directly, and that reduces the overhead. (But this is something one may want to do, for simple code with performance concerns.)

So put some realistic code in there, with a realistic number of iterations. If there is indeed a way to write a function for it that doesn't need input† then just add it to the benchmark.

But, in short -- is it worth it? Just how many calls does the real code make, and what is the total runtime? Saving milliseconds may matter but microseconds don't, not in a scripting language.

Keep in mind that making sure to compare (strictly) "apples to apples" in benchmarking can be tricky. Thanks to ikegami for comments.


A few general benchmarking notes.

It's often more convenient to use cmpthese( -$runfor, ... ), where $runfor has the number of seconds to run the test for (a 3 or 10 is often plenty).
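
A sketch of the time-based form, reusing has_caps and the test string from the benchmark above. The default of 1 second is an assumption chosen to keep the run short; the inline regex is written directly in the anonymous sub here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $runfor = shift // 1;   # CPU seconds per candidate; 3 to 10 gives steadier numbers

my $str = join '', 'a'..'z', 'A', 'a'..'z';  # same test string as above

sub has_caps { return $_[0] =~ /[A-Z]/ }

# A negative count tells Benchmark to run each candidate for at least
# that many CPU seconds instead of a fixed number of iterations.
my $rows = cmpthese( -$runfor, {
    w_sub_args => sub { my $flag = has_caps($str) },
    inline     => sub { my $flag = $str =~ /[A-Z]/ },
});
```

With this form there is no need to guess an iteration count large enough to avoid the "too few iterations" warning.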

It turns out that running the benchmark as

cmpthese( $count, {
    w_sub_args => \&w_sub_args,
    inline     => \&inline,
});

results in a larger difference, about 80%. (We can do it here since the subs need no arguments.) Not a game changer, but if Benchmark's own overhead affects results then it is more accurate to do it without that overhead. (I thought that results are computed without the overhead, but perhaps not everything can be accounted for?)


† What data would that code work with?

If the sub takes data from outside, via a file or network or a socket or such, then that easily dwarfs a function call overhead, probably by order(s) of magnitude.

If it relies on data "seen" in its scope then that's in general bad practice and usually indeed a bad idea. For one, it's prone to possibly subtle errors, especially as the code is tweaked later.

huangapple
  • Posted on April 13, 2023 at 21:47:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76006223.html