Getting rid of repeated "Use of uninitialized value $foo in pattern match" errors in a big set of substitutions

Question

I have a Perl script that scans C source files, builds a hash of the symbols that are longer than X characters, and replaces each one with a string of the form A[0-9]{5}. The number increments with each successive symbol to be shortened. This is to prepare the code to be compiled on very old compilers with strange limits on the lengths of symbols.

In https://stackoverflow.com/questions/76544979, I was able to get the old external sed script approach translated to work (mostly) directly in Perl. The problem now is that I'm getting a complaint of

Use of uninitialized value within %transformations in substitution iterator at src/misc/snavig.pl line 210, <> line 8.

around 1600 times. The problem is in this chunk of code, which applies the substitutions:

  my $oldsymbols = join '|', keys %transformations;
  my $oldfilenames = join '|', keys %includes;
  local $^I = '.bak';
  local @ARGV = glob("*.c *.h");
  while (<>) {
    s/\b($oldsymbols)\b/$transformations{$1}/g;
    s/($oldfilenames)/$includes{$1}/g;
    print;
  }

The %transformations hash looks something like this:

hide_lines => 'A00058'
message => 'A00165'
tokenise_text => 'A00388'
z_print_addr => 'A00220'
z_ret_popped => 'A00236'
encode_text => 'A00384'

I've tracked this down to the first part of the substitution matching on the entire line and it's put into $1. So of course compression_names[compression_mode], hide_lines); won't appear as a key in that hash. What I don't get is how to guard the $oldfilenames string such that $1 will only receive one of the symbols in that string. The substitutions from %includes don't provoke a complaint because the entire line will always match a key.

I don't remember where I found a suggestion to guard it with \K ... (?=.) or \K ... (?=.*) both in and out of the \b things. These either did nothing or made messes. What's the correct and effective way to get rid of those complaints?

Editing to add sample data:

Before processing:

/*
 * print_char
 *
 * High level output function.
 *
 */
void print_char(zchar c)
{
        static bool flag = FALSE;
        need_newline_at_exit = TRUE;

        if (message || ostream_memory || enable_buffering) {
                if (!flag) {
                        /* Characters 0 and ZC_RETURN are special cases */
                        if (c == ZC_RETURN) {
                                new_line();
                                return;
                        }

After processing:

/*
 * A00267
 *
 * High level output function.
 *
 */
void A00267(zchar c)
{
        static bool flag = FALSE;
        A00174 = TRUE;

        if (A00165 || A00162 || A00173) {
                if (!flag) {
                        /* Characters 0 and ZC_RETURN are special cases */
                        if (c == ZC_RETURN) {
                                A00266();
                                return;
                        }

Making substitutions within commented-out areas is fine.

Answer 1

Score: 2

There's information missing from your post, or some of the information is incorrect. (If this doesn't answer your question, you need to provide an actual demonstration of the problem.)

I'm guessing the issue is that you have keys with special symbols in them. For example, consider the case where you have a key named foo*. That will match foo (among others), which might not be in your hash.
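
A minimal sketch of that failure mode, using made-up data: the key foo* and the sample line are hypothetical and not taken from the question. Interpolated unescaped, foo* means "fo followed by any number of o", so the pattern can match fo and foo, neither of which is a hash key, and each failed lookup produces the warning.

```perl
use strict;
use warnings;

# Hypothetical data: 'foo*' stands in for any key containing a regex metacharacter.
my %transformations = ( 'foo*' => 'A00001', 'bar' => 'A00002' );

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Keys are joined unescaped, exactly as in the question.
my $oldsymbols = join '|', keys %transformations;

my $line = "fo = foo + bar;";
(my $out = $line) =~ s/\b($oldsymbols)\b/$transformations{$1}/g;

print "result:   '$out'\n";
print "warnings: ", scalar(@warnings), "\n";
print $warnings[0] if @warnings;
```

Here both fo and foo are matched and replaced with nothing, while bar is replaced normally.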

You can verify that this is the issue by adding the following:

if ( my @errors = sort grep /\W/, keys %transformations ) {
   die(
      join "",
         map "Transformation key `$_` contains non-word symbols\n",
            @errors
   );
}

If that's intentional, you can replace

my $oldsymbols = join '|', map quotemeta, keys %transformations;

But note that the anchoring \b won't behave as desired if a key starts or ends with a non-word character.
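
The \b caveat can be seen with a small sketch. The key foo* and the sample line are hypothetical, and the lookaround version is only one possible substitute for \b, assuming the intent is "not adjacent to a word character"; it is not something the question's script is known to need.

```perl
use strict;
use warnings;

# Hypothetical key that ends in a non-word character.
my %transformations = ( 'foo*' => 'A00001' );

# quotemeta escapes the '*', so the pattern matches the literal text "foo*".
my $oldsymbols = join '|', map quotemeta, keys %transformations;

my $line = "x = foo* + foo*y;";

# \b after '*' only holds when the NEXT character is a word character,
# so the standalone "foo*" is skipped and the one inside "foo*y" is replaced.
(my $with_b = $line) =~ s/\b($oldsymbols)\b/$transformations{$1}/g;

# Lookarounds that only require "no word character on either side"
# replace the standalone "foo*" and leave "foo*y" alone.
(my $with_look = $line) =~ s/(?<!\w)($oldsymbols)(?!\w)/$transformations{$1}/g;

print "with \\b:         $with_b\n";
print "with lookarounds: $with_look\n";
```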

Answer 2

Score: 1

> I've tracked this down to the first part of the substitution matching on the entire line and it's put into $1.

Your regex isn't matching complete lines, unless there is something in your real code that you haven't shown us.

I don't see a problem when running this

my %transformations = (
    hide_lines    => 'A00058',
    message       => 'A00165',
    tokenise_text => 'A00388',
    z_print_addr  => 'A00220',
    z_ret_popped  => 'A00236',
    encode_text   => 'A00384',
);

my $oldsymbols = join '|', keys %transformations;
while (<DATA>) {
    s/\b($oldsymbols)\b/$transformations{$1}/g;
    print;
}


__END__

hide_lines
message
tokenise_text

compression_names[compression_mode], hide_lines);

It does the right thing

$ perl try.pl 

A00058
A00165
A00388

compression_names[compression_mode], A00058);

Can you update the question please?
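
If the warning persists with the real data, one way to find out what $1 actually holds is to check the lookup before using it. This is only a diagnostic sketch: the shorten helper and the %missing hash are hypothetical names, and the two-entry hash is a cut-down stand-in for the real one.

```perl
use strict;
use warnings;

# Cut-down stand-in for the real hash.
my %transformations = ( hide_lines => 'A00058', message => 'A00165' );
my $oldsymbols = join '|', keys %transformations;

# Captured strings that had no entry in %transformations.
my %missing;

# Apply the substitutions to one line; text with no mapping is left
# unchanged and recorded in %missing instead of triggering a warning.
sub shorten {
    my ($line) = @_;
    $line =~ s{\b($oldsymbols)\b}{
        exists $transformations{$1}
            ? $transformations{$1}
            : do { $missing{$1}++; $1 }
    }ge;
    return $line;
}

print shorten("compression_names[compression_mode], hide_lines);\n");
print STDERR "no mapping for '$_'\n" for sort keys %missing;
```

Anything reported on STDERR is a string the pattern matched that is not a key, which points at the key (or keys) that need escaping.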

Posted by huangapple on July 27, 2023 at 15:01:09
When reposting, please keep the link to this article: https://go.coder-hub.com/76777214.html