Preg_replace finds a match where there shouldn't be one

Question

I'm writing my own simple Markdown formatter. I was fixing the last of the issues when I ran into a problem with my code block formatter. For some reason it matches an extra time where there shouldn't be anything to match.


$matches = [
    "```\ncode block \n```",
    "code block \n"
];

private function code_block_format($matches): string
    {
        // get a line
        $regex = '/([^\n]*)\n?/';
        // wrap that line into <code> elem + new line
        $repl = '<code>$1</code>' . "\n";
        // remove trailing linebreaks + spaces
        $matches[1] = trim($matches[1]);
        $ret = preg_replace($regex, $repl, $matches[1]); // this returns the badly formatted string
        $ret = "<pre>\n" . $ret . "</pre>";
        return $ret;
    }

I expect preg_replace to return just <code>code block</code>\n, but for some reason I get an extra element: <code>code block</code>\n<code></code>\n

Any help on what in the world could be causing it to latch onto a "" string somewhere in there?

Edit

My goal is to make a codeblock element similar to what you can write here where there can be empty lines between the ``` tags, so lines with simply \n should be matched as well.

Answer 1

Score: 0

$regex = '/([^\n]*)\n?/'; uses [^\n]*, which matches zero or more characters that are not \n, so it matches basically everything, including an empty string.

Change * to +, which means it occurs one or more times.

$regex = '/([^\n]+)\n?/';

I actually can't figure out exactly why * produces the second match. /[^a]*/g returns two matches for any text that doesn't include an a, and I would expect one.
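
The second match can be reproduced in isolation. The sketch below is not from the original answer; it assumes the trimmed input "code block" from the question and uses preg_match_all with PREG_OFFSET_CAPTURE to list every match with its offset. Once the first match has consumed the text up to the end of the string, the engine makes one more attempt at the end position, where both [^\n]* and \n? can match nothing, so an empty match is reported there.

```php
<?php
// Reproduction sketch: list every match of the original pattern
// together with its byte offset in the subject string.
$subject = "code block"; // the trimmed input from the question

preg_match_all('/([^\n]*)\n?/', $subject, $star, PREG_OFFSET_CAPTURE);
// $star[0] holds [matched text, offset] pairs for the whole pattern.
foreach ($star[0] as [$text, $offset]) {
    printf("offset %d: %s\n", $offset, var_export($text, true));
}

// With + the pattern needs at least one character, so an empty match
// at the end of the string is not possible.
$plusCount = preg_match_all('/([^\n]+)\n?/', $subject, $plus);
printf("matches with +: %d\n", $plusCount);
```

Running it should list two matches for the * pattern (the text at offset 0 and an empty string at offset 10) and a single match for the + pattern. preg_replace substitutes each of those matches, which is where the extra <code></code> comes from.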

That said, your code seems needlessly complex. Are you just trying to remove whitespace from $matches[1] with trim(), then surround it with <code></code>?

You can just concatenate the tags onto the trimmed $matches[1]:

return '<code>' . $matches[1] . '</code>';

Answer 2

Score: 0

You can try this instead. Your original pattern allows ZERO or more characters, so it can produce a match that contains no text at all (or just a newline).

$regex = '/([^\n]+)\n?/';

It should output:

<pre>
<code>code block</code>
</pre>

Answer 3

Score: 0

Okay, I got an idea from the answers and found a regex that works the way I want. ([^\n]+\n?|\n) lets me capture either a line with text or an empty line, as I wanted.
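
To see what that pattern captures, the sketch below runs it on a three-line sample with a blank line in the middle (the sample text is made up for illustration). It also shows a hypothetical variant, not from the answer, that keeps the newline outside the capture group so the replacement does not carry a line break inside <code>.

```php
<?php
// Sample block body with an empty line between two text lines.
$body = "first line\n\nthird line";

// Pattern from the answer: the captured group includes the newline.
preg_match_all('/([^\n]+\n?|\n)/', $body, $m);
var_export($m[1]);
echo "\n";

// Hypothetical variant: the newline stays outside the group, and a
// bare newline (an empty line) matches with the group left empty.
$repl = '<code>$1</code>' . "\n";
$html = preg_replace('/([^\n]+)\n?|\n/', $repl, $body);
echo $html;
```

Neither alternative can match an empty string, so no extra empty element is produced at the end of the input.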

huangapple
  • Posted on February 23, 2023, 22:33:50
  • Link to this post: https://go.coder-hub.com/75546246.html