Posted: 2023-06-01 09:28:17

Regular Expressions in a MediaWiki Extension Not Working

Question

The regex feature of the onParserBeforePreprocess function doesn't work in the extension I'm building, and I don't know why.

Let me elaborate on the issue with the onParserBeforePreprocess function not working.

extension.json:

{
    "name": "EnhanceMarkup",
    "description": "Provides enhanced markup functionalities",
    "version": "1.0",
    "author": [
        "Jeong Gaon"
    ],
    "url": "https://www.gaon.xyz/mw_extensions",
    "type": "other",
    "license-name": "Apache-2.0",
    "AutoloadClasses": {
        "EnhanceMarkupHooks": "includes/EnhanceMarkupHooks.php"
    },
    "ResourceModules": {
        "ext.EnhanceMarkup.styles": {
            "styles": "resources/ext.EnhanceMarkup.styles.css",
            "localBasePath": "",
            "remoteExtPath": "EnhanceMarkup"
        },
        "ext.EnhanceMarkup.scripts": {
            "scripts": ["resources/ext.EnhanceMarkup.scripts.js", "resources/lib/math.js"],
            "localBasePath": "",
            "remoteExtPath": "EnhanceMarkup"
        }
    },
    "Hooks": {
        "InternalParseBeforeLinks": "EnhanceMarkupHooks::onInternalParseBeforeLinks",
        "ParserFirstCallInit": "EnhanceMarkupHooks::onParserFirstCallInit",
        "BeforePageDisplay": "EnhanceMarkupHooks::onBeforePageDisplay"
    },
    "manifest_version": 2
}

includes/EnhanceMarkupHooks.php:

<?php
class EnhanceMarkupHooks
{
    public static function onBeforePageDisplay(OutputPage &$out, Skin &$skin)
    {
        $out->addModules("ext.EnhanceMarkup.styles");
        $out->addModules("ext.EnhanceMarkup.scripts");
        return true;
    }

    public static function onParserFirstCallInit(Parser $parser)
    {
        // Register each of your custom parser functions with the parser
        $parser->setHook("random", [self::class, "randomRender"]);

        return true;
    }

    public static function onInternalParseBeforeLinks(Parser &$parser, &$text)
    {
        // - * 4+ == <hr>
        // Replace sequences of 3-9 '*', '-', or '_' with a horizontal rule
        $text = preg_replace('/^([-]{3,9})$/m', "<hr>", $text);

        // [pagecount] show all count of page
        // Replace [pagecount] with the total number of pages
        $text = preg_replace_callback(
            "/\[pagecount\]/",
            function ($matches) use ($parser) {
                $dbr = wfGetDB(DB_REPLICA);
                $count = $dbr->selectRowCount("page");
                return $count;
            },
            $text
        );

        // Replace [*A text] with <ref group="A">text</ref>
        $text = preg_replace(
            "/\[\*\s+([^ ]+)\s+(.*?)\]/",
            '<ref group="$1">$2</ref>',
            $text
        );

        // Replace [*A] with <ref group="A" />
        $text = preg_replace(
            "/\[\*\s+([^ ]+)\s*\]/",
            '<ref group="$1" />',
            $text
        );

        // Replace [* text] with <ref>text</ref>
        $text = preg_replace("/\[\*\s+(.*?)\]/", '<ref>$1</ref>', $text);

        // Replace [include text] with {{text}}
        $text = preg_replace("/\[\include\s+(.*?)\]/", '{{$1}}', $text);

        // Replace [br] with <br>
        $text = str_replace("[br]", "<br>", $text);

        // Font Size up {{{+1 (content) }}} - Range: 1~5
        $text = preg_replace_callback('/\{\{\{\+([1-5])\s*(.*?)\s*\}\}\}/s', function($matches) {
            return '<span style="font-size:'.(1 + $matches[1]).'em;">'.$matches[2].'</span>';
        }, $text);

        // Font Size down {{{-1 (content) }}} - Range: 1~5
        $text = preg_replace_callback('/\{\{\{-([1-5])\s*(.*?)\s*\}\}\}/s', function($matches) {
            return '<span style="font-size:'.(1 - $matches[1]/10).'em;">'.$matches[2].'</span>';
        }, $text);

        return true;
    }

    // Random
    // <random range="50">True|False</random>
    public static function randomRender(
        $input,
        array $args,
        Parser $parser,
        PPFrame $frame
    ) {
        // Disable caching
        $parser->getOutput()->updateCacheExpiry(0);

        // Parse the input
        $parts = explode("|", $input);

        // Get the range from args
        $range = isset($args["range"]) ? $args["range"] : 2; // default to 2

        // Generate a random number within the range
        $randomNumber = mt_rand(1, $range);

        // Choose the output based on the random number
        if ($randomNumber <= $range / 2) {
            // If the random number is in the first half of the range, return the first part
            return $parts[0];
        } else {
            // Otherwise, return the second part if it exists, or the first part if it doesn't
            return isset($parts[1]) ? $parts[1] : $parts[0];
        }
    }
}

Looking at the code, I can't see anything particularly wrong with it. If it worked as intended, typing something like [* texts] in the wiki should generate a footnote containing texts, but for some reason the markup is output literally.

For example, if you type 'hello[br]world', you should see world on the line below hello, but nothing changes.

My MediaWiki site address is https://www.gaonwiki.com

Let me know if you need any more information. I'll provide it. Thank you.

Answer 1

Score: 1

A) To match your references described by [*A Text], I would correct
the pattern like this:

/\[\*(?<group>\w+)\s+(?<text>[^\]]+)\]/

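The idea is to use named capturing groups with
(?<group_name>...pattern...)
and also to be a bit more precise with \w+ to match word characters,
then \s+ for one or several spaces and then any char which isn't the
closing bracket with [^\]]+.

The replacement becomes <ref group="$1">$2</ref>. PHP's preg_replace()
only accepts numbered references such as $1 or ${1} in the replacement
string; the named groups are available as $matches['group'] and
$matches['text'] inside a preg_replace_callback() callback.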

Here are some tests of it: https://regex101.com/r/vueNcM/2
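
As a rough sketch of step A in PHP, the named groups can be read inside a preg_replace_callback() callback. The function name convertGroupedRefs() and the sample sentence are made up for illustration and are not part of the extension:

```php
<?php
// Hypothetical helper (not from the original extension): turns
// "[*A Some text]" into '<ref group="A">Some text</ref>'.
function convertGroupedRefs(string $text): string
{
    return preg_replace_callback(
        '/\[\*(?<group>\w+)\s+(?<text>[^\]]+)\]/',
        static function (array $m): string {
            // Named groups are exposed as string keys of the match array.
            return '<ref group="' . $m['group'] . '">' . $m['text'] . '</ref>';
        },
        $text
    ) ?? $text; // preg_replace_callback() returns null on a regex error
}

echo convertGroupedRefs('Water boils at 100 C.[*A See the physics handbook]'), "\n";
```

Input without a group name directly after the asterisk, such as [* plain], is left untouched by this pattern, so it can still be handled by a later step.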

B) Step 2, to match only [*A], I would use
/\[\*(?<group>\w+)\]/ and replace it with <ref group="$1" />
(a numbered reference, since preg_replace() does not accept group
names in the replacement string).

Here are the tests too: https://regex101.com/r/gYFOzO/2
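
A minimal sketch of step B, assuming a plain preg_replace() with a numbered reference is enough here. The function name convertGroupOnlyRefs() is hypothetical:

```php
<?php
// Hypothetical helper (not from the original extension): turns
// "[*A]" into '<ref group="A" />'.
function convertGroupOnlyRefs(string $text): string
{
    // $1 refers to the first capturing group, here the one named "group".
    return preg_replace('/\[\*(?<group>\w+)\]/', '<ref group="$1" />', $text) ?? $text;
}

echo convertGroupOnlyRefs('The same note is reused here.[*A]'), "\n";
```

Because the closing bracket must follow the group name immediately, [*A Some text] does not match this pattern.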

C) Step 3, to replace [* text] with <ref>text</ref>, I would
first use /\[\*\s+(?<text>[^\]]+)\]/ and replace it with
<ref>$1</ref>.

Tests available here: https://regex101.com/r/aYTOH9/1
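
The simple form of step C could look like the sketch below. The function name convertPlainRefs() is an illustrative assumption:

```php
<?php
// Hypothetical helper (not from the original extension): turns
// "[* Some text]" into '<ref>Some text</ref>'.
function convertPlainRefs(string $text): string
{
    // \s+ requires at least one space after the asterisk, so "[*A]" and
    // "[*A text]" are not matched by this pattern.
    return preg_replace('/\[\*\s+(?<text>[^\]]+)\]/', '<ref>$1</ref>', $text) ?? $text;
}

echo convertPlainRefs('The sky is blue.[* Observed on a clear day]'), "\n";
```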

But if you want to allow escaped brackets in the text (in case
the user needs to have some brackets in the text), then use
/\[\*\s+(?<text>(?:\\\]|[^\]])+)\]/

Tests: https://regex101.com/r/aYTOH9/2

For this situation, you'll have to do a preg_replace_callback()
instead of a simple preg_replace() because we have to unescape
the brackets:

$text = preg_replace_callback(
	'/\[\*\s+(?<text>(?:\\\\\]|[^\]])+)\]/',
	function ($matches) {
		return '<ref>' .
			preg_replace('/\\\\([\[\]])/', '$1', $matches['text']) .
			'</ref>';
	},
	$text
);

Test the PHP here: https://onlinephp.io/c/2b5249

Security concerns when creating filters

What happens if the user inputs this?

Shit happens with [* <script>alert('I got you')</script>]

Will there be another filter to avoid XSS attacks?

If it's not safely escaped, then replace all your preg_replace()
calls by a preg_replace_callback() like in example C) above and
do the sanitizing operations on the captured values:

// Replace [* Some text] by <ref>Some text</ref>
// Also handle escaped brackets in text, such as [* An \[important\] reference]
$text = preg_replace_callback(
	// The pattern is a single-quoted PHP string, where \\ stands for one
	// backslash. The regex engine needs two backslashes to match a literal
	// "\", so the string contains four, which makes the pattern below not
	// very readable :-( As a regex literal in JavaScript it would simply be:
	// /\[\*\s+(?<text>(?:\\\]|[^\]])+)\]/
	'/\[\*\s+(?<text>(?:\\\\\]|[^\]])+)\]/',
	function ($matches) {
		// 1) Unescape "\[" and "\]" to "[" and "]" respectively.
		// 2) As we are creating HTML, the text should be sanitized as it may
		// contain some stuff like <strong>Bold</strong> or worse some JavaScript
		// <script>alert('XSS attack')</script>.
		return '<ref>' .
			htmlspecialchars(
				preg_replace('/\\\\([\[\]])/', '$1', $matches['text'])
			) .
			'</ref>';
	},
	$text
);

PHP code in action here: https://onlinephp.io/c/8a7f8
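
The same idea can be carried over to the grouped form from step A. The sketch below assumes that escaping at this point is actually wanted (whether a later stage of the wiki already sanitizes the text is not covered here), and the function name convertGroupedRefsSafely() is hypothetical:

```php
<?php
// Hypothetical helper (not from the original extension): like step A, but
// both captured values are escaped before being placed into the markup.
function convertGroupedRefsSafely(string $text): string
{
    return preg_replace_callback(
        '/\[\*(?<group>\w+)\s+(?<text>[^\]]+)\]/',
        static function (array $m): string {
            // ENT_QUOTES also escapes single and double quotes, which
            // matters for the value that ends up inside the attribute.
            $group = htmlspecialchars($m['group'], ENT_QUOTES);
            $body = htmlspecialchars($m['text'], ENT_QUOTES);
            return '<ref group="' . $group . '">' . $body . '</ref>';
        },
        $text
    ) ?? $text; // fall back to the input if the regex engine reports an error
}

echo convertGroupedRefsSafely('[*A <script>alert(1)</script>]'), "\n";
```

With this in place, angle brackets in the captured text come out as &lt; and &gt; instead of being passed through as markup.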
