Can I get escape characters to still behave as such in a string returned by f:read() in Lua?

Question

I'm working on a simple localization function for my scripts and, although it's working quite well so far, I don't know how to keep escape/special characters from showing up in the UI as part of the text after feeding the widgets with the strings returned by f:read().

For example, if a line in a certain Strings.ES.txt contains: Ignorar \"Etiquetas de capa\", I'd expect the backslashes not to show up, just as when I feed the widget a normal string between double quotes like: "Ignorar \"Etiquetas de capa\"", or at least to have a way to avoid it. I've been trial-and-erroring with the tostring() and load() functions and different (surely nonsensical 🙄) concatenations like: load(tostring("[[" .. f:read()" .. ]]")) and such without any success, so here I am again...

Does anyone know if there is a way to make escape characters in a string returned by f:read() still behave as special, as they do in a regular string literal?

Answer 1

Score: 1

> I don't know how to avoid [e]scape/special characters to be shown in UI as part of the text

What you want is to "unescape" or "unquote" a string to interpret escape sequences as if it were parsed as a quoted string by Lua.

> [...] with the strings returned by f:read() [...]

The fact that this string was obtained using f:read() can be ignored; all that matters is that it is a string literal without quotes using quoted string escapes.

> I've been trial-and-erroring with tostring() and load() functions and different [...] concatenations like: load(tostring("[[" .. f:read()" .. ]]")) and such without any success [...]

This is almost how to do it, except you chose the wrong string literal type: "long" strings delimited by double square brackets ([[ and ]]) do not interpret escape sequences at all; they are intended for including long, raw, possibly multiline strings in Lua programs and often come in handy when you need to represent literal strings with backslashes (e.g. regular expressions, not to be confused with Lua patterns, which use % for escapes and lack the basic alternation operator of regular expressions).
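
To make the difference concrete, here is a small illustration (not part of the original answer) comparing the same characters written as a long string and as a quoted string:

```lua
-- Long strings keep the backslash and the 'n' as two separate characters.
local raw = [[a\nb]]
-- Quoted strings turn \n into a single newline character.
local cooked = "a\nb"

print(#raw)           -- 4
print(#cooked)        -- 3
print(raw == "a\\nb") -- true
```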

If you instead use single or double quotes to wrap the string, it will work fine:

local function unescape_string(escaped)
    return assert(load(('return "%s"'):format(escaped)))()
end

This will produce a tiny Lua program (a "chunk") for each string, which just consists of return "<contents>". Recall that Lua chunks are just functions. Thus you can simply call the function to obtain the value of the string it returns. That way, Lua will interpret the escape sequences for us. The same approach is often used to read data that has been serialized as Lua code.
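
As a rough sketch of what happens step by step, the snippet below prints the generated chunk source before running it. The load_chunk alias is an assumption added here so the sketch also runs on Lua 5.1, where strings are compiled with loadstring rather than load:

```lua
-- loadstring exists in Lua 5.1/LuaJIT; load accepts strings in Lua 5.2+.
local load_chunk = loadstring or load

-- A line as f:read() would return it: the backslashes are literal characters.
local escaped = [[Ignorar \"Etiquetas de capa\"]]

-- The source text of the generated chunk.
local source = ('return "%s"'):format(escaped)
print(source) -- return "Ignorar \"Etiquetas de capa\""

-- Compiling gives a function; calling it yields the unescaped string.
local chunk = assert(load_chunk(source))
print(chunk()) -- Ignorar "Etiquetas de capa"
```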

Note also the use of assert for error handling: load returns nil, err if there is a syntax error. To deal with this gracefully, we can wrap the call to load in assert: assert returns its first argument (the chunk returned by load) if it is truthy; otherwise, if it is falsy (e.g. nil in this case), assert errors, using its second argument as an error message. If you omit the assert and your input causes a syntax error, you will instead get a cryptic "attempt to call a nil value" error.
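
If raising an error is not what you want, one possible variation is to return nil plus the message instead of calling assert. The helper name try_unescape and the load_chunk alias are made up for this sketch, and it is only meant for trusted input:

```lua
-- loadstring exists in Lua 5.1/LuaJIT; load accepts strings in Lua 5.2+.
local load_chunk = loadstring or load

-- Hypothetical helper: returns the unescaped string, or nil plus an error message.
local function try_unescape(escaped)
    local chunk, err = load_chunk(('return "%s"'):format(escaped))
    if not chunk then
        return nil, err -- syntax error in the generated chunk
    end
    return chunk()
end

-- A valid escape is interpreted: the result contains a real tab character.
local good = try_unescape([[tab\there]])

-- A trailing backslash escapes the closing quote, so the chunk does not compile.
local bad, message = try_unescape([[broken\]])
print(bad)     -- nil
print(message) -- the syntax error reported by the compiler
```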

You probably want to do additional validation, especially if these escaped strings are user-provided. Otherwise a malicious string like " .. os.execute("...") .. " (which turns the chunk into return "" .. os.execute("...") .. "") trivially becomes a remote code execution (RCE) vulnerability: it can run arbitrary Lua, e.g. to block (while 1 do end), slow down or hijack your application, and run shell commands using os.execute. To guard against this, rejecting any unescaped double quote should be sufficient (syntax errors, e.g. through invalid escapes, will still be possible, but RCE should not be, barring Lua interpreter bugs):

local function unescape_string(escaped)
    -- match start & end of sequences of zero or more backslashes followed by a double quote
    for from, to in escaped:gmatch'()\\*()"' do
        -- number of preceding backslashes must be odd for the double quote to be escaped
        assert((to - from) % 2 ~= 0, "unescaped double quote")
    end
    return assert(load(('return "%s"'):format(escaped)))()
end
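
A quick way to check the guard is to feed it one harmless and one hostile line and catch the error with pcall. This sketch repeats the function so it runs on its own; the load_chunk alias is again an assumption for Lua 5.1 compatibility:

```lua
-- loadstring exists in Lua 5.1/LuaJIT; load accepts strings in Lua 5.2+.
local load_chunk = loadstring or load

local function unescape_string(escaped)
    -- Reject any double quote preceded by an even number of backslashes.
    for from, to in escaped:gmatch('()\\*()"') do
        assert((to - from) % 2 ~= 0, "unescaped double quote")
    end
    return assert(load_chunk(('return "%s"'):format(escaped)))()
end

-- Escaped quotes pass the check.
local label = unescape_string([[Ignorar \"Etiquetas de capa\"]])
print(label) -- Ignorar "Etiquetas de capa"

-- An unescaped quote is rejected before anything is compiled or run.
local ok, err = pcall(unescape_string, [[" .. os.exit() .. "]])
print(ok) -- false

-- Two backslashes escape each other, so the quote after them is unescaped too.
local ok2 = pcall(unescape_string, [[\\" .. os.exit() .. "]])
print(ok2) -- false
```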

Alternatively, a more robust (but also more complex) and presumably more efficient way of unescaping this would be to manually implement escape sequences through string.gsub; that way you get full control, which is more suitable for user-provided input:

-- Single-character backslash escapes of Lua 5.1 according to the reference manual: https://www.lua.org/manual/5.1/manual.html#2.1
local escapes = {a = '\a', b = '\b', f = '\f', n = '\n', r = '\r', t = '\t', v = '\v', ['\\'] = '\\', ["'"] = "'", ['"'] = '"'}
local function unescape_string(escaped)
    -- the parentheses drop gsub's second return value (the number of substitutions)
    return (escaped:gsub("\\(.)", escapes))
end

You may implement escapes here as you see fit; for example, this misses decimal escapes, which could easily be implemented as escaped:gsub("\\(%d%d?%d?)", string.char) (this relies on string.char coercing strings to numbers and on string.gsub accepting a function as its replacement argument).
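
Running the decimal pass and the single-character pass as two separate gsub calls can misread input such as an escaped backslash followed by digits. One possible way around that, sketched here as an assumption rather than as the answer's own code, is to handle both kinds of escape in a single pass:

```lua
local escapes = {
    a = '\a', b = '\b', f = '\f', n = '\n', r = '\r', t = '\t', v = '\v',
    ['\\'] = '\\', ['"'] = '"', ["'"] = "'",
}

local function unescape_string(escaped)
    -- Capture the character after the backslash plus up to two following digits.
    local result = escaped:gsub("\\(.)(%d?%d?)", function(first, rest)
        if first:match("%d") then
            -- Decimal escape of one to three digits, e.g. \65 or \065.
            local code = tonumber(first .. rest)
            assert(code <= 255, "decimal escape too large")
            return string.char(code)
        end
        -- Named escape; unknown escapes are kept verbatim. Any digits that were
        -- captured after a non-digit are ordinary text and are put back unchanged.
        return (escapes[first] or ("\\" .. first)) .. rest
    end)
    return result
end

print(unescape_string([[Ignorar \"Etiquetas de capa\"]])) -- Ignorar "Etiquetas de capa"
print(unescape_string([[\65\066\\67]]))                   -- AB\67
```

Because gsub resumes after each match, the digits following an escaped backslash are never reinterpreted as a decimal escape.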

This function can finally be used straightforwardly as unescape_string(f:read()).
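
For completeness, a small end-to-end sketch: it writes a one-line sample file, reads it back with f:read() and unescapes the result. The temporary file from os.tmpname() stands in for the asker's Strings.ES.txt and is only an assumption for the demo; the manual unescaper is repeated so the snippet runs on its own:

```lua
-- Single-character escapes only, as in the manual version.
local escapes = {a = '\a', b = '\b', f = '\f', n = '\n', r = '\r', t = '\t', v = '\v',
                 ['\\'] = '\\', ['"'] = '"', ["'"] = "'"}

local function unescape_string(escaped)
    return (escaped:gsub("\\(.)", escapes))
end

-- Create a sample strings file containing one escaped line.
local path = os.tmpname()
local out = assert(io.open(path, "w"))
out:write([[Ignorar \"Etiquetas de capa\"]], "\n")
out:close()

-- f:read() returns the line without its trailing newline, backslashes included.
local f = assert(io.open(path, "r"))
local label = unescape_string(f:read())
f:close()
os.remove(path)

print(label) -- Ignorar "Etiquetas de capa"
```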

huangapple
  • Published on January 9, 2023, 10:26:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75052675.html