June 6, 2023, 00:32:51

How do I return 3 characters either side of a null byte in Elixir?

Question

If I have a string, for example, hello this isa<<0>>string., how do I return the three characters either side of the null byte, including the null byte, e.g., isa<<0>>str?

I was trying something like:

~r/(?<=.{0,2})(.{3}).*?<<0>>(.{3})(?=.{0,2})/

Answer 1

Score: 3

In Elixir, you don't need a regular expression for that. Recursion with binary pattern matching would work better and faster (and is far more readable).

defmodule NullByte do
  # Empty input: no null byte was found.
  def get_3_around(""),
    do: ""

  # Three bytes, a null byte, then at least three more bytes.
  def get_3_around(<<pre::binary-size(3), 0, post::binary-size(3), _::binary>>),
    do: pre <> <<0>> <> post

  # Fewer than three bytes follow the null byte: return whatever is left.
  def get_3_around(<<pre::binary-size(3), 0, post::binary>>),
    do: pre <> <<0>> <> post

  # No match at this position: drop one byte and try again.
  def get_3_around(<<_::binary-size(1), rest::binary>>),
    do: get_3_around(rest)

  def test, do: get_3_around("hello this is a " <> <<0>> <> "string")
end
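
As a point of comparison, here is a minimal non-recursive sketch of the same byte-level idea. It is not the answer author's code: the module and function names are made up for illustration, it only looks at the first null byte, and it assumes you want nil unless three full bytes exist on both sides. It relies on Erlang's :binary.match/2 and on binary_part/3.

```elixir
defmodule NullByteWindow do
  # Hypothetical helper, not part of the answer above.
  # Returns the 3 bytes before the first null byte, the null byte itself,
  # and the 3 bytes after it, or nil when that window does not exist.
  def around_first_null(bin) when is_binary(bin) do
    case :binary.match(bin, <<0>>) do
      # pos is the zero-based byte offset of the null byte.
      {pos, 1} when pos >= 3 and byte_size(bin) >= pos + 4 ->
        binary_part(bin, pos - 3, 7)

      _ ->
        nil
    end
  end
end

# Prints the result as a byte list, because it contains a null byte.
IO.inspect(NullByteWindow.around_first_null("hello this isa" <> <<0>> <> "string."))
```

Like the recursive version, this counts bytes rather than characters, so it is only safe for input where every character is a single byte.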

Answer 2

Score: 0

Strings in Elixir are UTF-8 encoded, which means a single character can be longer than one byte, so it's better to design the function to handle UTF-8 characters. Three characters could be up to 12 bytes long, so rather than assuming every character is 1 byte long, you can match a single UTF-8 character using var_name::utf8. Unfortunately, you can't specify a size with the utf8 type in a binary, so you can't match multiple UTF-8 characters by simply writing var_name::utf8-size(3). Instead, you have to write out three separate "segments" explicitly (which is tedious, and arguably an oversight in the language), for example:

<<char1::utf8, char2::utf8, char3::utf8, ....>>

Next, a null byte is a non-printing character, and Elixir won't print a null byte as <<0>>. However, you can explicitly print the string "<<0>>", e.g.:

iex(7)> IO.puts "<<0>>"
<<0>>

But you should be aware that "<<0>>" is 5 bytes long, not 1 byte.
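
The difference can be checked directly. The sketch below is an illustration rather than part of the original answer: it compares the two byte sizes and shows why the text <<0>> inside a regex does not find a null byte, while the regex escape \x00 does. The last line assumes byte-oriented matching (no u modifier), so each . stands for one byte.

```elixir
null_byte = <<0>>
literal_text = "<<0>>"

IO.inspect(byte_size(null_byte))     # 1
IO.inspect(byte_size(literal_text))  # 5

subject = "isa" <> <<0>> <> "str"

# Inside a regex, <<0>> means those five literal characters, so this is false.
IO.inspect(Regex.match?(~r/<<0>>/, subject))

# \x00 is the regex escape for the null byte, so this is true.
IO.inspect(Regex.match?(~r/\x00/, subject))

# Three bytes, the null byte, three bytes (the s modifier lets . match newlines too).
IO.inspect(Regex.run(~r/.{3}\x00.{3}/s, "hello this isa" <> <<0>> <> "string."))
```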

In the following example, the binary syntax will look up the UTF-8 integer character codes for each character between the double quotes:

iex(17)> str = <<"123"::utf8, 0::utf8, "456"::utf8>>
<<49, 50, 51, 0, 52, 53, 54>>
iex(13)> IO.puts str
123^@456   <-- shell uses "caret notation" to display non-printing chars
:ok
iex(14)> IO.inspect str
<<49, 50, 51, 0, 52, 53, 54>>
<<49, 50, 51, 0, 52, 53, 54>>

If a string/binary contains non-printing characters, then Elixir prints it as a list of bytes rather than in double-quoted form:

iex(2)> IO.inspect <<97,98>>
"ab"
"ab"
iex(3)> IO.inspect <<97, 0, 98>>
<<97, 0, 98>>
<<97, 0, 98>>

Here's how to match UTF-8 characters in Elixir:

defmodule My do
  # Look for a match starting at the beginning of the string:
  def grab_3_chars_either_side_of_null(<<char1::utf8,
                                         char2::utf8,
                                         char3::utf8,
                                         0::utf8,   # Tries to match a null byte
                                         char4::utf8,
                                         char5::utf8,
                                         char6::utf8,
                                         _rest::binary>>
                                      ) do
     <<char1::utf8, char2::utf8, char3::utf8,
       "<<0>>",   # Your desired output, which is 5 bytes long.
                  # Change to 0::utf8 if you only want one byte
       char4::utf8, char5::utf8, char6::utf8>>
  end

  # If a match isn't found at the beginning of the string above,
  # then drop the first UTF-8 character, `_::utf8`, and look for a match at
  # the start of the rest of the string (the recursive function call):
  def grab_3_chars_either_side_of_null(<<_::utf8,
                                         rest::binary>>
                                      ) do
    grab_3_chars_either_side_of_null(rest)
  end

  # If all the UTF-8 characters have been dropped off the front of the string,
  # then the string is empty, and no matches were found, so return the atom
  # `:no_match` (this clause must live inside the module to compile):
  def grab_3_chars_either_side_of_null(<<>>), do: :no_match
end

I'll leave it as an exercise to define other branches of grab_3_chars_either_side_of_null/1 as you see fit.
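
For illustration, one possible set of branches might look like the sketch below. The module name, the choice to return the single null byte rather than the 5-byte text, and the extra clause that skips bytes which are not valid UTF-8 are all assumptions of this sketch, not part of the answer.

```elixir
defmodule UnicodeNullWindow do
  # Three code points, a null byte, three code points: return them as one string.
  def around(<<a::utf8, b::utf8, c::utf8, 0, d::utf8, e::utf8, f::utf8, _::binary>>),
    do: <<a::utf8, b::utf8, c::utf8, 0, d::utf8, e::utf8, f::utf8>>

  # No match here: drop one code point and retry on the rest.
  def around(<<_::utf8, rest::binary>>), do: around(rest)

  # Assumed extra branch: the next byte is not valid UTF-8, so skip that byte.
  def around(<<_, rest::binary>>), do: around(rest)

  # Nothing left to scan.
  def around(<<>>), do: :no_match
end

IO.inspect(UnicodeNullWindow.around("héllo äöü" <> <<0>> <> "ßçñ tail"))
IO.inspect(UnicodeNullWindow.around("ab" <> <<0>> <> "cde"))  # :no_match
```

Without the byte-skipping clause, input containing invalid UTF-8 would raise a FunctionClauseError instead of returning :no_match.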

Note:

char1, char2, etc. will actually be assigned integers, and in order to convert an integer back to a UTF-8 character in a string, you have to write <<char1::utf8>>.
rest::binary is like a greedy .* in a regex: it matches everything that remains (zero or more bytes), and an unsized binary segment like this can only be placed at the end of a binary pattern.
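
A small sketch of both notes, using an arbitrary sample string chosen for illustration:

```elixir
# Each ::utf8 segment binds an integer code point, not a one-character string.
<<c1::utf8, c2::utf8, c3::utf8, rest::binary>> = "héllo"

IO.inspect([c1, c2, c3], charlists: :as_lists)  # [104, 233, 108]
IO.inspect(<<c2::utf8>>)                        # "é"
IO.inspect(rest)                                # "lo"

# "hé" is two characters but three bytes, so slicing by bytes can cut a
# character in half and leave an invalid string behind.
IO.inspect(byte_size("hé"))                          # 3
IO.inspect(String.valid?(binary_part("hé", 0, 2)))   # false
```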

If all that is too confusing, you could also use String.split/3 to split on the null byte, then use String.split_at/2 on each piece to get the last three characters (-3) of the first piece and the first three characters (3) of the second piece.
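
A minimal sketch of that approach follows. The module and function names are hypothetical, and it assumes that only the first null byte matters and that :no_match is an acceptable result when there is none. Because String.split_at/2 counts characters rather than bytes, multi-byte characters stay intact.

```elixir
defmodule SplitAroundNull do
  # Hypothetical module and function names, for illustration only.
  def around(str) when is_binary(str) do
    # parts: 2 splits on the first null byte only.
    case String.split(str, <<0>>, parts: 2) do
      [before, rest] ->
        {_, pre} = String.split_at(before, -3)   # last three characters
        {post, _} = String.split_at(rest, 3)     # first three characters
        pre <> <<0>> <> post

      [_no_null_byte] ->
        :no_match
    end
  end
end

IO.inspect(SplitAroundNull.around("hello this isa" <> <<0>> <> "string."))
IO.inspect(SplitAroundNull.around("foo"))  # :no_match
```

Note that when fewer than three characters exist on either side, this version returns whatever is available instead of failing.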

Answer 3

Score: 0

This is a simplification of Aleksei's answer, modified to return a 2-tuple.

defmodule NullByte do
  def get_3_around(<<>>), do: nil
  def get_3_around(<<pre::binary-3, 0, post::binary-3, _::binary>>), do: {pre, post}
  def get_3_around(<<_::binary-1, rest::binary>>), do: get_3_around(rest)
end

Usage:

iex(1)> NullByte.get_3_around("aaa" <> <<0>> <> "bbb")
{"aaa", "bbb"}
iex(2)> NullByte.get_3_around("aaabbb" <> <<0>> <> "cccddd")
{"bbb", "ccc"}
iex(3)> NullByte.get_3_around("aaa" <> <<0>> <> "b")
nil
iex(4)> NullByte.get_3_around("foo")
nil
