2023-02-10 06:11:49

How to properly remove banned words?

Question

I have a string from which I want to remove every word beginning with the symbol @, but I do not fully understand how to do it expressively. Clearly, you could write something like this:

Split the string into words
Use the list filter to weed out unnecessary words

But I don't understand how to split the string, because besides the space there are separator characters such as \t and \n. On top of that, splitting discards them, so I cannot restore the original text.

An example of what I want to get:

Original string:

haha lala\n@delete_me all-ok

Expected result:

haha lala\nall-ok
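
To make the concern concrete, here is a minimal sketch of the split-and-filter idea using the standard `words` and `unwords` functions. The name `naiveCensor` is hypothetical and used only for illustration. Because `words` discards all whitespace and `unwords` joins with single spaces, the `\n` from the original string does not survive.

```haskell
-- Naive approach: split on whitespace, filter, and join again.
-- `naiveCensor` is a hypothetical name used only for this sketch.
naiveCensor :: String -> String
naiveCensor = unwords . filter (not . startsWithAt) . words
  where
    -- A word is unwanted when it begins with an at sign.
    startsWithAt ('@' : _) = True
    startsWithAt _ = False
```

With this definition, `naiveCensor "haha lala\n@delete_me all-ok"` gives `"haha lala all-ok"`: the banned word is gone, but so is the line break.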

Answer 1

Score: 1

You might want to use Data.List.Split.split with Data.List.Split.oneOf.

It returns the split words together with the separators, so you can rebuild the text from them.

split (oneOf "xyz") "aazbxyzcxd" == ["aa","z","b","x","","y","","z","c","x","d"]
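
As a rough sketch of how this could be applied to the question, assuming the `split` package is installed and that space, tab, and newline are the only separators that matter: the names `removeBanned`, `dropBanned`, and `isBanned` are hypothetical, not part of the library.

```haskell
import Data.List.Split (oneOf, split)

-- `removeBanned`, `dropBanned` and `isBanned` are hypothetical names.
-- Split while keeping the separators, drop banned words, and glue the
-- remaining tokens back together.
removeBanned :: String -> String
removeBanned = concat . dropBanned . split (oneOf " \t\n")

-- Tokens alternate between words (possibly empty) and single separators,
-- so a banned word is dropped together with the one separator after it.
dropBanned :: [String] -> [String]
dropBanned [] = []
dropBanned (w : rest)
  | isBanned w = dropBanned (drop 1 rest)
  | otherwise = w : dropBanned rest

-- A token is banned when it begins with an at sign.
isBanned :: String -> Bool
isBanned ('@' : _) = True
isBanned _ = False
```

For the string in the question, `removeBanned "haha lala\n@delete_me all-ok"` gives `"haha lala\nall-ok"`. Note that this sketch removes only one separator after a banned word; if several whitespace characters follow it, the extra ones remain.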

Answer 2

Score: 1

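Another way to look at the problem is that we want to delete runs of non-space characters that begin with an at sign `@`, as well as any whitespace that follows them. We don't want to treat line breaks or other characters specially at all. That can be expressed with a simple recursive function using `span` / `break` and `dropWhile`: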

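```haskell
import Data.Char (isSpace)

-- | Remove every whitespace-delimited word that starts with '@',
-- together with the whitespace that follows it.
censor :: String -> String
censor "" = ""
censor text0 = spaces ++ nonspaces ++ censor rest
  where
    -- Leading whitespace is always kept verbatim.
    (spaces, text1) = span isSpace text0
    -- The next word runs up to the following whitespace character.
    (word, text2) = break isSpace text1
    (nonspaces, rest)
      | banned word = ("", trim text2) -- drop the word and the whitespace after it
      | otherwise = (word, text2)

-- | A word is banned when it begins with an at sign.
banned :: String -> Bool
banned ('@' : _) = True
banned _ = False

-- | Drop leading whitespace.
trim :: String -> String
trim = dropWhile isSpace
```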
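Consider an example:

`censor " send @beans money to sam@example.com"`
`span` returns `" "` and `"send @beans…"`
`break` returns `"send"` and `" @beans…"`
`banned` returns `False` for `"send"`, so we keep it
We recursively call `censor " @beans money…"`
`span` returns `" "` and `"@beans money…"`
`break` returns `"@beans"` and `" money…"`
Now `banned` returns `True` for `"@beans"`, so we drop it and trim the rest
We recursively call `censor "money…"`
We keep all the remaining substrings, including `sam@example.com`, since it does not begin with `@` and so is not banned
Finally, we reach the end of the string and `censor ""` returns `""`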

The end result is this expression:

" " ++ "send" ++ " " ++ "" ++ "money" ++ " " ++ "to" ++ " " ++ "sam@example.com" ++ ""

which evaluates to `" send money to sam@example.com"`.

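Notice that we thread a series of updates through the input string, which yields a series of variables `text0`, `text1`, `text2`, `rest` for the intermediate states. Consider how you could express this pattern using `State` instead.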
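
One possible way to do that, offered only as a sketch: it assumes the `State` type from `Control.Monad.Trans.State.Strict` in the `transformers` package, and the name `censorS` is hypothetical. The remaining input becomes the state, so each `span` / `break` step consumes a prefix and leaves the rest behind without naming the intermediate strings.

```haskell
import Control.Monad.Trans.State.Strict (State, evalState, gets, modify, state)
import Data.Char (isSpace)

-- `censorS` is a hypothetical name; the state is the unconsumed input.
censorS :: String -> String
censorS = evalState go
  where
    go :: State String String
    go = do
      done <- gets null
      if done
        then pure ""
        else do
          -- Each step returns a prefix and stores the remainder as the new state.
          spaces <- state (span isSpace)
          word <- state (break isSpace)
          kept <-
            if banned word
              then "" <$ modify (dropWhile isSpace) -- also skip trailing whitespace
              else pure word
          rest <- go
          pure (spaces ++ kept ++ rest)

-- A word is banned when it begins with an at sign.
banned :: String -> Bool
banned ('@' : _) = True
banned _ = False
```

Under these assumptions, `censorS " send @beans money to sam@example.com"` gives `" send money to sam@example.com"`, the same result as the recursive version for that input.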

