How to properly remove banned words?

Question

I have a string from which I want to remove all words beginning with the symbol @, but I do not fully understand how to do it expressively. Clearly, you could write something like this:

  1. Split the string into words

  2. Use a list filter to weed out the unwanted words

But I don't understand how to split the string, because besides spaces there are also characters such as \t and \n. Moreover, splitting would discard them, so I could not restore the original text.

An example of what I want to get:

original string:

haha lala\n@delete_me all-ok

expected result:

haha lala\nall-ok
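
To make the concern concrete, here is a minimal sketch of the two-step approach described above, assuming the language is Haskell (as in the answers) and using the standard `words` / `unwords` functions. The name `naive` is hypothetical and only for illustration.

```haskell
-- Naive approach: split on whitespace, filter, and join with single spaces.
naive :: String -> String
naive = unwords . filter (not . isBanned) . words
  where
    -- A word counts as banned when it starts with '@'.
    isBanned ('@' : _) = True
    isBanned _ = False
```

Because `words` discards every whitespace character and `unwords` joins with single spaces, `naive "haha lala\n@delete_me all-ok"` gives `"haha lala all-ok"`: the newline is lost, which is exactly the problem.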

Answer 1

Score: 1

You might want to use Data.List.Split.split with Data.List.Split.oneOf (both from the split package).

It returns the pieces together with the separators, so you can rebuild the text from them.

split (oneOf "xyz") "aazbxyzcxd" == ["aa","z","b","x","","y","","z","c","x","d"]
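
The same keep-the-separators idea can be sketched with only `base`, in case the split package is not installed. This is an illustration under assumptions, not the answer's exact method: it uses `groupBy` instead of `split`, and the names `tokens` and `removeBanned` are made up for this example.

```haskell
import Data.Char (isSpace)
import Data.Function (on)
import Data.List (groupBy)

-- Split into alternating runs of whitespace and non-whitespace.
-- Concatenating the tokens gives back the original string.
tokens :: String -> [String]
tokens = groupBy ((==) `on` isSpace)

-- Drop every token that starts with '@', plus the whitespace run
-- right after it, then glue the remaining tokens back together.
removeBanned :: String -> String
removeBanned = concat . go . tokens
  where
    go (('@' : _) : rest) = go (dropWhile (all isSpace) rest)
    go (t : rest) = t : go rest
    go [] = []
```

With the question's input, `removeBanned "haha lala\n@delete_me all-ok"` gives `"haha lala\nall-ok"`. With the split package, `split (oneOf " \t\n")` could play the role of `tokens`, but note that it returns each separator character as its own piece, with empty strings between adjacent separators, as the example above shows.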

Answer 2

Score: 1

Another way to look at the problem is that we want to delete runs of non-space characters that begin with an at sign `@`, as well as any whitespace that follows them. We don't want to treat line breaks or other characters specially at all. That can be expressed with a simple recursive function using `span` / `break` and `dropWhile`:

```haskell
import Data.Char (isSpace)

-- Remove every word starting with '@', together with the whitespace after it.
censor :: String -> String
censor "" = ""
censor text0 = spaces ++ nonspaces ++ censor rest
  where
    -- Leading whitespace is always kept as-is.
    (spaces, text1) = span isSpace text0
    -- The next word: everything up to the following whitespace.
    (word, text2) = break isSpace text1
    (nonspaces, rest)
      | banned word = ("", trim text2)
      | otherwise = (word, text2)

-- A word is banned if it starts with an at sign.
banned :: String -> Bool
banned ('@' : _) = True
banned _ = False

-- Drop leading whitespace.
trim :: String -> String
trim = dropWhile isSpace
```

Consider an example:

  1. censor " send @beans money to sam@example.com"
  2. span returns " " and "send @beans…"
  3. break returns "send" and " @beans…"
  4. banned returns false for "send", so we will keep it
  5. We recursively call censor " @beans money…"
  6. span returns " " and "@beans money…"
  7. break returns "@beans" and " money…"
  8. Now banned returns true for "@beans", so we drop it and trim the rest
  9. We recursively call censor "money…"
  10. We keep all the remaining substrings, including sam@example.com, since it is not banned
  11. Finally, we reach the end of the string and censor "" returns ""

The end result is this expression:

" " ++ "send" ++ " " ++ "" ++ "money" ++ " " ++ "to" ++ " " ++ "sam@example.com" ++ ""

Notice that we thread the input string through a series of updates, which gives us a series of variables text0, text1, text2, rest for the intermediate states. Consider how you could express this pattern using State instead.
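
One possible way to take up that suggestion is sketched below. It assumes the `State` monad from `Control.Monad.Trans.State` (transformers package); the names `takeWhileS`, `censorS`, and `censor'` are hypothetical and not part of the answer. The remaining input becomes the state, so the numbered intermediate variables disappear.

```haskell
import Control.Monad.Trans.State (State, evalState, gets, state)
import Data.Char (isSpace)

-- Take the longest prefix satisfying p off the remaining input
-- and leave the rest as the new state.
takeWhileS :: (Char -> Bool) -> State String String
takeWhileS p = state (span p)

-- Same logic as the recursive version, with the input threaded implicitly.
censorS :: State String String
censorS = do
  done <- gets null
  if done
    then pure ""
    else do
      spaces <- takeWhileS isSpace
      word <- takeWhileS (not . isSpace)
      kept <- case word of
        -- Banned word: drop it and skip the whitespace that follows.
        '@' : _ -> "" <$ takeWhileS isSpace
        _ -> pure word
      rest <- censorS
      pure (spaces ++ kept ++ rest)

-- Run the stateful version on a whole string.
censor' :: String -> String
censor' = evalState censorS
```

For the example input, `censor' " send @beans money to sam@example.com"` gives `" send money to sam@example.com"`. Whether this reads better than the explicit `text0`, `text1`, `text2` version is a matter of taste; for a function this small the plain recursion is arguably clearer.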

huangapple
  • Posted on February 10, 2023 at 06:11:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75404981.html