How to properly remove banned words?
Question
I have a string from which I want to remove all words beginning with the symbol `@`, but I do not fully understand how to do it expressively. Clearly you could write something like this:
- Split the string into words
- Use a list filter to weed out the unwanted words
But I don't understand how to split the string, because in addition to the space there are separator characters such as `\t` and `\n`. Besides, splitting would discard them, and I could not restore the original text.
An example of what I want to get:
original string:
`haha lala\n@delete_me all-ok`
expected result:
`haha lala\nall-ok`
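For illustration, here is a minimal sketch of the split-and-filter approach described above, assuming the standard `words` and `unwords` from the Prelude. The name `naiveRemove` is hypothetical. It shows the problem the question is worried about: every run of whitespace is replaced by a single space, so the `\n` is lost.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical helper: split on whitespace, drop words starting with '@',
-- and join the survivors with single spaces.
-- The original separators (tabs, newlines, repeated spaces) are not kept.
naiveRemove :: String -> String
naiveRemove = unwords . filter (not . ("@" `isPrefixOf`)) . words
```

Applied to the example string, this yields `"haha lala all-ok"` rather than the expected `"haha lala\nall-ok"`.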
Answer 1
Score: 1
You might want to use `Data.List.Split.split` with `Data.List.Split.oneOf`.
It returns the split words including the separators, so you can rebuild the text with them.
`split (oneOf "xyz") "aazbxyzcxd" == ["aa","z","b","x","","y","","z","c","x","d"]`
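The same keep-the-separators idea can be sketched with only `base`. This is a swapped-in variant, not the `split` package itself: it assumes `groupBy` from `Data.List` to cut the string into alternating runs of whitespace and non-whitespace. The names `tokens` and `removeBanned` are hypothetical.

```haskell
import Data.Char (isSpace)
import Data.Function (on)
import Data.List (groupBy)

-- Cut the input into alternating runs of whitespace and non-whitespace.
-- No characters are discarded, so concat (tokens s) == s.
tokens :: String -> [String]
tokens = groupBy ((==) `on` isSpace)

-- Drop every token that starts with '@', together with the whitespace
-- run that follows it, then glue the remaining tokens back together.
removeBanned :: String -> String
removeBanned = concat . go . tokens
  where
    go [] = []
    go (w : rest)
      | isBanned w = go (dropWhile (all isSpace) rest)
      | otherwise  = w : go rest

    isBanned ('@' : _) = True
    isBanned _         = False
```

Because the separators travel along as tokens, the newline in the question's example survives. Only the whitespace directly after a removed word is dropped, which is a design choice made here to match the expected result.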
Answer 2
Score: 1
Another way to look at the problem is that we want to delete strings of non-spaces that begin with an at sign `@`, as well as any following spaces. We don’t want to treat line breaks or other characters specially at all. That can be expressed with a simple recursive function using `span` / `break` and `dropWhile`:
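```haskell
import Data.Char (isSpace)

-- Copy the leading whitespace and the next word to the output, unless the
-- word is banned, then continue with the rest of the input.
censor :: String -> String
censor "" = ""
censor text0 = spaces ++ nonspaces ++ censor rest
  where
    (spaces, text1) = span isSpace text0
    (word, text2) = break isSpace text1
    (nonspaces, rest)
      | banned word = ("", trim text2)  -- drop the word and the spaces after it
      | otherwise   = (word, text2)

-- A word is banned if it starts with '@'.
banned :: String -> Bool
banned ('@' : _) = True
banned _ = False

-- Remove leading whitespace.
trim :: String -> String
trim = dropWhile isSpace
```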
Consider an example:
`censor " send @beans money to sam@example.com"`
- `span` returns `" "` and `"send @beans…"`
- `break` returns `"send"` and `" @beans…"`
- `banned` returns false for `"send"`, so we will keep it
- We recursively call `censor " @beans money…"`
- `span` returns `" "` and `"@beans money…"`
- `break` returns `"@beans"` and `" money…"`
- Now `banned` returns true for `"@beans"`, so we drop it and trim the rest
- We recursively call `censor "money…"`
- We keep all the remaining substrings, including `sam@example.com`, since it is not `banned`
- Finally, we reach the end of the string and `censor ""` returns `""`
The end result is this expression, which evaluates to `" send money to sam@example.com"`:
`" " ++ "send" ++ " " ++ "" ++ "money" ++ " " ++ "to" ++ " " ++ "sam@example.com" ++ ""`
Notice that we use a series of updates to the input string, resulting in a series of variables `text0`, `text1`, `text2`, `rest` for the intermediate states. Consider how you could express this pattern using `State` instead.
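One possible way to take up that suggestion is sketched below. To keep the block dependent on `base` only, it defines a small local `State` type as a stand-in for the one in `Control.Monad.State`. That local type and the names `takeSpaces`, `takeWord`, `skipSpaces`, `atEnd`, `censorS` and `censorWithState` are all assumptions made for illustration, not part of the answer.

```haskell
import Data.Char (isSpace)

-- Minimal local stand-in for the State monad (hypothetical definition).
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= f = State $ \s -> let (a, s') = g s in runState (f a) s'

evalState :: State s a -> s -> a
evalState m = fst . runState m

-- Each step consumes a prefix of the remaining input and returns it.
-- The remaining input is the state, replacing text0, text1, text2, rest.
takeSpaces, takeWord :: State String String
takeSpaces = State (span isSpace)
takeWord   = State (break isSpace)

-- Discard leading whitespace from the remaining input.
skipSpaces :: State String ()
skipSpaces = State $ \s -> ((), dropWhile isSpace s)

-- Report whether the remaining input is empty, without consuming anything.
atEnd :: State String Bool
atEnd = State $ \s -> (null s, s)

censorS :: State String String
censorS = do
  done <- atEnd
  if done
    then pure ""
    else do
      spaces <- takeSpaces
      word   <- takeWord
      -- A banned word contributes nothing and eats the spaces after it.
      kept   <- if isBanned word then "" <$ skipSpaces else pure word
      rest   <- censorS
      pure (spaces ++ kept ++ rest)
  where
    isBanned ('@' : _) = True
    isBanned _         = False

censorWithState :: String -> String
censorWithState = evalState censorS
```

The intermediate strings no longer need names because each step reads and updates the remaining input implicitly. With the real `State` from `mtl` or `transformers`, the instance declarations would disappear and the rest could presumably stay much the same.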