Remove ASCII control characters from string

Question

I have a data frame (df) with a column of string values. Some of these strings combine characters and dates; others combine characters and numbers. Some strings also contain punctuation, such as "()" or "#", which is perfectly fine. The df is ultimately written to an Excel file.

The problem is that the STX ("start of text") ASCII control character is embedded in one of the strings, and I can't seem to remove it. It causes problems when I open the Excel file after the data has been written to it. Here's an example of what that string may look like:

'Value 1/2: This, That, Third, Random STX Value, Random 2, Value 6'

I've tried the following, but neither worked:

str_replace_all(df$col, "[[:punct:]]", "")
iconv(df$col, "ASCII", 'UTF-8', sub = "")
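A quick check of why neither attempt removes the character, as a sketch with a made-up sample string. It uses base R's regex engine rather than stringr's, but the point is the character class: STX (code 0x02) is a control character, not punctuation, so `[[:punct:]]` never matches it. And because STX is a valid ASCII character, `iconv()` converts it unchanged; `sub` only applies to bytes that cannot be converted.

```r
# Hypothetical sample value with an embedded STX (0x02) character
x <- "Random \x02 Value"

# STX is not punctuation, so a [[:punct:]] pattern never matches it ...
grepl("[[:punct:]]", "\x02")  # FALSE
# ... but it is a control character
grepl("[[:cntrl:]]", "\x02")  # TRUE

# STX is valid ASCII, so iconv() converts it as-is and `sub` never fires:
# the result is identical to the input
y <- iconv(x, "ASCII", "UTF-8", sub = "")
identical(y, x)  # TRUE
```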

Does anyone know how I can get this removed?

Answer 1

Score: 1

You want to remove all occurrences of the STX character from your strings.

You can do this with a simple gsub call. It searches for a pattern (a regular expression, or a literal string when fixed=TRUE) and replaces every match with the replacement string:

df$col <- gsub("\x02", "", df$col, fixed = TRUE)
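A minimal, self-contained sketch of the same call on a toy data frame (the data frame and its values are made up to mirror the question). The second variant swaps in the POSIX class `[[:cntrl:]]`, which also strips other ASCII control characters such as tab or newline. That is broader than removing STX alone, so it is only appropriate if none of those characters are meaningful in your data.

```r
# Toy data frame standing in for the question's df (hypothetical values)
df <- data.frame(
  col = c("Value 1/2: This, Random \x02 Value, Value 6", "No control chars (#1)"),
  stringsAsFactors = FALSE
)

# Remove only STX (0x02), matched as a literal character
df$col <- gsub("\x02", "", df$col, fixed = TRUE)
print(df$col)

# Broader variant: remove every control character ([[:cntrl:]]),
# e.g. STX and tab; punctuation such as "()" and "#" is kept
clean_all <- gsub("[[:cntrl:]]", "", c("a\x02b\tc", "f(x) #1"))
print(clean_all)  # "abc" "f(x) #1"
```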

What is \x02? It is a string escape sequence: \x marks the start of the construct, and the next two characters are interpreted as a hexadecimal number. Here 02 is character code 2, which is STX.
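A short illustration of what the escape produces, using the base R helpers `utf8ToInt()` and `intToUtf8()` (chosen here for illustration; the answer itself does not use them). The escape yields a single character whose code is 2, and the same string can be built from that code directly.

```r
stx <- "\x02"                  # one-character string containing STX
utf8ToInt(stx)                 # 2: the character code
nchar("a\x02b")                # 3: the escape counts as a single character
identical(stx, intToUtf8(2))   # TRUE: the same string built from its code
```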

The fixed=TRUE argument tells R to match the STX character literally rather than as a regular expression. When all you need is to replace one literal string with another, this is usually faster and avoids regex pitfalls, such as having to escape metacharacters.
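A sketch of the regex pitfall that fixed=TRUE avoids, with a made-up sample string. As a regular expression, a lone "(" opens an unbalanced group and is rejected, whereas with fixed=TRUE it is just a character. For STX itself both modes agree, since the character has no special meaning in a regex.

```r
s <- "f(x) #1"

# With fixed = TRUE, "(" is just a character to find and remove
gsub("(", "", s, fixed = TRUE)  # "fx) #1"

# As a regex, "(" is an unbalanced group; compiling it raises a warning
# followed by an error, so catch either condition here
res <- tryCatch(gsub("(", "", s),
                error = function(e) "regex error",
                warning = function(w) "regex error")
print(res)  # "regex error"

# For STX, literal and regex matching give the same result
x <- "a\x02b"
identical(gsub("\x02", "", x, fixed = TRUE), gsub("\x02", "", x))  # TRUE
```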

huangapple
  • Posted on July 27, 2023 21:32:29
  • If you repost this article, please keep this link: https://go.coder-hub.com/76780291.html