I have a column of Vietnamese characters that can't be properly read by R when imported as a .csv

Question

After converting the initial Excel file to a .csv and importing it into R, characters that look fine in Excel become garbled in R.

When I check the encoding of the column, I get a mix of ASCII, WINDOWS-1252, and MAC-CENTRALEUROPE. I'd like the text either displayed as Vietnamese characters in R or converted entirely to Latin characters.

I tried using the stringi package to convert the column to a single encoding such as UTF-8, so that I could then either use the vietnameseConverter package to turn the column into Vietnamese characters or use the Encoding() function to turn it into Latin characters. However, the column still ends up in multiple different encodings.

Answer 1

Score: 1

When exporting to a CSV file, Excel does not necessarily save the text as UTF-8, so characters that display correctly in Excel can end up garbled when the file is imported into R. To avoid this:

  1. Save the file as a CSV with UTF-8 encoding (in recent versions of Excel, the "CSV UTF-8 (Comma delimited)" file type).
  2. In R, read the CSV file with the read.csv() function and set its fileEncoding argument to "UTF-8".
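
A minimal sketch of both steps, assuming a UTF-8 R session (Linux, macOS, or R 4.2+ on Windows). The temporary file and the two sample rows are hypothetical stand-ins for the Excel export. Excel's "CSV UTF-8" type writes a byte order mark (BOM) at the start of the file, so the sketch uses fileEncoding = "UTF-8-BOM", which R accepts for reading and which removes the BOM if one is present; with plain "UTF-8" the BOM can end up attached to the first column name.

```r
# Hypothetical stand-in for the CSV exported from Excel ("CSV UTF-8" type).
csv_path <- tempfile(fileext = ".csv")

# Sample rows; the \u escapes keep this script itself ASCII-only.
lines <- c("id,city", "1,H\u00e0 N\u1ed9i", "2,\u0110\u00e0 N\u1eb5ng")

# Write the rows as UTF-8 bytes preceded by a BOM, mimicking Excel's output.
bom  <- as.raw(c(0xef, 0xbb, 0xbf))
body <- charToRaw(enc2utf8(paste0(paste(lines, collapse = "\n"), "\n")))
writeBin(c(bom, body), csv_path)

# Read it back, declaring the file's encoding; "UTF-8-BOM" also strips the BOM.
df <- read.csv(csv_path, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)

# The city column should now hold the Vietnamese names intact.
print(df)
```

For a real file, only the read.csv() call is needed, with csv_path pointing at the exported CSV.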

If you still run into encoding issues, you can use R's iconv() function to convert the column's character encoding to UTF-8.
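
A minimal sketch of the iconv() step, assuming the garbled text is in one known legacy encoding. Here that encoding is Windows-1252 (spelled "CP1252" for iconv()), which is one of the encodings reported in the question. The sample value "Cà Mau" is hypothetical and was chosen because all its letters exist in Windows-1252. That code page lacks many Vietnamese letters, so text already saved in it may have lost characters, and re-exporting as UTF-8 is the more reliable fix. Also note that iconv() applies a single from encoding to every element, so a column that truly mixes encodings would need to be split by source encoding first.

```r
# Hypothetical value: "Cà Mau" stored as Windows-1252 bytes (0xE0 is "à" there).
x <- rawToChar(as.raw(c(0x43, 0xe0, 0x20, 0x4d, 0x61, 0x75)))

# Re-encode from Windows-1252 to UTF-8; elements that cannot be converted become NA.
fixed <- iconv(x, from = "CP1252", to = "UTF-8")
print(fixed)

# Applied to a whole data frame column (hypothetical column name `city`):
# df$city <- iconv(df$city, from = "CP1252", to = "UTF-8")
```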

huangapple
  • This article was posted on March 9, 2023, 23:38:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75686844.html