C# - Visual Studio Code - encoding problem

Question

I have a folder containing an XML file like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <cities>
      <result>
        <city_id>-3870534</city_id>
        <country>mx</country>
        <name>Santa Bárbara</name>
        <nr_hotels>0</nr_hotels>
        <translations>
          <language>en-gb</language>
          <name>Santa Bárbara</name>
        </translations>
        <translations>
          <language>ru</language>
          <name>Санта-Барбара</name>
        </translations>
      </result>
    </cities>
    <!-- RUID: [UmFuZG9tSVYkc2RlIyh9YcxtmfhRwqry58sgWYNIgEV1AjdsVswrKUorBoUlR6ylFgiaj5XJ0w0DP0lL/htWqOKtE33w1EhBbLABKokIfEo=] -->

The file looks well formed and is encoded in UTF-8. It contains Russian text and accented characters such as "á" in Santa Bárbara.
I need to read this file and create a record in a MySQL DB (through C#), but I'm running into encoding problems.

PS: the DB table has a few columns (to store the city id, country and city translations), all text fields with the utf8_general_ci collation.

I'm using the following code to read the files (just one in this case) in a folder:

    foreach (string file in Directory.EnumerateFiles(@"C:\xml_folder\" + sub_folder, "*.xml")) {
        Console.WriteLine(file);
        string response = File.ReadAllText(file, Encoding.GetEncoding("Windows-1252"));
        Console.WriteLine(response);
        var document = XDocument.Parse(response);
        foreach (var child in document.Root.Elements("result")) {
            //... code here
            String name_it = "";
            String name_en = "";
            String name_es = "";
            String name_fr = "";
            String name_de = "";
            String name_ru = "";
            foreach (var translationsChild in child.Elements("translations"))
            {
                switch (translationsChild.Element("language").Value)
                {
                    case "it":
                        bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                        name_it = Encoding.UTF8.GetString(bytes);
                        break;
                    case "en-gb":
                        bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                        name_en = Encoding.UTF8.GetString(bytes);
                        break;
                    case "es":
                        bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                        name_es = Encoding.UTF8.GetString(bytes);
                        break;
                    case "fr":
                        bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                        name_fr = Encoding.UTF8.GetString(bytes);
                        break;
                    case "de":
                        bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                        name_de = Encoding.UTF8.GetString(bytes);
                        break;
                    case "ru":
                        bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                        name_ru = Encoding.UTF8.GetString(bytes);
                        Console.WriteLine(name_ru);
                        break;
                }
            }
        }
    }

In short, I read the file, then parse it as XML to go through all the children and save them to the DB.

The problem seems to be related to the encoding used when reading the string from the file. I tried reading it as Windows-1252:

    string response = File.ReadAllText(file, Encoding.GetEncoding("Windows-1252"));

I also tried reading it as UTF-8:

    string response = File.ReadAllText(file, System.Text.Encoding.UTF8);

but every time I get this (in the debug console and in the DB):

    Santa Bárbara -> Santa B?rbara
    Санта-Барбара -> ?????-??????

It looks like a problem with the way File.ReadAllText(...) works: the encoding argument seems to have no effect at all.
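One way to check that hypothesis is to decode the same UTF-8 bytes with both encodings and compare the results. The sketch below is only an illustration and is not part of the original code: `EncodingCheck` and `Decode` are hypothetical names, and ISO-8859-1 is used as a stand-in for Windows-1252 because it is available without registering an extra code-page provider on .NET Core / .NET 5+ (the two encodings agree on the bytes involved here).

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Assumption: ISO-8859-1 stands in for Windows-1252 (see the note above).
    public static readonly Encoding SingleByte = Encoding.GetEncoding("iso-8859-1");

    // Decodes raw bytes with the given encoding, which is roughly what
    // File.ReadAllText(path, encoding) does for a file without a byte order mark.
    public static string Decode(byte[] fileBytes, Encoding encoding)
    {
        return encoding.GetString(fileBytes);
    }
}
```

With `byte[] raw = Encoding.UTF8.GetBytes("Santa Bárbara")`, calling `EncodingCheck.Decode(raw, Encoding.UTF8)` gives back the original text, while `EncodingCheck.Decode(raw, EncodingCheck.SingleByte)` gives "Santa BÃ¡rbara". A wrong encoding at read time therefore tends to produce two-character mojibake rather than "?", which suggests the question marks are introduced at a later step.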

PS: to store the data in the DB I use a DML statement like this:

    cmd.CommandText = "INSERT INTO cities (city_id,country,name,nr_hotels,name_it,name_en,name_es,name_fr,name_de,name_ru,last_modified_date) VALUES(@city_id,@country,@name,@nr_hotels,@name_it,@name_en,@name_es,@name_fr,@name_de,@name_ru,@last_modified_date) on duplicate key update city_id=@city_id,country=@country,name=@name,nr_hotels=@nr_hotels,name_it=@name_it,name_en=@name_en,name_es=@name_es,name_fr=@name_fr,name_de=@name_de,name_ru=@name_ru,last_modified_date=@last_modified_date";

Can you please help me?
Thanks in advance.

Answer 1

Score: 0

I don't see any point in converting to a byte array and back. This works properly for me:

    string response = File.ReadAllText(file, Encoding.UTF8);
    var document = XDocument.Parse(response);
    foreach (var child in document.Root.Elements("result"))
    {
        //... code here
        String name_en = "";
        String name_ru = "";
        foreach (var translationsChild in child.Elements("translations"))
        {
            // Use the element value directly; no byte-array round trip is needed.
            var name = translationsChild.Element("name").Value;
            Console.WriteLine(name);
            switch (translationsChild.Element("language").Value)
            {
                case "en-gb":
                    name_en = name;
                    break;
                case "ru":
                    name_ru = name;
                    break;
            }
        }
    }

Output:

    Santa Bárbara
    Санта-Барбара
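To illustrate why the byte-array round trip can be harmful, here is a minimal sketch. It assumes that `Encoding.Default` resolves to a single-byte ANSI code page, as it does on .NET Framework on Windows (on .NET Core / .NET 5+ `Encoding.Default` is UTF-8, so the round trip would change nothing). `RoundTripDemo` and `RoundTrip` are hypothetical names, and ISO-8859-1 with an explicit "?" fallback stands in for the ANSI code page.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Assumption: a single-byte encoding standing in for Encoding.Default
    // on .NET Framework. Unmappable characters are replaced with "?".
    public static readonly Encoding SingleByte = Encoding.GetEncoding(
        "iso-8859-1",
        new EncoderReplacementFallback("?"),
        new DecoderReplacementFallback("?"));

    // Mirrors the pattern from the question: encode an already-correct
    // string with one encoding, then decode the bytes with another.
    public static string RoundTrip(string value)
    {
        byte[] bytes = SingleByte.GetBytes(value);  // Cyrillic letters become '?' here
        return Encoding.UTF8.GetString(bytes);      // a lone 0xE1 byte is invalid UTF-8
    }
}
```

Under these assumptions, `RoundTripDemo.RoundTrip("Санта-Барбара")` returns "?????-???????", and the "á" in "Santa Bárbara" comes back as the Unicode replacement character, which many consoles also render as "?". Plain ASCII text passes through unchanged, which is why the problem only shows up on accented and non-Latin names.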

huangapple
  • Posted on April 11, 2023, 02:14:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75979603.html