C# - Visual Studio Code - encoding problem


Question

I have a folder containing an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<cities>
   <result>
      <city_id>-3870534</city_id>
      <country>mx</country>
      <name>Santa Bárbara</name>
      <nr_hotels>0</nr_hotels>
      <translations>
         <language>en-gb</language>
         <name>Santa Bárbara</name>
      </translations>
      <translations>
         <language>ru</language>
         <name>Санта-Барбара</name>
      </translations>
   </result>
</cities>
<!-- RUID: [UmFuZG9tSVYkc2RlIyh9YcxtmfhRwqry58sgWYNIgEV1AjdsVswrKUorBoUlR6ylFgiaj5XJ0w0DP0lL/htWqOKtE33w1EhBbLABKokIfEo=] -->

The file looks well formed and is encoded in UTF-8. It contains Russian text and accented characters such as "á" in Santa Bárbara.
I need to read this file and create a record in a MySQL DB (from C#), but I'm running into encoding problems.

PS: the DB table has a few columns (for the city id, the country and the city name translations), all of them text fields with utf8_general_ci collation.

I'm using the following code to read the files (just one in this case) in a folder:

foreach (string file in Directory.EnumerateFiles(@"C:\xml_folder\" + sub_folder, "*.xml")) {
    Console.WriteLine(file);

    string response = File.ReadAllText(file, Encoding.GetEncoding("Windows-1252"));

    Console.WriteLine(response);

    var document = XDocument.Parse(response);

    foreach (var child in document.Root.Elements("result")) {
        //... code here

        String name_it = "";
        String name_en = "";
        String name_es = "";
        String name_fr = "";
        String name_de = "";
        String name_ru = "";
        byte[] bytes;

        foreach (var translationsChild in child.Elements("translations"))
        {
            switch (translationsChild.Element("language").Value)
            {
                case "it":
                    bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                    name_it = Encoding.UTF8.GetString(bytes);
                    break;
                case "en-gb":
                    bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                    name_en = Encoding.UTF8.GetString(bytes);
                    break;
                case "es":
                    bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                    name_es = Encoding.UTF8.GetString(bytes);
                    break;
                case "fr":
                    bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                    name_fr = Encoding.UTF8.GetString(bytes);
                    break;
                case "de":
                    bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                    name_de = Encoding.UTF8.GetString(bytes);
                    break;
                case "ru":
                    bytes = Encoding.Default.GetBytes(translationsChild.Element("name").Value);
                    name_ru = Encoding.UTF8.GetString(bytes);
                    Console.WriteLine(name_ru);
                    break;
            }
        }
    }
}

In short: I read the file, parse it as XML to walk through all its children, and save the values into the DB.
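As an alternative to choosing an encoding by hand with File.ReadAllText, XDocument.Load can read the file itself: the XML parser looks at the byte-order mark or at the encoding="UTF-8" declaration and decodes the bytes accordingly. The sketch below assumes .NET 6 or later (top-level statements) and uses a hypothetical temporary file, cities_sample.xml, holding a trimmed-down version of the XML above instead of the real folder.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical stand-in for one file in the question's folder.
string path = Path.Combine(Path.GetTempPath(), "cities_sample.xml");
string xml =
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
    "<cities><result><city_id>-3870534</city_id>" +
    "<translations><language>ru</language><name>Санта-Барбара</name></translations>" +
    "</result></cities>";

// Save it as UTF-8 without a BOM, so only the XML declaration names the encoding.
File.WriteAllText(path, xml, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));

// Let the XML parser pick the encoding instead of forcing one with File.ReadAllText.
XDocument document = XDocument.Load(path);

string nameRu = "";
foreach (XElement t in document.Root!.Elements("result").Elements("translations"))
{
    // Use the string value directly; no byte-array conversion is needed.
    if ((string?)t.Element("language") == "ru")
        nameRu = (string?)t.Element("name") ?? "";
}

Console.WriteLine(nameRu);
File.Delete(path);
```

Whether the console then shows the Cyrillic text correctly is a separate question about the console's output encoding; the string value in nameRu is already correct.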

The problem seems to be related to the encoding I use to read the string from the file. I tried decoding it as Windows-1252:

string response = File.ReadAllText(file, Encoding.GetEncoding("Windows-1252"));

I also tried UTF-8:

string response = File.ReadAllText(file, System.Text.Encoding.UTF8);
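A small sketch to compare the two File.ReadAllText calls on the same UTF-8 bytes, assuming .NET 6 or later. ISO-8859-1 is used here as a stand-in for Windows-1252, because on .NET Core and .NET 5+ Windows-1252 is only available after registering CodePagesEncodingProvider, and both code pages map the UTF-8 bytes of "á" (0xC3 0xA1) to the same two characters. Decoding UTF-8 with a single-byte code page typically yields mojibake such as "Ã¡" rather than question marks, while decoding as UTF-8 gives the original text back.

```csharp
using System;
using System.IO;
using System.Text;

string original = "Santa Bárbara";
string path = Path.Combine(Path.GetTempPath(), "encoding_demo.txt");

// Save the text as UTF-8 without a BOM, like the XML file in the question.
File.WriteAllText(path, original, new UTF8Encoding(false));

// Wrong: treat the UTF-8 bytes as a single-byte Western code page.
string asLatin1 = File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));

// Right: decode them as UTF-8.
string asUtf8 = File.ReadAllText(path, Encoding.UTF8);

// asLatin1 holds "Santa BÃ¡rbara"; asUtf8 holds "Santa Bárbara".
Console.WriteLine(asLatin1);
Console.WriteLine(asUtf8);
File.Delete(path);
```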

but every time I get this (both in the debug console and in the DB):

Santa Bárbara -> Santa B?rbara
Санта-Барбара -> ?????-??????

It looks like a problem with the way File.ReadAllText(...) works; the encoding parameter seems to have no effect at all.
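A "?" in place of each non-ASCII character is what .NET's built-in encoders substitute by default when a character cannot be represented in the target encoding, so the question marks are more likely produced when the string is written out through a narrower encoding than when the file is read. The minimal sketch below (assuming .NET 6 or later) uses ASCII as a stand-in for any such narrow code page; which output path does the narrowing here, for example the console output encoding or the MySQL connection character set, is an assumption the question does not confirm.

```csharp
using System;
using System.Text;

// Encode to ASCII, which cannot represent "á" or Cyrillic letters,
// then decode back to see what survives.
static string ThroughAscii(string s) =>
    Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));

string latin = ThroughAscii("Santa Bárbara");
string cyrillic = ThroughAscii("Санта-Барбара");

Console.WriteLine(latin);    // Santa B?rbara
Console.WriteLine(cyrillic); // ?????-???????
```

For the console side, setting Console.OutputEncoding = Encoding.UTF8 before writing is one way to rule the console out as the narrowing step.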

PS: to store the data in the DB I use a DML statement like this:

cmd.CommandText = "INSERT INTO cities (city_id,country,name,nr_hotels,name_it,name_en,name_es,name_fr,name_de,name_ru,last_modified_date) VALUES(@city_id,@country,@name,@nr_hotels,@name_it,@name_en,@name_es,@name_fr,@name_de,@name_ru,@last_modified_date) on duplicate key update city_id=@city_id,country=@country,name=@name,nr_hotels=@nr_hotels,name_it=@name_it,name_en=@name_en,name_es=@name_es,name_fr=@name_fr,name_de=@name_de,name_ru=@name_ru,last_modified_date=@last_modified_date";

Can you please help me?
Thanks in advance.

Answer 1

Score: 0

I don't see any point in converting to a byte array and back. This works properly for me:

	string response = File.ReadAllText(file, Encoding.UTF8);
	var document = XDocument.Parse(response);

	foreach (var child in document.Root.Elements("result"))
	{
		//... code here

		String name_en = "";
		String name_ru = "";


		foreach (var translationsChild in child.Elements("translations"))
		{
			var name = translationsChild.Element("name").Value;
			Console.WriteLine(name);
			switch (translationsChild.Element("language").Value)
			{
				case "en-gb":
					name_en = name;
					break;

				case "ru":
					name_ru = name;
					break;
			}
		}
	}

Output:

Santa Bárbara
Санта-Барбара
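The point about the byte-array round trip can be checked directly. On .NET Core and .NET 5+, Encoding.Default is UTF-8, so Encoding.UTF8.GetString(Encoding.Default.GetBytes(s)) returns s unchanged; on .NET Framework, Encoding.Default is the system's active ANSI code page, and the round trip can lose data. The sketch below (assuming .NET 6 or later) simulates that case with ISO-8859-1 as a hypothetical stand-in for a Western ANSI code page: Cyrillic letters become "?" when encoded, and the single byte for "á" is not valid UTF-8 on its own, so it decodes to U+FFFD, which many consoles render as "?" or "�".

```csharp
using System;
using System.Text;

// Hypothetical stand-in for Encoding.Default on .NET Framework
// with a Western-European ANSI code page.
Encoding legacyDefault = Encoding.GetEncoding("iso-8859-1");

// Same pattern as the question: encode with one encoding, decode as UTF-8.
static string RoundTrip(string s, Encoding from) =>
    Encoding.UTF8.GetString(from.GetBytes(s));

string en = RoundTrip("Santa Bárbara", legacyDefault); // "á" becomes U+FFFD
string ru = RoundTrip("Санта-Барбара", legacyDefault); // letters become "?"

// With a UTF-8 default (as on .NET Core / .NET 5+) the round trip is a no-op.
string enUtf8 = RoundTrip("Santa Bárbara", Encoding.UTF8);

Console.WriteLine(ru);                         // ?????-???????
Console.WriteLine(enUtf8 == "Santa Bárbara");  // True
```

Either way the conversion adds nothing, which is why using translationsChild.Element("name").Value directly is enough.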

huangapple
  • Posted on April 11, 2023 02:14:00
  • If you repost this, please keep the link to this article: https://go.coder-hub.com/75979603.html