I am trying to read a csv file and print it. I got what I wanted to do, but I'm not sure how I can print Korean characters

Question

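The problem in the program is that the char array name is not suited to holding names that contain Korean characters. A Korean letter takes more than one byte (two bytes as a UTF-16 wchar_t on Windows, three bytes in UTF-8), so the name field of the Monster struct needs to be adjusted accordingly.

Here is the corrected code: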

#define _CRT_SECURE_NO_WARNINGS
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

typedef struct {
    wchar_t name[1000]; // wchar_t storage for the Korean name
    int hp;
    int damage;
} Monster;

typedef struct {
    wchar_t header1[sizeof L"name"];
    wchar_t header2[sizeof L"hp"];
    wchar_t header3[sizeof L"damage"];
} Header;

int main(void)
{
    // Assumption: the environment's locale encoding matches the file's encoding (e.g. UTF-8).
    setlocale(LC_ALL, "");

    FILE* fp = fopen("entityData.csv", "r");
    if (!fp) {
        fprintf(stderr, "Error opening file\n");
        return 1;
    }

    Monster monsters[100];
    int num_records = 0;

    wchar_t line[1000]; // wide buffer for each input line
    wchar_t* state;     // context pointer for the standard three-argument wcstok

    Header header = { 0 }; // zero-filled, so the copies below stay terminated
    if (!fgetws(line, sizeof line / sizeof line[0], fp)) { // fgetws reads wide characters
        fclose(fp);
        return 1;
    }
    wcsncpy(header.header1, wcstok(line, L",", &state), sizeof header.header1 / sizeof header.header1[0] - 1);
    wcsncpy(header.header2, wcstok(NULL, L",", &state), sizeof header.header2 / sizeof header.header2[0] - 1);
    wcsncpy(header.header3, wcstok(NULL, L"\r\n", &state), sizeof header.header3 / sizeof header.header3[0] - 1);

    while (num_records < 100 && fgetws(line, sizeof line / sizeof line[0], fp))
    {
        const size_t name_len = sizeof monsters[0].name / sizeof monsters[0].name[0];

        wchar_t* token = wcstok(line, L",\r\n", &state); // wcstok splits wide strings
        if (!token)
            continue; // skip blank lines
        wcsncpy(monsters[num_records].name, token, name_len - 1);
        monsters[num_records].name[name_len - 1] = L'\0';

        token = wcstok(NULL, L",", &state);
        monsters[num_records].hp = token ? (int)wcstol(token, NULL, 10) : 0;

        token = wcstok(NULL, L",\r\n", &state);
        monsters[num_records].damage = token ? (int)wcstol(token, NULL, 10) : 0;

        num_records++;
    }

    for (int i = 0; i < num_records; i++)
    {
        // %ls prints a wide string in both standard C and MSVC
        wprintf(L"%ls:%ls %ls:%d %ls:%d\n",
            header.header1, monsters[i].name,
            header.header2, monsters[i].hp,
            header.header3, monsters[i].damage);
    }

    fclose(fp);
    return 0;
}

With the changes above, the program should be able to handle a CSV file that contains Korean characters and print the expected result.

I have a csv file that includes Korean characters. But I am not sure how Korean can be printed in the code that I have.

The csv file looks like this:

name,hp,damage
대학오리,20,5
대학냥이,30,10
시계탑기린,100,20

My code:

#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
    char name[1000];
    int hp;
    int damage;
} Monster;
typedef struct {
    char header1[sizeof "name"];
    char header2[sizeof "hp"];
    char header3[sizeof "damage"];
} Header;
int main()
{
    FILE* fp = fopen("entityData.csv", "r");
    if (!fp) {
        printf("Error opening file\n");
        return 1;
    }
    Monster monsters[100];
    int num_records = 0;
    char line[100];
    Header header;
    fgets(line, sizeof line, fp);
    strncpy(header.header1, strtok(line, ","), sizeof header.header1);
    strncpy(header.header2, strtok(NULL, ","), sizeof header.header2);
    strncpy(header.header3, strtok(NULL, "\n"), sizeof header.header3);
    while (fgets(line, sizeof(line), fp))
    {
        char* token = strtok(line, ","); // split on ',' and store the piece in token
        strncpy(monsters[num_records].name, token, 20);
        token = strtok(NULL, ",");
        monsters[num_records].hp = atoi(token);
        token = strtok(NULL, ",");
        monsters[num_records].damage = atoi(token);
        num_records++;
    }
    for (int i = 0; i < num_records; i++)
    {
        printf("%s:%s %s:%d %s:%d\n",
            header.header1, monsters[i].name,
            header.header2, monsters[i].hp,
            header.header3, monsters[i].damage);
    }
    fclose(fp);
    return 0;
}

The program I wrote reads the csv file above and should print it like this:

name:대학오리 hp:20 damage:5
name:대학냥이 hp:30 damage:10
name:시계탑기린 hp:100 damage:20

Instead the name part is broken.

After some searching around, I realized that Korean letters take up 2 bytes per letter, which does not fit in a single char. I have tried using wchar_t, but that led to errors, and I feel stuck.

I know that an English website isn't the best place to ask such a question, but I'm really hoping someone knows something about this.

Answer 1

Score: 2

There's nothing wrong with your code. It's Windows that's messed up. (It works perfectly fine on Linux and Macs.) Do this to remedy the problem with Windows:

> Enable the new UTF-8 option in Windows settings. Go to the language
> settings, click Administrative language settings, then Change system
> locale… and tick the Beta: Use Unicode UTF-8 for worldwide language
> support option. Restart your computer.

Then languages in UTF-8 will display correctly in terminals.

Yes, the number of bytes can be more than the number of characters. The names are likely stored as UTF-8, which encodes each character in one to four bytes. Each of your Korean characters is three bytes (not two). However, a comma is still a single-byte comma: every byte of a multi-byte UTF-8 sequence has its high bit set, so the comma byte (0x2C) cannot appear inside another character's encoding, and splitting on `,` still finds the end of your name string correctly.

See this answer for more (much more) on character encodings in Windows.

huangapple
  • Posted on May 15, 2023, 00:01:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76248430.html