May 15, 2023, 00:01:54

I am trying to read a csv file and print it. I got what I wanted to do, but I'm not sure how I can print Korean characters

Question

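One way to handle the Korean names is to store them as wide characters (`wchar_t`) instead of `char`. Note that the `name` array in the `Monster` struct is not actually too small: if the file is UTF-8, each Korean syllable takes three bytes (not two), so the longest name here needs 15 bytes plus the terminator.

Here is a modified version of the code: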

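#define _CRT_SECURE_NO_WARNINGS
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#define COUNT(a) (sizeof (a) / sizeof (a)[0]) // number of elements in an array
typedef struct {
    wchar_t name[1000]; // wchar_t holds the Korean characters
    int hp;
    int damage;
} Monster;
typedef struct {
    wchar_t header1[sizeof L"name"];
    wchar_t header2[sizeof L"hp"];
    wchar_t header3[sizeof L"damage"];
} Header;
int main(void)
{
    // Wide-character I/O converts bytes using the current locale, so this
    // assumes the file's encoding matches the locale's (for example UTF-8).
    setlocale(LC_ALL, "");
    FILE* fp = fopen("entityData.csv", "r");
    if (!fp) {
        fprintf(stderr, "Error opening file\n");
        return 1;
    }
    static Monster monsters[100]; // static storage: large array, zero-initialized
    int num_records = 0;
    wchar_t line[1000]; // wide buffer for each input line
    wchar_t* state;     // context pointer required by standard wcstok
    Header header = {0};
    wchar_t* h1 = NULL;
    wchar_t* h2 = NULL;
    wchar_t* h3 = NULL;
    if (fgetws(line, (int)COUNT(line), fp)) { // fgetws reads a wide-character line
        h1 = wcstok(line, L",", &state);
        h2 = wcstok(NULL, L",", &state);
        h3 = wcstok(NULL, L"\r\n", &state);
    }
    if (!h1 || !h2 || !h3) {
        fprintf(stderr, "Malformed header line\n");
        fclose(fp);
        return 1;
    }
    // Copy at most COUNT - 1 characters so the zeroed arrays stay terminated.
    wcsncpy(header.header1, h1, COUNT(header.header1) - 1);
    wcsncpy(header.header2, h2, COUNT(header.header2) - 1);
    wcsncpy(header.header3, h3, COUNT(header.header3) - 1);
    while (num_records < (int)COUNT(monsters) && fgetws(line, (int)COUNT(line), fp))
    {
        wchar_t* name = wcstok(line, L",", &state); // wcstok splits wide strings
        wchar_t* hp = wcstok(NULL, L",", &state);
        wchar_t* damage = wcstok(NULL, L",\r\n", &state);
        if (!name || !hp || !damage)
            continue; // skip blank or malformed lines
        Monster* m = &monsters[num_records++];
        wcsncpy(m->name, name, COUNT(m->name) - 1);
        m->name[COUNT(m->name) - 1] = L'\0';
        m->hp = (int)wcstol(hp, NULL, 10); // standard replacement for _wtoi
        m->damage = (int)wcstol(damage, NULL, 10);
    }
    for (int i = 0; i < num_records; i++)
    {
        // %ls is the portable conversion for wchar_t strings
        wprintf(L"%ls:%ls %ls:%d %ls:%d\n",
            header.header1, monsters[i].name,
            header.header2, monsters[i].hp,
            header.header3, monsters[i].damage);
    }
    fclose(fp);
    return 0;
}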

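With these changes, the program should be able to read a CSV file containing Korean characters and print it correctly, provided the file's encoding matches the encoding the program expects and the terminal can display the characters.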

I have a csv file that includes Korean characters. But I am not sure how Korean can be printed in the code that I have.

The csv file looks like this:

name,hp,damage
대학오리,20,5
대학냥이,30,10
시계탑기린,100,20

My code:

#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
    char name[1000];
    int hp;
    int damage;
} Monster;
typedef struct {
    char header1[sizeof "name"];
    char header2[sizeof "hp"];
    char header3[sizeof "damage"];
} Header;
int main()
{
    FILE* fp = fopen("entityData.csv", "r");
    if (!fp) {
        printf("Error opening file\n");
        return 1;
    }
    Monster monsters[100];
    int num_records = 0;
    char line[100];
    Header header;
    fgets(line, sizeof line, fp);
    strncpy(header.header1, strtok(line, ","), sizeof header.header1);
    strncpy(header.header2, strtok(NULL, ","), sizeof header.header2);
    strncpy(header.header3, strtok(NULL, "\n"), sizeof header.header3);
    while (fgets(line, sizeof(line), fp))
    {
        char* token = strtok(line, ","); // split on ',' and store the piece in token
        strncpy(monsters[num_records].name, token, 20);
        token = strtok(NULL, ",");
        monsters[num_records].hp = atoi(token);
        token = strtok(NULL, ",");
        monsters[num_records].damage = atoi(token);
        num_records++;
    }
    for (int i = 0; i < num_records; i++)
    {
        printf("%s:%s %s:%d %s:%d\n",
            header.header1, monsters[i].name,
            header.header2, monsters[i].hp,
            header.header3, monsters[i].damage);
    }
    fclose(fp);
    return 0;
}

The program I wrote reads the csv file above and should print it like this:

name:대학오리 hp:20 damage:5
name:대학냥이 hp:30 damage:10
name:시계탑기린 hp:100 damage:20

Instead the name part is broken.

After some searching around, I realized that Korean letters take up 2 bytes per letter, which does not match char types. I have tried using wchar but that has led to errors, and I feel like that I am stuck.

I know that asking such a question on an English website isn't the best, but I'm really just hoping if anyone knows anything.

Answer 1

Score: 2

There's nothing wrong with your code. It's Windows that's messed up. (It works perfectly fine on Linux and Macs.) Do this to remedy the problem with Windows:

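> Enable the new UTF-8 option in Windows settings. Go to the language
> settings, click Administrative language settings, then Change system
> locale… and tick the Beta: Use Unicode UTF-8 for worldwide language
> support option. Restart your computer.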

Then languages in UTF-8 will display correctly in terminals.

Yes, the number of bytes can be more than the number of characters. They are likely stored as UTF-8, which encodes each character in one to four bytes. Each of your Korean characters is three bytes (not two). However, a comma is still a comma: every byte of a multi-byte UTF-8 sequence has its high bit set, so the ASCII comma can never appear inside another character's encoding, and you would be correctly finding the end of your name string.

See this answer for more (much more) on character encodings in Windows.
