I am trying to read a csv file and print it. I got what I wanted to do, but I'm not sure how I can print Korean characters
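Problem
The program stores name in a char array and handles it one byte at a time, but each Korean character occupies more than one byte: three in UTF-8, or two in a legacy code page such as CP949. One way to handle this is to store the text as wchar_t and use the wide-character functions, so name in the Monster struct needs to change accordingly.
Here is the fixed code: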
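#define _CRT_SECURE_NO_WARNINGS
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

#define MAX_MONSTERS 100
#define COUNT_OF(a) (sizeof (a) / sizeof (a)[0])

typedef struct {
    wchar_t name[1000]; // wchar_t holds the Korean characters
    int hp;
    int damage;
} Monster;

typedef struct {
    // sizeof on a wide literal is in bytes, so divide to get the element count
    wchar_t header1[sizeof L"name" / sizeof(wchar_t)];
    wchar_t header2[sizeof L"hp" / sizeof(wchar_t)];
    wchar_t header3[sizeof L"damage" / sizeof(wchar_t)];
} Header;

// Copy src into dst (n elements) and always null-terminate; a NULL src gives "".
static void copy_field(wchar_t* dst, size_t n, const wchar_t* src)
{
    wcsncpy(dst, src ? src : L"", n - 1);
    dst[n - 1] = L'\0';
}

int main()
{
    // Use the environment's locale so wide I/O can decode and encode the text.
    setlocale(LC_ALL, "");

    // Assumption: the file's encoding matches the locale. With MSVC, a UTF-8
    // file may need the mode "r, ccs=UTF-8" instead of "r".
    FILE* fp = fopen("entityData.csv", "r");
    if (!fp) {
        fprintf(stderr, "Error opening file\n");
        return 1;
    }
    Monster monsters[MAX_MONSTERS];
    int num_records = 0;
    wchar_t line[1000]; // wide buffer for each input line
    wchar_t* state;     // context for the standard three-argument wcstok
    Header header;

    if (!fgetws(line, COUNT_OF(line), fp)) { // read the header row
        fclose(fp);
        return 1;
    }
    copy_field(header.header1, COUNT_OF(header.header1), wcstok(line, L",", &state));
    copy_field(header.header2, COUNT_OF(header.header2), wcstok(NULL, L",", &state));
    copy_field(header.header3, COUNT_OF(header.header3), wcstok(NULL, L"\r\n", &state));

    while (num_records < MAX_MONSTERS && fgetws(line, COUNT_OF(line), fp))
    {
        Monster* m = &monsters[num_records];
        wchar_t* token = wcstok(line, L",", &state); // split on ','
        if (!token)
            continue; // nothing to parse on this line
        copy_field(m->name, COUNT_OF(m->name), token);
        token = wcstok(NULL, L",", &state);
        m->hp = token ? (int)wcstol(token, NULL, 10) : 0;
        token = wcstok(NULL, L",\r\n", &state);
        m->damage = token ? (int)wcstol(token, NULL, 10) : 0;
        num_records++;
    }
    for (int i = 0; i < num_records; i++)
    {
        // %ls is the portable conversion for wchar_t strings
        wprintf(L"%ls:%ls %ls:%d %ls:%d\n",
            header.header1, monsters[i].name,
            header.header2, monsters[i].hp,
            header.header3, monsters[i].damage);
    }
    fclose(fp);
    return 0;
}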
With these changes, the program should be able to read a CSV file containing Korean characters and print the expected output.
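How many bytes each Korean character takes depends on how the CSV file was saved, so it can help to check the encoding before choosing a fix. The following is a minimal sketch, not part of the original post: looks_like_utf8 is a hypothetical helper name, and it performs only a structural check (lead byte plus the right number of continuation bytes), without rejecting overlong forms or surrogates.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not from the post).
 * Returns 1 if buf[0..len) is structurally valid UTF-8, 0 otherwise.
 * It does not reject overlong encodings or surrogate code points. */
int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra; /* number of continuation bytes expected after b */

        if (b < 0x80)                extra = 0; /* ASCII */
        else if ((b & 0xE0) == 0xC0) extra = 1; /* 110xxxxx */
        else if ((b & 0xF0) == 0xE0) extra = 2; /* 1110xxxx, e.g. Hangul */
        else if ((b & 0xF8) == 0xF0) extra = 3; /* 11110xxx */
        else return 0; /* stray continuation byte or invalid lead byte */

        if (extra > len - i - 1)
            return 0; /* sequence is cut off at the end of the buffer */

        for (size_t k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0; /* expected a 10xxxxxx continuation byte */

        i += extra + 1;
    }
    return 1;
}
```

Passing the raw bytes of a line read with fgets to this function gives a rough indication: if it returns 0 for a line with Korean text, the file is probably in a legacy code page rather than UTF-8.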
Original question:
I have a csv file that includes Korean characters. But I am not sure how Korean can be printed in the code that I have.
The csv file looks like this:
name,hp,damage
대학오리,20,5
대학냥이,30,10
시계탑기린,100,20
My code:
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
    char name[1000];
    int hp;
    int damage;
} Monster;
typedef struct {
    char header1[sizeof "name"];
    char header2[sizeof "hp"];
    char header3[sizeof "damage"];
} Header;
int main()
{
    FILE* fp = fopen("entityData.csv", "r");
    if (!fp) {
        printf("Error opening file\n");
        return 1;
    }
    Monster monsters[100];
    int num_records = 0;
    char line[100];
    Header header;
    fgets(line, sizeof line, fp);
    strncpy(header.header1, strtok(line, ","), sizeof header.header1);
    strncpy(header.header2, strtok(NULL, ","), sizeof header.header2);
    strncpy(header.header3, strtok(NULL, "\n"), sizeof header.header3);
    while (fgets(line, sizeof(line), fp))
    {
        char* token = strtok(line, ","); // split on ',' and store the piece in token
        strncpy(monsters[num_records].name, token, 20);
        token = strtok(NULL, ",");
        monsters[num_records].hp = atoi(token);
        token = strtok(NULL, ",");
        monsters[num_records].damage = atoi(token);
        num_records++;
    }
    for (int i = 0; i < num_records; i++)
    {
        printf("%s:%s %s:%d %s:%d\n",
            header.header1, monsters[i].name,
            header.header2, monsters[i].hp,
            header.header3, monsters[i].damage);
    }
    fclose(fp);
    return 0;
}
The program I wrote reads the csv file above and should print it like this:
name:대학오리 hp:20 damage:5
name:대학냥이 hp:30 damage:10
name:시계탑기린 hp:100 damage:20
Instead, the name part comes out garbled.
After some searching, I realized that Korean letters take up 2 bytes per letter, which does not fit in a single char. I tried using wchar_t, but that led to errors, and I feel stuck.
I know that asking such a question on an English website isn't ideal, but I'm hoping someone knows how to fix this.
Answer 1
Score: 2
There's nothing wrong with your code. It's Windows that's messed up. (It works perfectly fine on Linux and Macs.) Do this to remedy the problem with Windows:
> Enable the new UTF-8 option in Windows settings. Go to the language
> settings, click Administrative language settings, then Change system
> locale… and tick the Beta: Use Unicode UTF-8 for worldwide language
> support option. Restart your computer.
Then languages in UTF-8 will display correctly in terminals.
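The setting above applies to the whole system. As a per-program alternative (a swapped-in technique, not what the answer describes), the Win32 function SetConsoleOutputCP can ask the console to interpret the program's output as UTF-8. The sketch below assumes the CSV file itself is saved as UTF-8; use_utf8_console is a hypothetical helper name, and on non-Windows systems it does nothing.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (an assumption for illustration, not from the answer).
 * On Windows, switches the console output code page to UTF-8.
 * Returns 1 on success or when there is nothing to do, 0 on failure. */
int use_utf8_console(void)
{
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return 1; /* terminals on Linux and macOS are typically UTF-8 already */
#endif
}
```

Calling use_utf8_console() at the start of main, before the first printf, leaves the rest of the byte-oriented code in the question unchanged. Whether the characters then render also depends on the console font supporting Hangul.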
Yes, the number of bytes can be more than the number of characters. They are likely stored as UTF-8, which encodes each character in one to four bytes. Each of your Korean characters is three bytes (not two). However, a comma is still a comma and cannot appear inside another character's byte sequence, so strtok correctly finds the end of your name string.
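The byte-versus-character distinction can be checked directly. This is a minimal sketch with two hypothetical helpers (utf8_char_count and first_comma_offset are names made up for illustration): the first counts characters by skipping UTF-8 continuation bytes, the second finds the first comma by plain byte search.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts characters (code points) in a UTF-8 string.
 * Every byte that is not a continuation byte (10xxxxxx) starts a character. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: byte offset of the first ',' in s,
 * or strlen(s) if there is none. A byte search is safe because
 * every byte of a multi-byte UTF-8 sequence is >= 0x80,
 * while ',' is 0x2C. */
size_t first_comma_offset(const char *s)
{
    return strcspn(s, ",");
}
```

For the row 대학오리,20,5 stored as UTF-8, first_comma_offset returns 12 (four characters at three bytes each), and utf8_char_count on the 12-byte name returns 4.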
See this answer for more (much more) on character encodings in Windows.