How do I input UTF-8 characters in MinGW64?

Question

Platform: Windows x64 22H2

I have the following code (file encoding: UTF-8):

```c
#include <stdio.h>

int main(int argc, char **argv)
{
    static char text[8];
    scanf("%[^\n]s", text);
    printf("%s\n", text);
    return 0;
}
```

It works correctly when the input contains only ASCII characters.
But when the input contains Chinese or other non-ASCII characters, nothing is read.

After non-ASCII input, the content of the text array is: 00 00 00 00 00 00 00 00
I ran the program in Windows CMD and compiled it with: gcc main.c -o main.exe

I then tried adding locale support. This is the modified code:

```c
#include <stdio.h>
#include <locale.h>

int main(int argc, char **argv)
{
    setlocale(LC_ALL, "zh_CN.UTF-8");
    static char text[8];
    scanf("%[^\n]s", text);
    printf("%s\n", text);
    return 0;
}
```

But the content of the array is still: 00 00 00 00 00 00 00 00

I also tried changing the CMD code page to 65001 (chcp 65001), but the result was the same.
Adding the gcc command-line option -finput-charset=UTF-8 did not help either.

However, when I save the source file in a GB-family encoding (such as GB2312) or change the CMD code page to 936, the program reads GB2312-encoded data correctly, like this:

input: 你好
output: ce d2 b5 c4 00 00 00 00

So the program can read Chinese characters in a GB encoding, but not in UTF-8.
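
A side note on the buffer handling in the code above: `scanf("%[^\n]s", text)` has no width limit, so any line longer than 7 bytes overflows the 8-byte `text` array, and the trailing `s` after the scanset is matched literally instead of being part of the conversion. Most Chinese characters take 3 bytes in UTF-8, so the array holds only two of them plus the terminator. The sketch below is one possible bounded replacement built on `fgets`. `read_line` is a hypothetical helper name, not a library function, and it does not by itself fix the console encoding problem.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into `buf`, which has room for `cap` bytes
 * including the terminating NUL. A trailing CR/LF is removed.
 * Returns the number of bytes stored, or -1 on end of file or error.
 * `read_line` is a hypothetical helper, not part of any library. */
long read_line(FILE *in, char *buf, size_t cap)
{
    assert(in != NULL && buf != NULL);
    assert(cap > 0 && cap <= (size_t)INT_MAX);

    /* fgets stores at most cap - 1 bytes and always NUL-terminates. */
    if (fgets(buf, (int)cap, in) == NULL)
        return -1;

    /* Cut the string at the first CR or LF, if there is one. */
    size_t len = strcspn(buf, "\r\n");
    buf[len] = '\0';
    return (long)len;
}
```

A call such as `read_line(stdin, text, sizeof text)` would replace the `scanf` line. If the input line is longer than the buffer, the remainder stays in the stream and the cut may fall in the middle of a multi-byte character, so the buffer should be sized generously.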

Answer 1

Score: 1

In a bash shell with the locale set to LANG=en_US.UTF-8, this correctly reads a UTF-8 string.

```c
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char text[100];

    /* Read one whitespace-delimited word, at most 99 bytes plus the NUL. */
    if (scanf("%99s", text) != 1)
        return 1;
    printf("%s\n", text);

    /* Dump the bytes that were read, in hex. */
    size_t len = strlen(text);
    for (size_t i = 0; i < len; i++)
        printf(" %02x", (unsigned char) text[i]);
    printf("\n");
    return 0;
}
```

Sample run (the first line is the input):

```text
快速的棕色狐狸
快速的棕色狐狸
 e5 bf ab e9 80 9f e7 9a 84 e6 a3 95 e8 89 b2 e7 8b 90 e7 8b b8
```
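
The hex dump in this answer can also be checked mechanically. The sketch below counts the code points in a byte string and reports malformed sequences. `utf8_count` is a hypothetical helper name; it assumes NUL-terminated input and checks only the lead-byte and continuation-byte structure, so it does not reject overlong encodings or surrogate code points.

```c
#include <assert.h>
#include <stddef.h>

/* Count the code points in a NUL-terminated UTF-8 string.
 * Returns -1 if a lead byte is invalid or a continuation byte is missing.
 * Structural check only: overlong forms and surrogates are not rejected.
 * `utf8_count` is a hypothetical helper, not a library function. */
long utf8_count(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    long count = 0;

    while (*p != 0) {
        int extra;  /* number of continuation bytes the lead byte announces */
        if (*p < 0x80)
            extra = 0;                      /* 0xxxxxxx: ASCII */
        else if ((*p & 0xE0) == 0xC0)
            extra = 1;                      /* 110xxxxx */
        else if ((*p & 0xF0) == 0xE0)
            extra = 2;                      /* 1110xxxx */
        else if ((*p & 0xF8) == 0xF0)
            extra = 3;                      /* 11110xxx */
        else
            return -1;                      /* stray continuation or bad lead */
        p++;

        /* Every continuation byte must look like 10xxxxxx. A NUL here
         * fails the test too, so the loop never runs past the string. */
        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;
        }
        count++;
    }
    return count;
}
```

For the 21 bytes printed above the function returns 7, one per character. For the GB2312 bytes `ce d2 b5 c4` reported in the question it returns -1, because `d2` is not a valid continuation byte.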

Answer 2

Score: 0

Try `<wchar.h>`?

```c
#include <wchar.h>

int main(void)
{
    static wchar_t text[32];

    /* Read at most 31 wide characters, leaving room for the terminator. */
    if (wscanf(L"%31ls", text) != 1)
        return 1;
    wprintf(L"%ls\n", text);

    return 0;
}
```
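
On Windows, `wchar_t` is 16 bits wide and wide strings hold UTF-16, so text read with the wide-character functions still has to be converted if the rest of the program expects UTF-8. The Win32 API provides `WideCharToMultiByte` with `CP_UTF8` for this. To illustrate what that conversion involves, here is a minimal portable sketch. `utf16_to_utf8` is a hypothetical helper name, and it takes `uint16_t` rather than `wchar_t` so that it behaves the same on platforms where `wchar_t` is 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a NUL-terminated UTF-16 string as UTF-8.
 * Writes at most `cap` bytes to `dst`, including the terminating NUL.
 * Returns the number of bytes written excluding the NUL, or -1 if the
 * input contains an unpaired surrogate or `dst` is too small.
 * `utf16_to_utf8` is a hypothetical helper, not a library function. */
long utf16_to_utf8(const uint16_t *src, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return -1;

    size_t n = 0;
    while (*src != 0) {
        uint32_t cp = *src++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (*src < 0xDC00 || *src > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;  /* low surrogate without a preceding high one */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + need >= cap)
            return -1;  /* keep one byte free for the NUL */

        if (need == 1) {
            dst[n++] = (char)cp;
        } else if (need == 2) {
            dst[n++] = (char)(0xC0 | (cp >> 6));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[n++] = (char)(0xE0 | (cp >> 12));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[n++] = (char)(0xF0 | (cp >> 18));
            dst[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[n] = '\0';
    return (long)n;
}
```

For the two code units U+4F60 U+597D (你好) the function writes the six bytes `e4 bd a0 e5 a5 bd`, which is the UTF-8 form the question was trying to obtain.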

huangapple
  • Published on April 19, 2023, 16:46:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76052477.html