Posted: 2023-04-19 16:46:03

How to input UTF-8 characters in MinGW64?

Question

Platform: Windows x64 22H2

I have the following code (file encoding: UTF-8):
```c
#include <stdio.h>

int main(int argc, char **argv)
{
    static char text[8];
    scanf("%[^\n]s", text);
    printf("%s\n", text);
    return 0;
}
```

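It works correctly when the input contains only ASCII characters.
However, when the input contains Chinese or other non-ASCII characters, nothing is read.

With such input, the contents of the text array are: 00 00 00 00 00 00 00 00.
I ran the program in Windows CMD and compiled it with: gcc main.c -o main.exe.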

I then tried adding locale support. This is the modified code:

```c
#include <stdio.h>
#include <locale.h>

int main(int argc, char **argv)
{
    setlocale(LC_ALL, "zh_CN.UTF-8");
    static char text[8];
    scanf("%[^\n]s", text);
    printf("%s\n", text);
    return 0;
}
```

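However, the array contents are still: 00 00 00 00 00 00 00 00.

I also tried changing the CMD code page to 65001 (chcp 65001), but the result was the same.
Adding the gcc command-line option -finput-charset=UTF-8 did not help either.

However, when I save the source file in a GB-series encoding (such as GB2312) or change the CMD code page to 936, the program reads GB2312-encoded input correctly:

input: 你好
output: ce d2 b5 c4 00 00 00 00

So the program can read non-ASCII characters in a GB encoding, but not in UTF-8.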
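
Separately from the console problem, the 8-byte buffer in the question is tight for UTF-8: a GB2312 Chinese character takes 2 bytes, while the same character in UTF-8 takes 3, so text[8] holds at most two Chinese characters plus the terminating NUL. The sketch below illustrates the difference between bytes and characters. The helper names utf8_codepoints and fits_in_buffer are hypothetical, not taken from the question, and the counting assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8. */
static size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Return 1 if s, including its terminating NUL, fits in cap bytes. */
static int fits_in_buffer(const char *s, size_t cap)
{
    assert(s != NULL);
    return strlen(s) < cap;
}
```

For example, the UTF-8 encoding of 你好 is the six bytes e4 bd a0 e5 a5 bd: two code points that fit in text[8], whereas a third Chinese character would not fit.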


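Answer 1

Score: 1

In a bash shell with the locale set to LANG=en_US.UTF-8, the following program reads a UTF-8 string correctly. Note that %99s stops at the first whitespace character.

```c
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char text[100];

    /* %99s reads at most 99 bytes, leaving room for the NUL. */
    if (scanf("%99s", text) != 1)
        return 1;
    printf("%s\n", text);

    /* Dump each byte in hex to show the UTF-8 encoding. */
    for (size_t i = 0; i < strlen(text); i++)
        printf(" %02x", (unsigned char)text[i]);
    printf("\n");
    return 0;
}
```

Input, followed by the program's output:

```text
快速的棕色狐狸
快速的棕色狐狸
 e5 bf ab e9 80 9f e7 9a 84 e6 a3 95 e8 89 b2 e7 8b 90 e7 8b b8
```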
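
The answer above was tested in bash, not in Windows CMD. One possible workaround on Windows, offered here only as a sketch, is to bypass the byte-oriented read and ask the console for UTF-16 text with ReadConsoleW, then convert it to UTF-8 with WideCharToMultiByte. The helper name read_line_utf8 is hypothetical, and the assumption that the byte-oriented console read is what produces the zero bytes in the question is mine, not something the question confirms. On other platforms, or when input is redirected, the sketch falls back to fgets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>
#endif

/* Hypothetical helper: read one line from `in` into `out` as UTF-8,
   without the trailing newline. Returns the byte length, or -1 on
   failure. Lines longer than the buffers are not handled here. */
static int read_line_utf8(FILE *in, char *out, int cap)
{
    assert(in != NULL && out != NULL);
    if (cap < 2)
        return -1;
#ifdef _WIN32
    /* Assumption: for an interactive console, read UTF-16 directly. */
    if (in == stdin && _isatty(_fileno(stdin))) {
        wchar_t wide[512];
        DWORD got = 0;
        if (!ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), wide, 512, &got, NULL))
            return -1;
        /* Drop the trailing "\r\n" that the console appends. */
        while (got > 0 && (wide[got - 1] == L'\r' || wide[got - 1] == L'\n'))
            got--;
        if (got == 0) {
            out[0] = '\0';
            return 0;
        }
        /* Convert UTF-16 to UTF-8; returns 0 if `out` is too small. */
        int n = WideCharToMultiByte(CP_UTF8, 0, wide, (int)got,
                                    out, cap - 1, NULL, NULL);
        if (n <= 0)
            return -1;
        out[n] = '\0';
        return n;
    }
#endif
    /* Portable fallback: the bytes are passed through unchanged. */
    if (fgets(out, cap, in) == NULL)
        return -1;
    size_t len = strcspn(out, "\r\n");
    out[len] = '\0';
    return (int)len;
}
```

A caller would pass stdin and a buffer sized in bytes, for example read_line_utf8(stdin, text, (int)sizeof text). To display the resulting UTF-8 bytes in CMD, the console output code page would also need to be 65001, set with chcp 65001 or SetConsoleOutputCP(CP_UTF8).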

Answer 2

Score: 0

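Try the wide-character functions from <wchar.h>:

```c
#include <wchar.h>

int main(void)
{
    static wchar_t text[32];

    /* %31ls leaves room for the terminating null wide character. */
    if (wscanf(L"%31ls", text) != 1)
        return 1;
    wprintf(L"%ls\n", text);

    return 0;
}
```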
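
With MinGW on Windows, wchar_t is 16 bits wide and wide strings hold UTF-16 code units, so a program that needs UTF-8 bytes still has to convert after reading wide input. On Windows, WideCharToMultiByte with CP_UTF8 performs that conversion. The portable sketch below shows the same encoding step by hand, mainly to make the byte layout visible. The function name utf16_to_utf8 and its error convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n UTF-16 code units as NUL-terminated UTF-8 in out (cap bytes).
   Returns the number of bytes written, excluding the NUL, or (size_t)-1
   on an unpaired surrogate or insufficient space. */
static size_t utf16_to_utf8(const uint16_t *in, size_t n, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    if (cap == 0)
        return (size_t)-1;
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            /* high surrogate */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;                      /* unpaired */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00u);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                          /* stray low surrogate */
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= cap)                            /* keep room for NUL */
            return (size_t)-1;
        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

For the two code units 0x4F60 and 0x597D (你好), this produces the six bytes e4 bd a0 e5 a5 bd.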
