UTF-8 does not print characters to the console







I have the following code:

public class MainDefault {
public static void main (String[] args) {

But I can't seem to print the special characters to the console.

When I compile and run it as follows, I get this result:

$ javac MainDefault.java
$ java MainDefault


On the other hand, when I compile it and run it like this

$ javac -encoding UTF8 MainDefault.java
$ java MainDefault


And when I run it using the file encoding UTF8 flag, I get the following

$ java -Dfile.encoding=UTF8 MainDefault


It doesn't seem to be a problem with the console (Git Bash on Windows 10), as it prints the characters normally:


Thanks for your help

# Answer 1
**Score**: 12










Your code is not printing the right characters in the console because your Java program and the console are using different character sets (encodings).

If you want to obtain the same characters, you first need to determine which character sets are in place.

This process will depend on the "console" in which you are outputting your results.

If you are working with Windows and cmd, as @RickJames suggested, you can use the chcp command to determine the active code page.

Oracle documents the full list of encodings supported by Java, together with their aliases (code pages, in this case), on this page.

This Stack Overflow answer also provides some guidance about the mapping between Windows code pages and Java charsets.

As you can see in the provided links, the code page for UTF-8 is 65001.

If you are using Git Bash (MinTTY), you can follow @kriegaex's instructions to verify or configure UTF-8 as the terminal emulator encoding.

Linux and UNIX, or UNIX derived systems like Mac OS, do not use code page identifiers, but locales. The locale information can vary between systems, but you can either use the locale command or try to inspect the LC_* system variables to find the required information.

This is the output of the locale command on my system:


Once you know this information, you need to run your Java program with the file.encoding VM option corresponding to the right charset:

java -Dfile.encoding=UTF8 MainDefault

Some classes, like PrintStream or PrintWriter, allow you to specify the Charset in which the output is written.
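As a minimal sketch of that last point, the following wraps a stream in a PrintStream with an explicit encoding. The class name `Utf8Out` and the helper `utf8Stream` are hypothetical names chosen for illustration, and the string literal uses Unicode escapes so that the result does not depend on the javac -encoding setting. It only helps if the console itself is set to UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Out {
    // Hypothetical helper: wraps a stream so that text is always encoded as UTF-8,
    // regardless of the JVM's default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java runtime must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // Superscript two and three, written as escapes.
        out.println("\u00B2\u00B3");
    }
}
```

With this approach the bytes sent to the console are UTF-8 whether or not file.encoding is overridden on the command line.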

The -encoding javac option only allows you to specify the character encoding used by source files.

If you are using Windows with Git Bash, consider also reading this answer by @rmunge: it describes a possible bug in the tool that may be the cause of the problem, and that prevents the terminal from working correctly out of the box unless the encoding is adjusted manually.


# Answer 2
**Score**: 5





I am also using Git Bash on Windows 10, and it works totally fine for me.

Here's how it prints:


The terminal version is mintty 3.0.2 (x86_64-pc-msys), and my text properties were:


So I tried to reproduce your output by changing the character set.


By setting the character set to CP437 (OEM codepage) (note that this automatically changed the locale to C too), I was able to get the same output as you.


Then, when I changed it back to UTF-8 (Unicode), I got the expected output!


Therefore, it is clear that the problem is with your console's character set.


# Answer 3
**Score**: 5






The short version:

The unexpected behavior is reproducible with the following setup:

  • Windows 10 with English, German or French language, or any other language that leads to ANSI and OEM codepages that encode ² and ³ differently

  • Git for Windows 2.27.0 (installed with default settings, i.e. configured to use MinTTY and with experimental support for pseudo consoles disabled)

  • Source code is stored in UTF-8 encoding

To get correct behavior:

  • Either re-install Git for Windows 2.27.0 and enable experimental support for pseudo consoles on the last page of the installer, or upgrade to the latest 2.28 version

  • Compile your code with javac -encoding UTF8

  • Call java without overriding file.encoding

The medium version:

Git for Windows 2.27.0 uses a version of MSYS2 that, when support for pseudo consoles is disabled, does not set the code page for MinTTY by calling SetConsoleCP. The Java runtime determines the codepage for <code>System.out</code> by calling GetConsoleCP. Since no codepage is set when Java is executed within a MinTTY terminal, the call fails and Java uses the charset returned by <code>Charset.defaultCharset()</code> as a fallback. But in a Windows installation as described above, <code>Charset.defaultCharset()</code> returns cp-1252 while the default charset for consoles is cp-850. The two codepages are not fully compatible, which leads to the strange output.

The long version:

Windows has two types of codepages: ANSI and OEM codepages. The first type is intended for UI applications that do not support Unicode, and the latter is used for console applications. Both types encode a single character in one byte, but they are not fully compatible.
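The incompatibility can be made visible with a small sketch that encodes the same two characters in an ANSI codepage, an OEM codepage and UTF-8. The class name `CodePageBytes` and the helper `hex` are hypothetical, and the charset names windows-1252 and IBM850 are assumed to be the Java names for cp-1252 and cp-850; IBM850 may be missing from a reduced runtime, so the code checks for it first.

```java
import java.nio.charset.Charset;

public class CodePageBytes {
    // Hypothetical helper: returns the bytes of text in the named charset
    // as lowercase hex separated by spaces.
    static String hex(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Superscript two and three, written as escapes so the source encoding does not matter.
        String text = "\u00B2\u00B3";
        for (String name : new String[] {"windows-1252", "IBM850", "UTF-8"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + hex(text, name));
            } else {
                System.out.println(name + ": not available in this runtime");
            }
        }
    }
}
```

In cp-1252 the two characters are the bytes b2 b3, in cp-850 they are fd fc, and in UTF-8 they are c2 b2 c2 b3, so bytes written for one codepage show up as different glyphs when a console interprets them with another.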

Therefore on Windows Java has to deal with two charsets instead of one:

  • <code>Charset.defaultCharset()</code> returns the ANSI codepage (usually cp-1252). This charset is specified by the file.encoding system property. If it is not specified as a VM argument, the java executable determines the ANSI codepage and adds the system property during initialization. <code>String.getBytes()</code> uses the charset returned by <code>Charset.defaultCharset()</code>.
  • <code>System.out</code> uses the OEM codepage for consoles (usually cp-850). The java executable gets this codepage by calling the GetConsoleCP function and sets it as the value of the internal system properties sun.stdout.encoding and sun.stderr.encoding. When the call to GetConsoleCP fails, the charset returned by <code>Charset.defaultCharset()</code> is used. This only happens when the console in which java.exe is executed has not set the OEM codepage beforehand by calling SetConsoleCP.
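To see which of these values a given JVM actually picked up, a small diagnostic along the following lines can help. This is only a sketch: the class name `ShowEncodings` and the helper `describe` are hypothetical, and sun.stdout.encoding and sun.stderr.encoding are treated as internal, undocumented properties that may be absent depending on the Java version and on whether a console was detected.

```java
import java.nio.charset.Charset;

public class ShowEncodings {
    // Hypothetical helper: formats a system property for display.
    // Internal properties may not be set at all, so null is handled explicitly.
    static String describe(String key) {
        String value = System.getProperty(key);
        return key + " = " + (value == null ? "(not set)" : value);
    }

    public static void main(String[] args) {
        System.out.println("Charset.defaultCharset() = " + Charset.defaultCharset());
        System.out.println(describe("file.encoding"));
        // Assumption: these two are JDK-internal properties, not a public API.
        System.out.println(describe("sun.stdout.encoding"));
        System.out.println(describe("sun.stderr.encoding"));
    }
}
```

Running it once from cmd and once from Git Bash makes it easy to compare what the JVM detects in each terminal.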

So what happens now in the setup mentioned above?

$ javac MainDefault.java
$ java MainDefault


The native call of GetConsoleCP fails due to the bug in MSYS2. Therefore <code>System.out</code> falls back to the charset returned by <code>Charset.defaultCharset()</code> which is cp-1252. But the OEM codepage of the console is cp-850. Therefore System.out.println("²³") produces unexpected output.

The source code is stored in UTF-8. Encoding "²³" in UTF-8 requires 4 bytes. But because the -encoding parameter is missing, javac assumes the default encoding, which uses one byte per character. Therefore it interprets the 4 bytes as 4 characters. <code>String.getBytes</code> uses the one-byte-based ANSI code page cp-1252 and therefore returns 4 bytes.
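The misinterpretation described here can be simulated without involving javac at all, by decoding the UTF-8 bytes of the literal with a one-byte charset. This is a sketch under that assumption; the class name `MisreadSource` and the helper `misread` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisreadSource {
    // Hypothetical helper: simulates a compiler reading UTF-8 source bytes
    // with a one-byte-per-character charset.
    static String misread(String literal, String assumedCharset) {
        byte[] sourceBytes = literal.getBytes(StandardCharsets.UTF_8);
        return new String(sourceBytes, Charset.forName(assumedCharset));
    }

    public static void main(String[] args) {
        // The two characters are written as escapes so this file compiles the same
        // way with any -encoding setting.
        String seenByCompiler = misread("\u00B2\u00B3", "windows-1252");

        // Four characters instead of two: each UTF-8 byte became its own character.
        System.out.println(seenByCompiler.length());

        // Encoding those four characters in cp-1252 gives four bytes again.
        System.out.println(seenByCompiler.getBytes(Charset.forName("windows-1252")).length);
    }
}
```

Both lines print 4: the string now holds the characters U+00C2, U+00B2, U+00C2, U+00B3 rather than the intended two.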

$ javac -encoding UTF8 MainDefault.java
$ java MainDefault


With the -encoding UTF8 parameter, javac interprets the UTF-8 encoded source as UTF-8. So the 4 bytes of "²³" are correctly recognized as two characters. <code>System.out</code> encodes the two characters in cp-1252, which leads to 2 bytes. But since the console still uses cp-850, the output is still corrupted. <code>String.getBytes</code> also encodes the two characters in cp-1252, which leads to 2 bytes.

$ java -Dfile.encoding=UTF8 MainDefault


The system property file.encoding overrides the charset returned by <code>Charset.defaultCharset()</code>, which is also used by <code>String.getBytes()</code>. The two characters, which were first wrongly interpreted by javac as 4 characters in an 8-bit encoding, are now correctly encoded in UTF-8 as two characters with two bytes per character. This leads to 4 bytes. Since file.encoding does not have any effect on the charset that is used by <code>System.out</code>, the 4 (and not 2, due to the wrong interpretation by javac) characters are still encoded in cp-1252, the console still uses cp-850, and you still get corrupted output.


Your console can print ²³ because the console's 8-bit OEM code page (cp-850) supports both characters. But it encodes them differently than the ANSI code page cp-1252 that is used by <code>System.out</code>.


# Answer 4
**Score**: 4




The hex codes look okay for UTF-8. Maybe your character set for Git Bash is not UTF-8. For me it looks like this:


The console output then also looks fine:


Update 2020-09-13: Here is proof that chcp.com <codepage> does not work in Git Bash (mintty). It has no effect whatsoever. You really do have to select the correct codepage in the mintty settings dialogue.


Update 2020-09-15: Okay, after I read @rmunge's answer I upgraded to Git 2.28 and could reproduce the OP's problem and also use the chcp workaround (it did not work as described by @rmunge in my case). Because Git (or rather MSYS2) is so buggy in the latest versions, and I don't wish to run chcp.com from inside Git Bash every time I open a new console, I just downgraded to version 2.15.1, which I had used for 3 years without any problems. Maybe there are later versions without the console bug; I did not try, but just used my old installer from the downloads folder on my computer. I recommend that everyone do the same to work around this ugly bug for now. With a non-buggy console version, it just works like I described.


# Answer 5
**Score**: 1






On Windows, it has to do with your code page.
You can use the chcp command to set the code page you want (e.g. if you want to set it for a specific program you launch), or you can specify the charset corresponding to the code page on the java command line.

If the current codepage does not support the characters you are printing, you will see garbage in the console.

The reason why different shells may behave differently is due to the codepage/charsets that are loaded by default.

Please check out this SO post for how it is done:


# Answer 6
**Score**: 1



I encountered the same problem in Git Bash for Windows. java and javac could not print Chinese characters properly. Setting Git Bash's character set to UTF8 did not help, and chcp did not work either. From Git Bash's installation wizard, I knew that programs like python do not work properly without winpty, so I had already added alias python='winpty python' to ~/.bashrc. So I tried winpty java Foo.java and winpty javac Foo.java, and luckily the problem was gone. I added these aliases to ~/.bashrc to fix the problem:

alias java='winpty java'
alias javac='winpty javac'

Recent versions (v2.2x) of Git Bash for Windows include an experimental feature related to winpty, but it still seems to have some problems, so I've kept these aliases so far.


# Answer 7
**Score**: 0

Hex C2B2 C2B3, when interpreted as UTF-8, is ²³.
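That decoding can be checked with a few lines of Java. This is only an illustrative sketch; the class name `DecodeHex` and the helper `decodeUtf8` are hypothetical and not part of any library.

```java
import java.nio.charset.StandardCharsets;

public class DecodeHex {
    // Hypothetical helper: parses a hex string such as "C2B2C2B3" into bytes
    // and decodes those bytes as UTF-8.
    static String decodeUtf8(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = decodeUtf8("C2B2C2B3");
        // Compare against the escaped form of superscript two and three.
        System.out.println(text.equals("\u00B2\u00B3")); // prints true
        System.out.println(text.length());               // prints 2
    }
}
```

Comparing against escapes rather than printing the characters keeps the check independent of the console's own encoding.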

I assume you are using a Windows "cmd terminal"?

The command "chcp" controls the "code page". chcp 65001 provides utf8, but it needs a special charset installed, too. To set the font in the console window: Right-click on the title of the window → Properties → Font → pick Lucida Console




# Answer 8
**Score**: 0

Please verify that your Windows 10 installation does not have Unicode UTF-8 support enabled. You can see this option by going to Settings and then: All Settings -> Time & Language -> Language -> "Administrative Language Settings"

This is what it looks like - the feature should be unchecked.


<code>"²³".getBytes()</code> returns the encoding of the string based on the detected default charset. On a Windows 10 system, the default charset should usually be a one-byte-based encoding, regardless of whether you launch java.exe from a Windows console or from Git Bash. But your first screenshot shows a 4-byte encoding that is actually UTF-8. So your JVM seems to wrongly detect UTF-8 as the default charset, which is incompatible with the codepage of your console.

Your console can print ²³ because both characters are supported by the code page in use, but that encoding uses one byte per character, while UTF-8 requires 2 bytes for each of these two characters.

I have no simple explanation for your second screenshot, but be aware that Git Bash is based on MSYS2, which in turn uses the mintty terminal emulator. While MSYS2 uses UTF-8, and mintty also seems to support UTF-8, the whole thing is wrapped within a Windows console that is based on an OEM codepage that is incompatible with UTF-8. The whole setup then runs on an operating system that internally uses UTF-16. Combined with a beta setting that overrides the entire OEM codepage concept at the OS level, this setup provides enough complexity for some incomprehensible behavior.



