Delphi Berlin Unicode Issue

Question

I have a very strange Delphi compiler issue relating to Unicode characters.

I have a unit with this const definition:

const SLANG_SPANISH_ESP = 'Español';

When I compile this on my PC, the ñ gets converted to its single-byte (ANSI) equivalent. I've used a hex viewer to examine the relevant files:

Within the pas source file, the ñ is encoded in UTF-8 as C3 B1.

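Within the generated DCU file, the ñ is encoded as the single byte F1. That is not ASCII, which stops at 7F; F1 is the code for ñ in ANSI code pages such as Windows-1252.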

All the other Delphi PCs within our group compile the DCU differently, generating the DCU file with the ñ encoded in UTF-8 as C3 B1.
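
The two byte patterns seen in the hex viewer can be reproduced outside Delphi. The following is a minimal Python sketch, assuming the ANSI code page on the affected PC is Windows-1252 (the question does not say which code page is active, so `cp1252` is an assumption):

```python
# Byte-level view of the character from the const definition.
text = "ñ"

utf8_bytes = text.encode("utf-8")    # two bytes in UTF-8
ansi_bytes = text.encode("cp1252")   # one byte in Windows-1252 (assumed ANSI code page)

print(utf8_bytes.hex(" ").upper())   # C3 B1
print(ansi_bytes.hex(" ").upper())   # F1

# 0xF1 is above 0x7F, so it lies outside the 7-bit ASCII range.
print(ansi_bytes[0] > 0x7F)          # True
```

This only illustrates the encodings involved; it says nothing about how the Delphi compiler stores string constants inside a DCU.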

This is just one example, but many of the non-ASCII characters suffer the same fate.

I have tried hard over the last couple of days to identify the cause, without success. I have ruled out the project files and source code as the cause, since we all work from the same SVN repository. I double-checked by manually copying the project folder from a colleague's PC.

I have looked through the Delphi settings for something that might affect this, without success either.

It's very frustrating and worrying to imagine that the same source code on different PCs compiles to different results. My only hope now is that someone from the community will be able to give me a clue.

Answer 1

Score: 1

I finally got to the bottom of this issue. It turns out that the pas file in question was saved as UTF-8 without a byte order mark (BOM), even though the IDE reported it as UTF-8. This is a known quirk with Delphi: a unit that contains UTF-8 characters but has no BOM may be compiled as if it were ANSI.

You can refer to Marco Cantu's blog on this issue: The Delphi Compiler and UTF-8 Encoded Source Code Files With no BOM
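
To spot units in this state, the raw bytes can be inspected for the BOM (EF BB BF). Here is a rough Python sketch; the function name `classify_source_bytes` and the category labels are made up for illustration and are not part of Delphi or any existing tool:

```python
import codecs


def classify_source_bytes(data: bytes) -> str:
    """Classify raw file bytes by how a BOM-dependent reader might treat them."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        data.decode("ascii")
        return "pure ASCII"           # identical bytes in ASCII, ANSI and UTF-8
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8 without BOM"    # the risky case described in this answer
    except UnicodeDecodeError:
        return "not valid UTF-8"      # probably saved in an ANSI code page


sample = "const SLANG_SPANISH_ESP = 'Español';".encode("utf-8")
print(classify_source_bytes(sample))                    # utf-8 without BOM
print(classify_source_bytes(codecs.BOM_UTF8 + sample))  # utf-8 with BOM
```

For a real file, the bytes would come from something like `pathlib.Path(name).read_bytes()`. Files classified as "pure ASCII" are harmless either way, since they decode identically under both interpretations.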

The reason that the file did not have a BOM was because it was generated using an in-house tool. This tool has since been updated to output the BOM too.
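
The in-house tool itself is not shown here, so the following is only a sketch of the general idea in Python: prepend the UTF-8 BOM to a file that lacks one. The helper name `ensure_utf8_bom` and the file name `Strings.pas` are hypothetical.

```python
import codecs
import tempfile
from pathlib import Path


def ensure_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM if it is missing. Returns True if the file changed."""
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    data.decode("utf-8")  # raises UnicodeDecodeError if the file is not UTF-8
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    unit = Path(tmp) / "Strings.pas"  # hypothetical generated unit
    unit.write_bytes("const SLANG_SPANISH_ESP = 'Español';".encode("utf-8"))
    print(ensure_utf8_bom(unit))                    # True
    print(ensure_utf8_bom(unit))                    # False
    print(unit.read_bytes()[:3].hex(" ").upper())   # EF BB BF
```

A generator written in Python could instead open its output with `encoding="utf-8-sig"`, which writes the BOM automatically. The explicit decode step above is a deliberate guard: it refuses to stamp a BOM onto a file that is actually in an ANSI code page.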

Finally, I discovered that, on a given machine, building the project with the IDE or externally via MsBuild.exe would yield different results. The IDE correctly interprets the unit as UTF-8, whereas MsBuild.exe interprets the unit as ANSI.

  • Posted by huangapple on January 6, 2020, 22:00:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59613448.html