匹配 MS Access 中的重音字符 – MS Access 的排序规则

huangapple go评论41阅读模式
英文:

Matching MS Access accented characters - collation in MS Access

问题

以下是已翻译的内容:

我的紧急需求是在MS Access中进行不考虑口音的比较。我正在使用Office 365 Access。

这与Unicode无关,因为欧洲带重音的字符存在于所有Windows-1252(有时在Microsoft产品和文档中被误称为“ANSI”),现代Unicode和UCS-2中。

我找到的Access“数据类型”页面提到“每个字符两个字节”,这使它听起来像是UCS-2,但没有详细信息。类似地,“排序顺序”下拉列表列出了一些未经记录的值。

实际示例:比较“Dvorak”和“Dvořák”。在MS Access中,这些不相等

今天我的目标不是寻找解决方法(我可以自己做),而是更好地了解2023年MS Access的能力。

经过了SQL Server和.NET字符串的增量支持改进后,我的第一个想法是,“2023年了,MS Access肯定可以处理排序规则了”。

我的最关键的问题是:Office 365 Access在其最新版本中支持“确切”哪些编码(“排序顺序”),VBA是否使用相同的字符集,或者在在MS Access内使用带重音字符时,是否会出现翻译或问题。

英文:

My immediate need is to do an accent-insensitive comparison in MS Access. I am using Office 365 Access.

This is not strictly speaking a Unicode question as the European accented characters are present in all of Windows-1252 (sometimes misleadingly called "ANSI" in Microsoft products and documentation), "modern" Unicode and UCS-2.

The Access "Data Types" page I found mentioned "two bytes per character", which makes it sound like UCS-2, but with no details. Similarly, the "sort order" drop-downs list a number of values that are also undocumented.

Actual example: compare "Dvorak" to "Dvořák". These are not equal in MS Access.

It is NOT my goal today to find a work-around (I can do that myself) - it is to better understand MS Access capabilities in 2023.

Having gone through the incremental support improvements for SQL Server and .NET strings, my first thought was "surely MS Access can handle collations by now (2023)".

My bottom line questions are: "exactly" what encodings ("sort orders") is Office 365 Access supporting in its most recent releases, and is VBA using the same character set, or will working with accented characters in VBA experience translations or issues when being used within MS Access?

答案1

得分: 1

你给我的信息有点少,所以我只会介绍基础知识。请注意,新功能很少添加到VBA和Access中,与SQL服务器或C#的新版本相比,破坏性更改极为罕见。

关于字符集和编码(字符串存储方式):

表格、查询和应用对象中的字符串以UTF-16存储。它们可以被压缩(文本字段的Unicode压缩选项)。这与排序顺序无关。

VBA代码本身以本地字符集存储(可能不支持某些字符)。通常建议在VBA代码中避免使用非ASCII字符,因为这可能在不同计算机和字符集上引起问题。如果需要在VBA文字中使用非ASCII字符,可以查看此帖以获取一些技巧。

VBA字符串始终是BSTR,它使用UTF-16字符。

关于排序顺序/校对规则(字符串比较方式):

Access不完全支持校对规则,也没有特定的区分大小写/不区分大小写和区分重音/不区分重音的校对规则。

它支持不同的排序顺序,确定字符串应如何排序以及哪些字符相等。可以在此处找到过时的列表。在Access的对象浏览器中,可以导航到_LanguageConstants_并检查列表。在Office 365的最新版本中,有一些新选项似乎使用代码页65001(= UTF-8),但我还没有看到文档或进行实验。

在VBA中,字符串比较和排序由模块顶部的Option Compare语句确定。几乎所有VBA应用程序只支持两种:Option Compare Binary,任何不等式都是不相等的字符串,排序区分大小写;和Option Compare Text,使用本地语言设置来比较字符串。对于Access,还有第三种,Option Compare Database,使用数据库排序顺序来比较字符串。

请注意,不是所有函数都支持所有Unicode字符。支持有限的函数包括MsgBoxDebug.Print。这在使用系统代码页中不存在的字符调试代码时可能会特别困难。

进一步注意事项:

VBA允许(相对)轻松地访问Windows API。您可以使用CompareStringEx而不是自己编写字符串比较函数,该函数具有执行不区分大小写和不区分变音符比较的选项。

请注意,对于外部函数,您需要使用StrPtr传递字符串指针,将字符串作为字符串传递将自动将它们从BSTR转换为指向系统代码页中的空终止字符串的指针。请参阅此答案以获取如何调用带有Unicode字符串的WinAPI函数的基本示例。您还需要查找和声明所有常量,例如Public Const NORM_IGNORECASE As Long = &H1等。

英文:

You're not giving me a whole lot to go on, so I'll just go over the basics. It's important to note new features rarely make it to VBA and Access, and breaking changes are extremely rare, in contrast to new versions SQL server or C#.

Regarding charsets and encodings (how strings are stored):

Strings in tables, queries and application objects are stored in UTF-16. They may be compressed (unicode compression option for text fields). This is independent of sort orders.

The VBA code itself is stored in the local charset (which may not support certain characters). It's generally recommended to avoid non-ASCII characters in VBA code, as this may cause issues on different computers and different charsets. See this post for some trickery if you need non-ASCII characters in VBA literals.

VBA strings are always a BSTR which uses UTF-16 characters.

Regarding sort orders/collations (how strings are compared):

Access has no full support for collations, and no specific case sensitive/case insensitive and accent sensitive/accent insensitive collations.

It does support different sort orders, which determines how strings should be sorted and which characters are equal. An outdated list can be found here. Using the object browser in Access, you can navigate to LanguageConstants and check the list. In recent builds of Office 365, there are some new options that appear to use codepage 65001 (= UTF-8) but I haven't seen docs or experimented with it.

In VBA, string comparisons and sorts are determined by an Option Compare statement at the top of the module. Nearly all VBA applications only support two: Option Compare Binary, any inequality is an unequal string and sorts are case sensitive, and Option Compare Text, use the local language settings to compare strings. For Access, there's a third, Option Compare Database, use the database sort order to compare strings.

Note that not all functions support all unicode characters. Functions with limited support include MsgBox and Debug.Print. This can make it especially hard to debug code when working with characters not in the system code page.

Further notes

VBA does allow (relatively) easy access to the Windows API. Instead of rolling your own string comparison function, you could use CompareStringEx which has options to do case-insensitive diacritic-insensitive comparisons.

Note that for external functions, you need to pass string pointers using StrPtr, passing strings as a string will automatically convert them from a BSTR to a pointer to a null-terminated string in the system codepage. See this answer for a basic example how to call a WinAPI function for a unicode string. You will also have to look up and declare all the constants, e.g. Public Const NORM_IGNORECASE As Long = &H1, etc.

huangapple
  • 本文由 发表于 2023年2月14日 01:45:06
  • 转载请务必保留本文链接:https://go.coder-hub.com/75439469.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定