For the same String, will SQLite's length ever return a different value than Java's length method?

Question

Given the same string data:

  1. SQLite performs the length calculation on its TEXT column.
  2. The TEXT column is read into a Java String (using the Android Room database), and Java then calls String.length().

Is there any chance that these two yield different values?

I did a rough test using English and non-English characters. Both yielded the same value.

However, I am not sure whether there are edge cases I have missed.

Answer 1

Score: 7

Since you are looking for edge cases...

From SQLite's Built-In Scalar SQL Functions:

> length(X)
> For a string value X,
> the length(X) function returns the number of characters (not bytes) in X
> prior to the first NUL character. (emphasis mine)
> Since SQLite strings do not normally contain NUL characters,
> the length(X) function will usually return the total number of characters in the string X....

So, in SQLite:

SELECT LENGTH('a' || CHAR(0) || 'b')

will return 1,

but in Java:

String s = "a" + Character.toString('\0') + "b";
System.out.println("" + s.length());

will return 3.
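
As a rough illustration of the rule quoted above, the following sketch models the documented behavior of length(X) in plain Java: count code points up to the first NUL. The class and the helper name `sqliteStyleLength` are hypothetical, and the method only approximates what the documentation describes; it does not call SQLite.

```java
class SqliteLengthSketch {
    // Hypothetical helper: approximates the documented length(X) rule,
    // i.e. the number of characters before the first NUL character.
    static int sqliteStyleLength(String s) {
        int nul = s.indexOf('\0');
        int end = (nul >= 0) ? nul : s.length();
        // Count Unicode code points, not UTF-16 char values.
        return s.codePointCount(0, end);
    }

    public static void main(String[] args) {
        String s = "a" + '\0' + "b";
        System.out.println(s.length());           // prints 3
        System.out.println(sqliteStyleLength(s)); // prints 1
    }
}
```

If the two lengths must agree in application code, comparing against a helper like this (rather than String.length()) is one option, assuming the stored text really does contain a NUL.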

Answer 2

Score: 5

There can be cases where the lengths differ. Java uses UTF-16 for its internal string representation, so characters outside the Basic Multilingual Plane need a surrogate pair (two char values) to be stored in memory. Java's String.length() counts char values and does not take this into account.

A simple example using the 💩 emoji character:

    class HelloWorld {
        public static void main(String[] args) {
            System.out.println("💩".length());
        }
    }

This will print 2.

On the other hand, the SQLite documentation states:

> For a string value X, the length(X) function returns the number of characters (not bytes) in X prior to the first NUL character.

It specifies that it counts characters:

    sqlite> select length('💩');

This will return 1.

This is not exclusive to emojis. The same holds for any character with a "high" code point (above U+FFFF), such as some rare Asian ideographs.

Tested with SQLite 3.28.0 and openjdk version "1.8.0_252". I think it should hold true for your stack as well.
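
To get a character count in Java that is not inflated by surrogate pairs, the standard `String.codePointCount` method can be used. The sketch below is only an illustration: the class and helper names are made up, and the emoji is built from its code point U+1F4A9 so the example does not depend on the source file encoding.

```java
class CodePointLengthSketch {
    // Hypothetical helper: counts Unicode code points instead of UTF-16 char values.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F4A9 lies outside the Basic Multilingual Plane, so it needs a surrogate pair.
        String emoji = new String(Character.toChars(0x1F4A9));
        System.out.println(emoji.length());         // prints 2
        System.out.println(codePointLength(emoji)); // prints 1
    }
}
```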

Answer 3

Score: 2

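According to the SQLite documentation, NUL characters (ASCII 0x00, Unicode \u0000) in a TEXT field can lead to different length values.

Take, for example, the text Hello\u0000World.

SQLite will return a length of 16 when the text is stored exactly as written, because SQLite string literals do not interpret backslash escapes, so \u0000 counts as six ordinary characters.

Java will return a length of 11, because the Java compiler turns \u0000 into a single NUL character.

So Java counts the NUL character as 1 while SQLite counts the escape sequence as 6, and the same text yields different values. Note that if an actual NUL character is stored, the documented behavior of length() is to stop counting at it, which would give 5 rather than 11.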
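
One way to read the numbers 11 and 16 is that 16 = 5 + 6 + 5, which is the length of the text when the escape sequence is kept as six literal characters instead of being turned into one NUL. The sketch below shows both variants in plain Java; the class and constant names are hypothetical and no database is involved.

```java
class EscapeLengthSketch {
    // The escape is interpreted: a single NUL character sits between the two words.
    static final String WITH_REAL_NUL = "Hello\0World";

    // The escape is kept as text: backslash, u, 0, 0, 0, 0 (six characters).
    static final String WITH_LITERAL_ESCAPE = "Hello\\u0000World";

    public static void main(String[] args) {
        System.out.println(WITH_REAL_NUL.length());       // prints 11
        System.out.println(WITH_LITERAL_ESCAPE.length()); // prints 16
    }
}
```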

huangapple
  • Published on 2020-10-28 00:02:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64558317.html