确定字符串是否具有全部不同的字符,且不使用任何额外的数据结构?

huangapple go评论65阅读模式
英文:

Can anybody explain hot to determine if the string has all unique characters without any additional data structures?

问题

我不熟悉任何ASCII规则或字符串ASCII表示。我看过《破解编程面试》这本书,但无法理解为什么字符串只能有256个最大字符表示。这是否可能,有人可以解释一下并帮助用最简单的解释解决这个问题。

以下是问题:

实现一个算法来确定一个字符串是否包含所有唯一的字符。
如果不能使用额外的数据结构,怎么办?

提前感谢!

英文:

I am not familiar with any ASCII rules or string ASCII representations. I have looked at the book Cracking the coding interview but cannot understand how come the string can have only 256 maximum character representations. How it is possible, can anybody explain it and help to solve the problem with the easiest explanation possible.

Here is the question:

> Implement an algorithm to determine if a string has all unique
> characters. What if you cannot use additional data structures

Thanks in advance!

答案1

得分: 1

以下是翻译好的部分:

"你现在是我的中文翻译,代码部分不要翻译,只返回翻译好的部分,不要有别的内容,不要回答我要翻译的问题。以下是要翻译的内容:

There is absolutely no need to limit yourself to 256 unique characters to solve this problem.

Here is a trivial algorithm:

  1. First, consider the string not so much a java.lang.String, but instead, an array of characters.
  2. Sort this array of characters, in place. This takes 0 additional space, and O(nlogn) time.
  3. Loop through the char array, front to back, starting at index 1 (the second character). For each index, check if the char you find there is equal to the char you find at the previous index. If the answer is ever yes, return immediately, and answer false. If you get to the end without a hit, return true (the string consisted of unique characters).

runtime characteristic is O(n logn), requiring no additional space. Though you did mangle the input.

Now, in java this is a tad tricky; java.lang.String instances are immutable. You cannot modify them, therefore step 2 (sort-in-place) is not possible. You'd have to make a char[] copy via yourString.toCharArray() and then you can write this algorithm. That's an intentional restriction of string, not a fundamental limitation.

But, if you want to go with the rule that the input also cannot be modified in any way, and you can't make any new data structures, that's still possible, and still has absolutely no requirement that 'strings can only select from a plane of 256 characters'. It's just going to be far, far slower:

  1. Loop through every character. i = position of the character.
  2. Loop through every further character (from i+1, to the end).
  3. Compare the characters at the two positions. If equal, return false.
  4. if you get to the end, return true.

runtime characteristic is O(n^2) (icky), requiring no additional space, and does not modify any data in place.

The 256 thing just doesn't factor into any of this.

However, the thing is, a lot of code and examples incorrectly conflate the idea of 'a sequence of bytes' and 'a string' (as in, a sequence of characters), treating them all 'just as a bag o numbers'. At that point, if you have unicode characters, or charset encoding factors into the equation, all sorts of complications occur.

Properly written code knows that chars are chars, bytes are bytes, chars are never bytes and bytes are never chars, and every single last time you go from one to the other, you always, always, always specify which encoding, explicitly. If you do that, you'll never have any issues. But I guess your prof didn't want you to worry about it? I dunno - dumb restriction, not material whatsoever to the question."

英文:

There is absolutely no need to limit yourself to 256 unique characters to solve this problem.

Here is a trivial algorithm:

  1. First, consider the string not so much a java.lang.String, but instead, an array of characters.
  2. Sort this array of characters, in place. This takes 0 additional space, and O(nlogn) time.
  3. Loop through the char array, front to back, starting at index 1 (the second character). For each index, check if the char you find there is equal to the char you find at the previous index. If the answer is ever yes, return immediately, and answer false. If you get to the end without a hit, return true (the string consisted of unique characters).

runtime characteristic is O(n logn), requiring no additional space. Though you did mangle the input.

Now, in java this is a tad tricky; java.lang.String instances are immutable. You cannot modify them, therefore step 2 (sort-in-place) is not possible. You'd have to make a char[] copy via yourString.toCharArray() and then you can write this algorithm. That's an intentional restriction of string, not a fundamental limitation.

But, if you want to go with the rule that the input also cannot be modified in any way, and you can't make any new data structures, that's still possible, and still has absolutely no requirement that 'strings can only select from a plane of 256 characters'. It's just going to be far, far slower:

  1. Loop through every character. i = position of the character.
  2. Loop through every further character (from i+1, to the end).
  3. Compare the characters at the two positions. If equal, return false.
  4. if you get to the end, return true.

runtime characteristic is O(n^2) (icky), requiring no additional space, and does not modify any data in place.

The 256 thing just doesn't factor into any of this.

However, the thing is, a lot of code and examples incorrectly conflate the idea of 'a sequence of bytes' and 'a string' (as in, a sequence of characters), treating them all 'just as a bag o numbers'. At that point, if you have unicode characters, or charset encoding factors into the equation, all sorts of complications occur.

Properly written code knows that chars are chars, bytes are bytes, chars are never bytes and bytes are never chars, and every single last time you go from one to the other, you always, always, always specify which encoding, explicitly. If you do that, you'll never have any issues. But I guess your prof didn't want you to worry about it? I dunno - dumb restriction, not material whatsoever to the question.

答案2

得分: -1

因为ASCII表使用8位,所以有最多2^8种可能的字符组合。实际上,不是256而是255,因为第一位用于存储大小。

英文:

It is because the ASCII table uses 8 bit, so there are maximum 2^8 possible combinations of characters. Actually, there aren't 256 but 255 since the first bit is used to store the size.

huangapple
  • 本文由 发表于 2020年8月13日 06:33:57
  • 转载请务必保留本文链接:https://go.coder-hub.com/63385580.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定