英文:
Ambiguity in a CodeForces Problem - usage of HashSet Vs LinkedHashSet
问题
昨天我在解决Codeforces的问题。问题的链接是this
我将在下面简要解释问题。
给定一个二进制字符串,将其分成尽可能少的子序列,以使字符串的每个字符都属于恰好一个子序列,且每个子序列的形式为"010101..."或"101010..."(即子序列不应包含两个相邻的零或一)。
对于这个问题,昨天在比赛中我提交了一个解决方案。这是solution。它被临时接受,但在最终的测试中得到了超时的状态。
所以今天,我又提交了另一个solution,这次通过了所有的测试用例。
在第一个解决方案中,我使用了HashSet
,而在第二个解决方案中我使用了LinkedHashSet
。我想知道,为什么HashSet
没有通过所有测试用例?这是否意味着每当我需要一个Set
实现时都应该使用LinkedHashSet
?我看到了this文章,并发现HashSet
的性能比LinkedHashSet
好。但为什么我的代码在这里不起作用?
英文:
I was solving a Codeforces problem yesterday. The problem's URL is this
I will just explain the question in short below.
> Given a binary string, divide it into a minimum number of subsequences
> in such a way that each character of the string belongs to exactly one
> subsequence and each subsequence looks like "010101 ..." or "101010
> ..." (i.e. the subsequence should not contain two adjacent zeros or
> ones).
Now, for this problem, I had submitted a solution yesterday during the contest. This is the solution. It got accepted temporarily and on final test cases got a Time limit exceeded status.
So today, I again submitted another solution and this passed all the cases.
In the first solution, I used HashSet
and in the 2nd one I used LinkedHashSet
. I want to know, why didn't HashSet
clear all the cases? Does this mean I should use LinkedHashSet
whenever I need a Set
implementation? I saw this article and found HashSet
performs better than LinkedHashSet
. But why my code doesn't work here?
答案1
得分: 2
这个问题可能会在Codeforces上得到更多回复,但我还是会在这里回答。
比赛结束后,Codeforces允许其他用户通过编写自定义输入来运行其他用户的程序来“攻击”解决方案。如果被攻击的用户的程序在自定义输入上运行缓慢,他们的代码提交状态将从“已接受”更改为“超时”。
你的代码之所以从“已接受”更改为“超时”,是因为有人创建了一个“反哈希测试”(一个测试,在这个测试中你的哈希函数导致许多碰撞),在这个测试上你的程序比平常运行得慢。如果你对如何生成这样的测试感兴趣,可以在Codeforces上找到一些帖子,比如这个:https://codeforces.com/blog/entry/60442。
正如@Photon提到的那样,Codeforces上有一篇关于为什么应避免使用Java.HashSet和Java.HashMap的帖子:https://codeforces.com/blog/entry/4876,这主要是因为反哈希测试。在某些情况下,从平衡BST中添加额外的log(n)
因子可能并不太糟糕(通过使用TreeSet
或TreeMap
)。在许多情况下,额外的log(n)
因子不会导致你的代码超时,并且它可以保护你免受反哈希测试的影响。
你如何确定你的算法是否足够快以添加log(n)
因子?我猜这需要一些经验,但大多数人建议进行某种计算。大多数在线评测系统(包括Codeforces)显示你的程序被允许在特定问题上运行的时间(通常在一秒到四秒之间),你可以使用每秒10^9
个常数时间操作作为一个经验法则来进行计算。
英文:
This question would probably get more replies on Codeforces, but I'll answer it here anyways.
After a contest ends, Codeforces allows other users to "hack" solutions by writing custom inputs to run on other users' programs. If the defending user's program runs slowly on the custom input, the status of their code submission will change from "Accepted" to "Time Limit Exceeded".
The reason why your code, specifically, changed from "Accepted" to "Time Limit Exceeded" is that somebody created an "anti-hash test" (a test on which your hash function results in many collisions) on which your program ran slower than usual. If you're interested in how such tests are generated, you can find several posts on Codeforces, like this one: https://codeforces.com/blog/entry/60442.
As linked by @Photon, there's a post on Codeforces explaining why you should avoid using Java.HashSet and Java.HashMap: https://codeforces.com/blog/entry/4876, which is essentially due to anti-hash tests. In some instances, adding the extra log(n)
factor from a balanced BST may not be so bad (by using TreeSet
or TreeMap
). In many cases, an extra log(n)
factor won't make your code time out, and it gives you protection from anti-hash tests.
How do you determine whether your algorithm is fast enough to add the log(n)
factor? I guess this comes with some experience, but most people suggest performing some sort of calculation. Most online judges (including Codeforces) show the time that your program is allowed to run on a particular problem (usually somewhere between one and four seconds), and you can use 10^9
constant-time operations per second as a rule of thumb when performing calculations.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论