2020年4月6日 00:11:41go评论85阅读模式

英文:

Increase Speed of Sorting Large List of Strings by Character in Java

问题

以下是您提供的代码翻译结果：

我已经编写了一段代码，用于遍历字符串列表，并返回整个字符串列表中找到的最不频繁的字符的列表。

要以包含最不频繁字符的单词优先的方式返回排序后的单词列表，最快的方法是什么？（我正在处理一个庞大的字符串列表，所以我编写的代码运行得不够快。下面的示例只是为了说明）

例如，如果给定的列表是：```[&quot;hello&quot;, &quot;my&quot;, &quot;name&quot;, &quot;is&quot;, &quot;inigo&quot;, &quot;montoya&quot;, &quot;you&quot;, &quot;killed&quot;, &quot;my&quot;, &quot;father&quot;, &quot;prepare&quot;, &quot;to&quot;, &quot;die&quot;]```。我的代码返回：```[s, g, u, k, f, h, d, p, n, t, r, l, m, y, a, i, o, e]```，其中```s```是在字符串列表中找到的最不频繁的字母，```e```是在列表中找到的最常见的字母。然后，结果列表将首先返回包含最不频繁字母的单词，依此类推。例如：```[&quot;is&quot;, &quot;inigo&quot;, &quot;you&quot;, &quot;killed&quot;, &quot;father&quot;, &quot;hello&quot;, &quot;die&quot;, &quot;prepare&quot;, &quot;name&quot;, &quot;montoya&quot;, &quot;to&quot;, &quot;my&quot;]```

以下是找到最不频繁字母的代码：
```java
public static void method(List<String> words)
{
    Map<Character, Integer> elemCount = new LinkedHashMap<>();
    for (String word : words)
    {
        for (int i = 0; i < word.length(); i++)
        {
            if (elemCount.containsKey(word.charAt(i)))
            {
                elemCount.put(word.charAt(i), elemCount.get(word.charAt(i)) + 1);
            }
            else
            {
                elemCount.put(word.charAt(i), 1);
            }
        }
    }
    ArrayList<Character> sortedElems = new ArrayList<>();
    LinkedList<String> sorted = new LinkedList<>();
    elemCount.entrySet().stream().sorted(
    Map.Entry.comparingByValue()).forEach(entry -> 
    { 
        for (int i = 1; i <= entry.getValue(); i++)
        {
            if (sortedElems.contains(entry.getKey()) == false)
            {
                sortedElems.add(entry.getKey());
            }
        }
    }
    );

以下是按最不频繁字符排序的代码部分：

    for (int i = 0; i < sortedElems.size(); i++)
    {
        for (String word : words)
        {
            char x = sortedElems.get(i);
            CharSequence c = x + "";
            if (word.contains(c) == true && sorted.contains(word) == false)
            {
                sorted.add(word);
            }
        }
    }
    System.out.println(sorted);
}

请注意，由于您要求只返回翻译的部分，我已经省略了代码的解释性文本。如果您对翻译的任何部分有疑问，请随时提问。

英文:

I have written a code to runs through a list of strings and returns a list of the least frequent characters found throughout the whole list of strings.

What would be the fastest way to return the list of the words sorted by those that contain the least frequent characters first? (I'm working with a huge list of strings, so the code I've written isn't running fast enough. The example below is just for example)

For example, if the given list is: ["hello", "my", "name", "is", "inigo", "montoya", "you", "killed", "my", "father", "prepare", "to", "die"]. My code, returns : [s, g, u, k, f, h, d, p, n, t, r, l, m, y, a, i, o, e], where s is the least frequent letter found in the list of strings, and e is the most frequent letter found in the list. The resulting list would then return the words containing the least frequent letter first and so on. For example: ["is", "inigo", "you", "killed", "father", "hello", "die", "prepare", "name", "montoya", "to", "my"]

Here is my code that finds the least frequent letters:

    public static void method(List&lt;String&gt; words)
{
Map&lt;Character, Integer&gt; elemCount = new LinkedHashMap&lt;&gt;();
for (String word : words)
{
for (int i = 0; i  &lt; word.length(); i++)
{
if (elemCount.containsKey(word.charAt(i)))
{
elemCount.put(word.charAt(i), elemCount.get(word.charAt(i)) + 1);
}
else
{
elemCount.put(word.charAt(i), 1);
}
}
}
ArrayList&lt;Character&gt; sortedElems = new ArrayList&lt;&gt;();
LinkedList&lt;String&gt; sorted = new LinkedList&lt;&gt;();
elemCount.entrySet().stream().sorted(
Map.Entry.comparingByValue()).forEach(entry -&gt; 
{ 
for (int i = 1; i &lt;= entry.getValue(); i++)
{
if (sortedElems.contains(entry.getKey()) == false)
{
sortedElems.add(entry.getKey());
}
}
}
);

And here is the code where I'm trying to sort by least frequent characters found in the string:

        for (int i = 0; i &lt; sortedElems.size(); i++)
{
for (String word : words)
{
char x = sortedElems.get(i);
CharSequence c = x + &quot;&quot;;
if (word.contains(c) == true &amp;&amp; sorted.contains(word) == false)
{
sorted.add(word);
}
}
}
System.out.println(sorted);

答案1

得分: 1

这似乎运行得相当不错。我放弃了流方法，因为它太慢了。

		List<String> words = new ArrayList<>(List.of("hello", "my",
				"name", "is", "inigo", "montoya", "you", "killed",
				"my", "father", "prepare", "to", "die"));

        // 频率计数。
		int[] chars = new int[256];
		for (String w : words) {
			for (char c : w.toCharArray()) {
				chars[c]++;
			}
		}
		// 找到最少使用的字符
		Map<String, Integer> mins = new HashMap<>();
		for (String w : words) {
			if (!mins.containsKey(w)) {
				int v = Integer.MAX_VALUE;
				for (char c : w.toCharArray()) {
					if (chars[c] < v) {
						v = chars[c];
					}
				}
				mins.put(w, v);
			}
		}
		
		Comparator<String> comp1 =
				(String a, String b) -> a.compareTo(b);
		Comparator<String> comp =
				(a, b) -> mins.get(a).compareTo(mins.get(b));
		comp = comp.thenComparing(comp1);
		
        // 首先按字符排序，然后按词法排序
		words.sort(comp);
	    words.forEach(System.out::println);

这是频率计数结果：

1=[f, g, k, s, u]
2=[d, h, p]
3=[n, r, t]
4=[a, l, m, y]
5=[i]
6=[o]
7=[e]

以及按排序顺序的单词：

father
inigo
is
killed
you
die
hello
prepare
montoya
name
to
my
my

英文:

This seems to work pretty well. I abandoned the streams approach as it was too slow.

		List&lt;String&gt; words = new ArrayList&lt;&gt;(List.of(&quot;hello&quot;, &quot;my&quot;,
&quot;name&quot;, &quot;is&quot;, &quot;inigo&quot;, &quot;montoya&quot;, &quot;you&quot;, &quot;killed&quot;,
&quot;my&quot;, &quot;father&quot;, &quot;prepare&quot;, &quot;to&quot;, &quot;die&quot;));
// frequency count.
int[] chars = new int[256];
for (String w : words) {
for (char c : w.toCharArray()) {
chars[c]++;
}
}
// find minimum used character
Map&lt;String, Integer&gt; mins = new HashMap&lt;&gt;();
for (String w : words) {
if (!mins.containsKey(w)) {
int v = Integer.MAX_VALUE;
for (char c : w.toCharArray()) {
if (chars[c] &lt; v) {
v = chars[c];
}
}
mins.put(w, v);
}
}
Comparator&lt;String&gt; comp1 =
(String a, String b) -&gt; a.compareTo(b);
Comparator&lt;String&gt; comp =
(a, b) -&gt; mins.get(a).compareTo(mins.get(b));
comp = comp.thenComparing(comp1);
// sort first on character then lexically
words.sort(comp);
words.forEach(System.out::println);

Here is the frequency count

1=[f, g, k, s, u]
2=[d, h, p]
3=[n, r, t]
4=[a, l, m, y]
5=[i]
6=[o]
7=[e]

And the words in sorted order

father
inigo
is
killed
you
die
hello
prepare
montoya
name
to
my
my
</details>
# 答案2
**得分**: 0
以下是翻译好的部分：
这些是我考虑到的，无法放在评论中的部分。
根据您的描述，算法的一般流程是扫描数组以生成直方图，然后使用直方图信息对数组进行排序的两步过程。
第一步是不可避免的，除非数据直接从这些词的源头以流的形式到达。在这种情况下，您可以在新单词到达时简单地更新直方图，几乎没有任何额外开销。
无论如何，这是比较简单的部分。
对于比较困难的部分，我们可以扫描数组，并将每个单词（或指向每个单词的指针）插入到B树中，其分割机制使用实时计算的分数，该分数使用一个简单的函数根据当前单词和直方图的参数计算得出。最后，扫描B树，提取带有所有单词的最终数组，排列整齐。
在非流式场景中，计算复杂度将接近于 O(n+nlog(n)+n)，但是如果对原始数组使用流式处理，并且您满意于将B树作为最终产品，而无需构建显式的最终数组，则计算复杂度最多为 O(nlog(n))。
<details>
<summary>英文:</summary>
These are my considerations that could not fit in a comment.
From your description, the general flow of the algorithm is the two-step process of scanning the array to generate the histogram and somehow sorting the array using histogram info.
The first step is unavoidable and cannot be further reduced unless the data arrives as stream directly from the source of these words. In that case you can simply update the histogram with barely any overhead as new words arrive.
Regardless, this was the easy part.
For the hard part, we can scan the array and insert each word (or a pointer to each word) into a b-tree whose splitting mechanism uses a Score computed on-the-fly using a simple function that takes as parameters the current word and the histogram.&lt;br&gt;
Finally, scan the b-tree to extract the final array with all the words neatly sorted.
In a non-streaming scenario, the computational complexity would be close to &lt;code&gt;O(n+nlog(n)+n)&lt;/code&gt;, however you can arrive at no more than &lt;code&gt;O(nlog(n))&lt;/code&gt; if streaming is used for the original array AND you are satisfied with having a b-tree as your final product without needing to build an explicit final array.
</details>

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

提高在Java中按字符对大型字符串列表进行排序的速度

问题

答案1

使用 math.random 方法来创建掷骰子方法。

一个BufferedReader如何与Socket进行交互？关于Socket的工作原理。

如何高效实现抽象类的多个方法？

为什么 (x – y <= j - k - 1) 的计算结果为假？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论