Efficiently generate triplet combinations
Given a set of triplets S, where for every triplet s \in S it holds that s[1] >= s[2] >= s[3], where s[i] is the ith element of triplet s. For any s,t,v \in S, let function F(s,t,v) generate a new triplet: F(s,t,v)=(max{s[1],t[1],v[1]} ,max{s[2],t[2],v[2]}, max{s[3],t[3],v[3]}). Goal: generate set T={F(s,t,v) | s,t,v \in S} efficiently.
Two examples:
S = [(9,4,3),(8,6,2),(6,6,4)]
T = [(9,4,3),(8,6,2),(6,6,4),(9,6,3),(9,6,4),(8,6,4)]
S = [(9,4,3),(8,6,2),(6,5,4)]
T = [(9,4,3), (9,6,3), (9,5,4), (9,6,4), (8,6,2), (8,6,4), (6,5,4)]
Below is a simple, but relatively inefficient implementation that accomplishes the above. This code runs in O(n^3) with |S|=n. The question is: how to implement this more efficiently? This would involve coming up with an efficient data structure that holds a sorted version of S. For instance, we can observe that F(s,t,v)=s if t[1],v[1] <= s[1], t[2],v[2] <= s[2], t[3],v[3] <= s[3]. So if we pick triple s=(x,y,z), then we only need to iterate over triples (x',y',z') having x' <= x and y' >= y and z' >= z.
Note: in my application |S| is large, e.g. 100000 triples.
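To make the dominance observation concrete, here is a minimal sketch using plain `int[]` triples; the helper names (`f2`, `dominates`) are illustrative, not part of the original code:

```java
import java.util.Arrays;

public class DominanceSketch {
    // F2: componentwise max of two triples; note F(s, t, v) == f2(s, f2(t, v))
    static int[] f2(int[] s, int[] t) {
        return new int[]{Math.max(s[0], t[0]), Math.max(s[1], t[1]), Math.max(s[2], t[2])};
    }

    // s dominates t when every component of t is <= the matching component of s;
    // in that case f2(s, t) == s, so t contributes nothing new when combined with s
    static boolean dominates(int[] s, int[] t) {
        return t[0] <= s[0] && t[1] <= s[1] && t[2] <= s[2];
    }

    public static void main(String[] args) {
        int[] s = {9, 4, 3};
        int[] t = {8, 4, 2};
        System.out.println(dominates(s, t));            // true
        System.out.println(Arrays.equals(f2(s, t), s)); // true: F2(s, t) == s
    }
}
```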
```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class TripleGen {
    public static void main(String[] args) {
        int[][] ds = new int[][]{{9, 4, 3}, {8, 6, 2}, {6, 5, 4}};
        List<Triple> l = Triple.toList(ds);
        System.out.println(gen(l));
    }

    public static Set<Triple> gen(List<Triple> S) {
        Set<Triple> T = new HashSet<>();
        for (int i = 0; i < S.size(); i++) {
            for (int j = i; j < S.size(); j++) {
                for (int k = j; k < S.size(); k++) {
                    int l = Math.max(S.get(i).x, Math.max(S.get(j).x, S.get(k).x));
                    int w = Math.max(S.get(i).y, Math.max(S.get(j).y, S.get(k).y));
                    int h = Math.max(S.get(i).z, Math.max(S.get(j).z, S.get(k).z));
                    T.add(new Triple(l, w, h));
                }
            }
        }
        return T;
    }
}

public final class Triple {
    public final int x;
    public final int y;
    public final int z;

    public Triple(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    public static List<Triple> toList(int[][] ds) {
        List<Triple> l = new ArrayList<>(ds.length);
        for (int[] d : ds)
            l.add(new Triple(d[0], d[1], d[2]));
        return l;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Triple t = (Triple) o;
        return x == t.x && y == t.y && z == t.z;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y, z);
    }

    @Override
    public String toString() {
        return "(" + x + "," + y + "," + z + ")";
    }
}
```
Answer 1
Score: 1
I doubt that there’s much to be gained, but I present my attempt.
- Consider a function F2(s, t) that makes the same max-combination of just two triplets. F(s, t, v) can then be written as F2(s, F2(t, v)), and there may be a performance gain in computing it this way, reusing the result of F2(t, v) for different values of s.
- A slight improvement may be made by estimating the capacity of the result `HashSet` up front, so that no resizing and rehashing is needed underway.
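The capacity arithmetic used in the code below follows from `HashMap`’s default load factor of 0.75: a `HashSet` created with capacity c can hold at least 0.75·c entries before it resizes, so choosing c = ⌈size / 0.75⌉, which in integer arithmetic is `(size * 4 + 2) / 3`, avoids rehashing. A quick check of the formula:

```java
public class CapacityCheck {
    public static void main(String[] args) {
        int maxSize = 100;
        // (size * 4 + 2) / 3 is integer-division shorthand for ceil(size / 0.75)
        int capacity = (maxSize * 4 + 2) / 3;
        System.out.println(capacity);                   // 134
        System.out.println(capacity * 0.75 >= maxSize); // true: no resize needed
    }
}
```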
In code:
```java
public static Set<Triple> gen(List<Triple> s) {
    // Deduplicate s
    s = new ArrayList<>(new HashSet<>(s));
    int n = s.size();
    // Combine pairs of triplets first
    int maxSizeOfT2 = n * (n - 1) / 2;
    int capacityForT2 = (maxSizeOfT2 * 4 + 2) / 3;
    Set<Triple> t2AsSet = new HashSet<>(capacityForT2);
    // For the pairs, only combine two *different* triples
    for (int i = 0; i < s.size(); i++) {
        for (int j = i + 1; j < s.size(); j++) {
            Triple newTriplet = f2(s.get(i), s.get(j));
            t2AsSet.add(newTriplet);
        }
    }
    List<Triple> t2 = new ArrayList<>(t2AsSet);
    // For the combinations of three original triplets,
    // combine every pair with every original triplet
    int maxSizeOfT = (t2AsSet.size() + 1) * (n + 1) - 1;
    int capacityForT = (maxSizeOfT * 4 + 2) / 3;
    Set<Triple> t = new HashSet<>(capacityForT);
    for (int i = 0; i < t2.size(); i++) {
        for (int j = 0; j < s.size(); j++) {
            Triple newTriplet = f2(t2.get(i), s.get(j));
            t.add(newTriplet);
        }
    }
    // Instead of generating F(s, s, s), just add every s to the result
    t.addAll(s);
    return t;
}

// F2 from above: componentwise max of two triples
private static Triple f2(Triple a, Triple b) {
    return new Triple(Math.max(a.x, b.x), Math.max(a.y, b.y), Math.max(a.z, b.z));
}
```
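As a sanity check, a condensed, self-contained version of this pair-then-combine approach (using a Java 16+ record in place of the Triple class; the names `T3` and `GenCheck` are illustrative) reproduces the seven triples of the second example:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GenCheck {
    record T3(int x, int y, int z) {}

    // F2: componentwise max of two triples
    static T3 f2(T3 a, T3 b) {
        return new T3(Math.max(a.x(), b.x()), Math.max(a.y(), b.y()), Math.max(a.z(), b.z()));
    }

    static Set<T3> gen(List<T3> s) {
        // Pairs of different triples: F(s, t, t) == F2(s, t)
        Set<T3> pairs = new HashSet<>();
        for (int i = 0; i < s.size(); i++)
            for (int j = i + 1; j < s.size(); j++)
                pairs.add(f2(s.get(i), s.get(j)));
        // Combine every pair with every original triple
        Set<T3> t = new HashSet<>(pairs);
        for (T3 p : pairs)
            for (T3 u : s)
                t.add(f2(p, u));
        // F(s, s, s) == s, so just add the originals
        t.addAll(s);
        return t;
    }

    public static void main(String[] args) {
        List<T3> s = List.of(new T3(9, 4, 3), new T3(8, 6, 2), new T3(6, 5, 4));
        System.out.println(gen(s).size()); // 7, matching the second example above
    }
}
```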
I didn’t do any real benchmarking, just some preliminary time measurements. They are not promising. I varied the number of triplets in the input and also the range of the numbers in the triplets. When there are only small numbers, many duplicates are filtered out and the result set is smaller. With a larger range of numbers, clashes seldom happen and the result set is bigger.
List size   Element range   Result size   Your time (ms)   My time (ms)   Improvement (%)
-----------------------------------------------------------------------------------------
        3   1–9                       6            0.038          0.015                60
        3   1–10,000                  7            0.046          0.016                66
      400   1–9                     159         4736           4740                     0
      400   1–10,000            858,897         1079           1067                     1
In the comments you expected that the best case could be improved, and the numbers may indicate that that is true. For the worst case there seems to be only a marginal improvement.
As I said in the comments, the size of the result set is O(n^3), so no algorithm to generate it could be faster than O(n^3). What we might hope for would be a smaller constant factor on the n^3.