Efficiently generate triplet combinations

Question

Let S be a set of triplets such that every triplet s ∈ S satisfies s[1] >= s[2] >= s[3], where s[i] is the i-th element of s. For any s, t, v ∈ S, let the function F(s, t, v) produce a new triplet: F(s, t, v) = (max{s[1], t[1], v[1]}, max{s[2], t[2], v[2]}, max{s[3], t[3], v[3]}). Goal: generate the set T = {F(s, t, v) | s, t, v ∈ S} efficiently.

Two examples:

    S = [(9,4,3), (8,6,2), (6,6,4)]
    T = [(9,4,3), (8,6,2), (6,6,4), (9,6,3), (9,6,4), (8,6,4)]

    S = [(9,4,3), (8,6,2), (6,5,4)]
    T = [(9,4,3), (9,6,3), (9,5,4), (9,6,4), (8,6,2), (8,6,4), (6,5,4)]
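As a small illustration of the definition, F is simply a componentwise maximum. The sketch below assumes an `int[]` representation and a hypothetical class name `FSketch` (neither is part of the question's code) and reproduces a few entries of the second example's T.

```java
import java.util.Arrays;

public class FSketch {
    // F(s, t, v): componentwise maximum of three triplets
    static int[] f(int[] s, int[] t, int[] v) {
        int[] r = new int[3];
        for (int i = 0; i < 3; i++) {
            r[i] = Math.max(s[i], Math.max(t[i], v[i]));
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 3}, b = {8, 6, 2}, c = {6, 5, 4};
        System.out.println(Arrays.toString(f(a, b, c))); // [9, 6, 4]
        System.out.println(Arrays.toString(f(a, a, c))); // [9, 5, 4]
        System.out.println(Arrays.toString(f(b, b, b))); // [8, 6, 2]
    }
}
```

Repeating an argument, as in F(a, a, c), is how pairs and single triplets of S end up in T.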

Below is a simple but relatively inefficient implementation that accomplishes the above. It runs in O(n^3), where n = |S|. The question is: how can this be implemented more efficiently? This would involve coming up with an efficient data structure that holds a sorted version of S. For instance, observe that F(s, t, v) = s if t[1], v[1] <= s[1], t[2], v[2] <= s[2] and t[3], v[3] <= s[3]. So if we pick a triplet s = (x, y, z), then we only need to iterate over triplets (x', y', z') with x' <= x and y' >= y and z' >= z.

Note: in my application |S| is large, e.g. 100,000 triplets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class TripleGen {
    public static void main(String[] args) {
        int[][] ds = new int[][]{{9, 4, 3}, {8, 6, 2}, {6, 5, 4}};
        List<Triple> l = Triple.toList(ds);
        System.out.println(gen(l));
    }

    // Brute force: F(S[i], S[j], S[k]) for every i <= j <= k, i.e. O(n^3) combinations
    public static Set<Triple> gen(List<Triple> S) {
        Set<Triple> T = new HashSet<>();
        for (int i = 0; i < S.size(); i++) {
            for (int j = i; j < S.size(); j++) {
                for (int k = j; k < S.size(); k++) {
                    int l = Math.max(S.get(i).x, Math.max(S.get(j).x, S.get(k).x));
                    int w = Math.max(S.get(i).y, Math.max(S.get(j).y, S.get(k).y));
                    int h = Math.max(S.get(i).z, Math.max(S.get(j).z, S.get(k).z));
                    T.add(new Triple(l, w, h));
                }
            }
        }
        return T;
    }
}

// Not public, so that both classes can be compiled from TripleGen.java
final class Triple {
    public final int x;
    public final int y;
    public final int z;

    public Triple(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    public static List<Triple> toList(int[][] ds) {
        List<Triple> l = new ArrayList<>(ds.length);
        for (int[] d : ds)
            l.add(new Triple(d[0], d[1], d[2]));
        return l;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Triple t = (Triple) o;
        return x == t.x && y == t.y && z == t.z;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y, z);
    }

    @Override
    public String toString() {
        return "(" + x + "," + y + "," + z + ")";
    }
}
```
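The observation in the question that F(s, t, v) = s whenever t and v are dominated by s can be turned into a pruned version of the brute-force loop. This is a minimal sketch under assumptions of my own: the names `DominanceSketch`, `dominates`, `max2` and `key` are hypothetical, and triplets are plain `int[]` instead of the `Triple` class. Combinations whose extra members are dominated are skipped because they cannot produce anything new. It is still O(n^3) in the worst case; it only saves work when many triplets dominate each other.

```java
import java.util.*;

public class DominanceSketch {
    // true if t <= s in every component, so that F(s, t, t) == s
    static boolean dominates(int[] s, int[] t) {
        return t[0] <= s[0] && t[1] <= s[1] && t[2] <= s[2];
    }

    // Componentwise maximum of two triplets
    static int[] max2(int[] a, int[] b) {
        return new int[]{Math.max(a[0], b[0]), Math.max(a[1], b[1]), Math.max(a[2], b[2])};
    }

    // Boxed key so that triplets can be stored in a HashSet by value
    static List<Integer> key(int[] a) {
        return Arrays.asList(a[0], a[1], a[2]);
    }

    static Set<List<Integer>> gen(List<int[]> s) {
        Set<List<Integer>> t = new HashSet<>();
        int n = s.size();
        for (int i = 0; i < n; i++) {
            int[] a = s.get(i);
            t.add(key(a)); // F(a, a, a) == a
            for (int j = i + 1; j < n; j++) {
                int[] b = s.get(j);
                // If one dominates the other, F2(a, b) is a or b (already in T), and any
                // F(a, b, c) equals F2(a, c) or F2(b, c), which the pairs (i, k) or (j, k) cover
                if (dominates(a, b) || dominates(b, a)) continue;
                int[] p = max2(a, b);
                t.add(key(p));
                for (int k = j + 1; k < n; k++) {
                    int[] c = s.get(k);
                    if (dominates(p, c)) continue; // F(a, b, c) == p, already added
                    t.add(key(max2(p, c)));
                }
            }
        }
        return t;
    }

    public static void main(String[] args) {
        List<int[]> s = Arrays.asList(new int[]{9, 4, 3}, new int[]{8, 6, 2}, new int[]{6, 5, 4});
        System.out.println(gen(s).size()); // 7
    }
}
```

When S is a chain such as (5,5,5), (4,4,4), (3,3,3), every pair is skipped and the inner loop never runs; for inputs where no triplet dominates another, nothing is pruned.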

Answer 1

Score: 1

I doubt that there is much to be gained, but here is my attempt.

  • Consider a function F2(s, t) that combines just two triplets in the same way. Then F(s, t, v) can be written as F2(s, F2(t, v)), and there may be a performance gain in computing it this way, since the result of F2(t, v) can be reused for different s.
  • A slight improvement may be made by estimating the capacity of the result HashSet, so that no resizing and rehashing is needed along the way.

In code:

```java
// Replacement for gen() in TripleGen; uses the Triple class from the question
public static Set<Triple> gen(List<Triple> s) {
    // Deduplicate s
    s = new ArrayList<>(new HashSet<>(s));
    int n = s.size();

    // Combine pairs of triplets first
    // (long arithmetic so that n * n does not overflow int for large n)
    long maxSizeOfT2 = ((long) n * n - 1) / 2;
    int capacityForT2 = (int) Math.min((maxSizeOfT2 * 4 + 2) / 3, Integer.MAX_VALUE);
    Set<Triple> t2AsSet = new HashSet<>(capacityForT2);
    // For the pairs, only pair two *different* triplets
    for (int i = 0; i < s.size(); i++) {
        for (int j = i + 1; j < s.size(); j++) {
            Triple newTriplet = f2(s.get(i), s.get(j));
            t2AsSet.add(newTriplet);
        }
    }
    List<Triple> t2 = new ArrayList<>(t2AsSet);

    // For the combinations of three original triplets,
    // combine every pair with every original triplet
    long maxSizeOfT = (long) (t2AsSet.size() + 1) * (n + 1) - 1;
    int capacityForT = (int) Math.min((maxSizeOfT * 4 + 2) / 3, Integer.MAX_VALUE);
    Set<Triple> t = new HashSet<>(capacityForT);
    for (int i = 0; i < t2.size(); i++) {
        for (int j = 0; j < s.size(); j++) {
            Triple newTriplet = f2(t2.get(i), s.get(j));
            t.add(newTriplet);
        }
    }
    // Instead of generating F(s, s, s), just add every s to the result
    t.addAll(s);
    return t;
}

// Assumed definition of f2 (not included in the answer): componentwise maximum of two triplets
private static Triple f2(Triple a, Triple b) {
    return new Triple(Math.max(a.x, b.x), Math.max(a.y, b.y), Math.max(a.z, b.z));
}
```
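To check the decomposition F(s, t, v) = F2(s, F2(t, v)) independently of any timing question, here is a small self-contained comparison against the brute force on the two examples from the question. It is a sketch with hypothetical names (`TwoStageCheck`, `bruteForce`, `twoStage`) and `int[]` triplets; unlike the answer's code, it keeps the pairs F2(s, s) = s in the first stage instead of adding S separately at the end, which yields the same set.

```java
import java.util.*;

public class TwoStageCheck {
    // F2(a, b): componentwise maximum of two triplets
    static int[] f2(int[] a, int[] b) {
        return new int[]{Math.max(a[0], b[0]), Math.max(a[1], b[1]), Math.max(a[2], b[2])};
    }

    // Boxed key so that triplets can be compared by value in a HashSet
    static List<Integer> key(int[] a) {
        return Arrays.asList(a[0], a[1], a[2]);
    }

    // Reference result: F(s, t, v) = F2(s, F2(t, v)) for all index triples i <= j <= k
    static Set<List<Integer>> bruteForce(List<int[]> s) {
        Set<List<Integer>> t = new HashSet<>();
        int n = s.size();
        for (int i = 0; i < n; i++)
            for (int j = i; j < n; j++)
                for (int k = j; k < n; k++)
                    t.add(key(f2(s.get(i), f2(s.get(j), s.get(k)))));
        return t;
    }

    // Stage 1: deduplicated pairs F2(t, v), including t == v.
    // Stage 2: combine every distinct pair with every original triplet.
    static Set<List<Integer>> twoStage(List<int[]> s) {
        Set<List<Integer>> pairKeys = new HashSet<>();
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < s.size(); i++)
            for (int j = i; j < s.size(); j++) {
                int[] p = f2(s.get(i), s.get(j));
                if (pairKeys.add(key(p))) pairs.add(p); // keep each distinct pair once
            }
        Set<List<Integer>> t = new HashSet<>();
        for (int[] p : pairs)
            for (int[] x : s)
                t.add(key(f2(p, x)));
        return t;
    }

    public static void main(String[] args) {
        List<int[]> ex1 = Arrays.asList(new int[]{9, 4, 3}, new int[]{8, 6, 2}, new int[]{6, 6, 4});
        List<int[]> ex2 = Arrays.asList(new int[]{9, 4, 3}, new int[]{8, 6, 2}, new int[]{6, 5, 4});
        for (List<int[]> s : Arrays.asList(ex1, ex2)) {
            Set<List<Integer>> t = twoStage(s);
            System.out.println(t.size() + " " + t.equals(bruteForce(s)));
        }
        // prints "6 true", then "7 true"
    }
}
```

The gain, if any, comes from the deduplication in stage 1: when many pairs collapse to the same F2 value, stage 2 has fewer rows to combine.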

I didn't do any real benchmarking, just some preliminary time measurements, and they are not promising. I varied the number of triplets in the input and the range of the numbers in the triplets. When there are only small numbers, many duplicates are filtered out and the result set is smaller. With a larger range of numbers, clashes are rare and the result set is bigger.
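The measurement setup described above can be approximated with a small harness. This is a sketch under assumptions: the class `RangeExperiment` and the helper `randomTriplets(n, maxValue, seed)` are hypothetical, values are drawn uniformly from 1..maxValue and sorted descending so that s[1] >= s[2] >= s[3], and |T| is computed by brute force, so it is only meant for moderate n. It prints timings that depend on the machine rather than reproducing any particular numbers.

```java
import java.util.*;

public class RangeExperiment {
    // Hypothetical helper: n random triplets with values in 1..maxValue,
    // each rearranged so that s[1] >= s[2] >= s[3]
    static List<int[]> randomTriplets(int n, int maxValue, long seed) {
        Random rnd = new Random(seed);
        List<int[]> s = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int[] v = {1 + rnd.nextInt(maxValue), 1 + rnd.nextInt(maxValue), 1 + rnd.nextInt(maxValue)};
            Arrays.sort(v); // ascending order
            s.add(new int[]{v[2], v[1], v[0]});
        }
        return s;
    }

    // Brute-force |T|: F(S[i], S[j], S[k]) for all i <= j <= k (moderate n only)
    static int resultSize(List<int[]> s) {
        Set<List<Integer>> t = new HashSet<>();
        int n = s.size();
        for (int i = 0; i < n; i++)
            for (int j = i; j < n; j++)
                for (int k = j; k < n; k++) {
                    int[] a = s.get(i), b = s.get(j), c = s.get(k);
                    t.add(Arrays.asList(Math.max(a[0], Math.max(b[0], c[0])),
                            Math.max(a[1], Math.max(b[1], c[1])),
                            Math.max(a[2], Math.max(b[2], c[2]))));
                }
        return t.size();
    }

    public static void main(String[] args) {
        int n = 100;
        for (int maxValue : new int[]{9, 10_000}) {
            List<int[]> s = randomTriplets(n, maxValue, 1L);
            long start = System.nanoTime();
            int size = resultSize(s);
            double ms = (System.nanoTime() - start) / 1e6;
            System.out.printf("n = %d, range 1-%d: |T| = %d (%.1f ms)%n", n, maxValue, size, ms);
        }
    }
}
```

With values restricted to 1..9 there are only C(11, 3) = 165 distinct triplets satisfying s[1] >= s[2] >= s[3], and the componentwise maximum of such triplets is again such a triplet, so |T| can never exceed 165 for that range.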

| List size | Element range | Result size | Your time (ms) | My time (ms) | Improvement (%) |
|---:|---|---:|---:|---:|---:|
| 3 | 1–9 | 6 | 0.038 | 0.015 | 60 |
| 3 | 1–10,000 | 7 | 0.046 | 0.016 | 66 |
| 400 | 1–9 | 159 | 4736 | 4740 | 0 |
| 400 | 1–10,000 | 858,897 | 1079 | 1067 | 1 |

In the comments you expected that the best case could be improved, and the numbers may indicate that this is true. For the worst case there seems to be only a marginal improvement.

As I said in the comments, the result set can contain on the order of n^3 triplets, so in the worst case no algorithm that generates it can be faster than O(n^3). What we might hope for is a smaller constant factor on the n^3.
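To see that the result set really can reach the order of n^3, here is a hypothetical construction (class and method names are mine, not from the thread). With k triplets in each of three groups, every choice of one triplet per group gives a different F, so |T| >= k^3 = (n/3)^3.

```java
import java.util.*;

public class CubicLowerBound {
    // Hypothetical construction with n = 3k triplets, all satisfying s[1] >= s[2] >= s[3]:
    //   A_i = (3k + i, 1, 1), B_j = (3k, 2k + j, 1), C_l = (3k, 2k, l), for 1 <= i, j, l <= k.
    // Then F(A_i, B_j, C_l) = (3k + i, 2k + j, l), a different triplet for every (i, j, l).
    static List<int[]> build(int k) {
        List<int[]> s = new ArrayList<>();
        for (int i = 1; i <= k; i++) s.add(new int[]{3 * k + i, 1, 1});
        for (int j = 1; j <= k; j++) s.add(new int[]{3 * k, 2 * k + j, 1});
        for (int l = 1; l <= k; l++) s.add(new int[]{3 * k, 2 * k, l});
        return s;
    }

    // Brute-force |T|: F(S[i], S[j], S[k]) for all i <= j <= k
    static int resultSize(List<int[]> s) {
        Set<List<Integer>> t = new HashSet<>();
        int n = s.size();
        for (int i = 0; i < n; i++)
            for (int j = i; j < n; j++)
                for (int k = j; k < n; k++) {
                    int[] a = s.get(i), b = s.get(j), c = s.get(k);
                    t.add(Arrays.asList(Math.max(a[0], Math.max(b[0], c[0])),
                            Math.max(a[1], Math.max(b[1], c[1])),
                            Math.max(a[2], Math.max(b[2], c[2]))));
                }
        return t.size();
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 8; k++) {
            int size = resultSize(build(k));
            System.out.println("n = " + 3 * k + ", |T| = " + size + ", k^3 = " + k * k * k);
        }
    }
}
```

Since any output-producing algorithm must at least write every element of T, inputs like this one rule out a worst-case bound below cubic; only the constant factor is open.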

huangapple
  • Posted on 20 September 2020, 07:43:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63974306.html