寻找数组中子数组的算术平均值的高效方法

huangapple go评论145阅读模式
英文:

Finding Arithmetic Mean of Subarrays Efficiently in an Array

问题

我正在尝试找到一种计算数组子数组算术平均值的方法。
问题可以简化为:给定数组X和整数S,有多少连续的X片段的算术平均值等于S?

例如,给定X=[5,3,6,2]和S=4,结果为3。[5,3]、[6,2]以及[5,3,6,2]的平均值都是4。

数组X最多可能有10万个元素。X的每个值是区间内的整数{-1,000,000,000,+1,000,000,000}。S也是如此。
我们不会对找到的算术平均值进行四舍五入。

我的Java代码如下,适用于小数据集,但效率不高。复杂度为O(n^2)。

public static int returnSubsequenceCount(int[] X, int S) {
    int counter = 0;

    for (int i = 0; i < X.length; i++) {
        int[] dpSum = new int[X.length];

        dpSum[i] = X[i];

        if (X[i] == S) {
            counter++;
        }

        for (int j = i + 1; j < X.length; j++) {
            int sum = dpSum[j - 1] + X[j];

            dpSum[j] = sum;

            if ((double) sum / (j - i + 1) == S) {
                counter++;
            }
        }
    }
    return counter;
}
英文:

I am trying to find number of ways o calculate arithmetic mean of a subarray of an array.
It boils down to this; given an array X and an integer S, how many contiguous fragments of X has arithmetic mean equal to S?

For instance, given X=[5,3,6,2] and S=4 result is 3. [5,3] , [6,2] and [5,3,6,2] has the mean 4.

> X might have up to 100.000 elements. Each value of X is an integer in
> range of {-1.000.000.000,+1.000.000.000}. So is S.
> We don't round the arithmetic mean found.

My code below(on Java) works for small set of data but it is not efficient. O(n^2).

public static int returnSubsequenceCount(int[] X, int S) {
        int counter = 0;

        for (int i = 0; i &lt; X.length; i++) {
            int[] dpSum = new int[X.length];

            dpSum[i] = X[i];

            if (X[i] == S) {
                counter++;
            }

            for (int j = i + 1; j &lt; X.length; j++) {
                int sum = dpSum[j - 1] + X[j];

                dpSum[j] = sum;

                if ((double) sum / (j - i + 1) == S) {
                    counter++;
                }
            }
        }
        return counter;
    }

答案1

得分: 8

这里有一个技巧可以得到一个O(n)的算法:

平均值 = (A[i] + A[i+1] ... + A[j]) / (j - i + 1)

平均值 * (j - i + 1) = A[i] + A[i+1]...+ A[j]

注意,由于现在平均值恰好乘以了方程右侧的元素数目,我们可以将平均值减去 一次 用于每个元素:

0 = (A[i]-平均值) + (A[i+1]-平均值) ... + (A[j]-平均值)

找到连续和为零的子序列可以通过检查前缀和来完成。对于每个最右边的元素(A[j]-平均值),我们想要添加我们之前看到的相同前缀和的次数。我们对前缀和为0进行调整,以便在适用的情况下计算数组前缀的完整长度:

2 1 3,平均值 2

2-2 = 0    前缀和 = 0    次数 = 1(数组前缀的次数)
1-2 = -1   前缀和 = -1
3-2 = 1    前缀和 = 0    次数 = 2(索引0为1,数组前缀为1)

                     总和 = 3
英文:

There's a trick here to obtain an O(n) algorithm:

average = (A[i] + A[i+1] ... + A[j]) / (j - i + 1)

average * (j - i + 1) = A[i] + A[i+1]...+ A[j]

Notice that since average is now multiplied by exactly the number of elements on the right side of the equation, we can subtract the average once for each one of the elements:

0 = (A[i]-average) + (A[i+1]-average) ... + (A[j]-average)

Finding contiguous sums that equal zero can be done by examining prefix sums. For each rightmost element (A[j]-average), we want to add the number of times we've seen the same prefix sum before. We make an adjustment for prefix sum 0 so as to count the full length of the array prefix if applicable:

2 1 3, avg 2

2-2 = 0    ps = 0    count = 1 (1 for the full array prefix)
1-2 = -1   ps = -1
3-2 = 1    ps = 0    count = 2 (1 for index 0 and 1 for the full array prefix)

                     total = 3

答案2

得分: 5

我将使用基于1的索引来执行此算法。这感觉像是那些可以接受的情况之一。

P 为部分和数组,即 P[0] = 0,且 P[i] = X[1] + ... + X[i]。此外,令 Q[i] = P[i] - S * i。例如,

i     0   1   2   3   4   5   6   7
-----------------------------------
X         5   3   6   2   5   5   2
P     0   5   8  14  16  21  26  28
Q     0   1   0   2   0   1   2   0

什么意思 [i,j](包括 ij)的平均值是 S?使用上述符号,可以写为

(P[j] - P[i - 1]) / (j - i + 1) = S     ==&gt;
P[j] - P[i - 1] = S * (j - i + 1)       ==&gt;
P[j] - P[i - 1] = S * j - S * (i - 1)   ==&gt;
P[j] - S * j = P[i - 1] - S * (i - 1)   ==&gt;
Q[j] = Q[i - 1]

这意味着 Q 中任何一对相等值对应于平均值为 S 的一段范围。例如,Q 中的两个值为1对应于范围 [3, 6, 2, 5]。Q 中的四个值为0会产生六个平均值为 S=4 的范围:[5,3], [6,2], [5,5,2], [5,3,6,2], [6,2,5,5,2] 和 [5,3,6,2,5,5,2]。

因此,以下算法也以 O(n log n) 运行,与 @Polygnome 的评论相同,但应该更容易实现:

  • 计算 Q
  • Q 进行排序;
  • 对于 Q 中每批相等值的 k,将 k * (k - 1) / 2 添加到答案;
  • 返回答案。

可以使用哈希表将此算法降低到 O(n),或者如果 Q 中的值的范围足够小,可以使用计数排序。

英文:

I will use 1-based indexing for this algorithm. This feels like one of those cases where it is OK.

Let P be the partial sums array, that is P[0] = 0 and P[i] = X[1] + ... + X[i]. Furthermore, let Q[i] = P[i] - S * i. For example,

i     0   1   2   3   4   5   6   7
-----------------------------------
X         5   3   6   2   5   5   2
P     0   5   8  14  16  21  26  28
Q     0   1   0   2   0   1   2   0

What does it mean that the average of [i,j] (including i and j) is S? With the notations above, it can be written as

(P[j] - P[i - 1]) / (j - i + 1) = S     ==&gt;
P[j] - P[i - 1] = S * (j - i + 1)       ==&gt;
P[j] - P[i - 1] = S * j - S * (i - 1)   ==&gt;
P[j] - S * j = P[i - 1] - S * (i - 1)   ==&gt;
Q[j] = Q[i - 1]

This means that any pair of equal values in Q corresponds to a range of average S. For example, the two values of 1 in Q correspond to the range [3, 6, 2, 5]. The four values of 0 in Q give rise to six ranges of average S=4: [5,3], [6,2], [5,5,2], [5,3,6,2], [6,2,5,5,2] and [5,3,6,2,5,5,2].

Therefore the following algorithm also runs in O(n log n), the same as @Polygnome's comment, but should be considerably easier to implement:

  • calculate Q;
  • sort Q;
  • for every batch of k equal values in Q, add k * (k - 1) / 2 to the answer;
  • return the answer.

This can be reduced to O(n) using a hash table or if the range of the values in Q is small enough to allow counting sort.

答案3

得分: 2

import java.util.*;

public static int returnSubsequenceCount(int[] X, int S) {
    HashMap<Integer, Integer> prefixes = new HashMap<Integer, Integer>();
    int result = 0;
    int[] P = new int[X.length + 1];
    prefixes.put(0, 1);

    int[] Q = new int[X.length + 1];
    P[0] = 0;
    Q[0] = 0;

    for (int i = 1; i < X.length + 1; i++) {
        P[i] = P[i - 1] + X[i - 1];
        Q[i] = P[i] - S * i;

        if (!prefixes.containsKey(Q[i])) {
            prefixes.put(Q[i], 1);
        } else {
            Integer temp = prefixes.get(Q[i]);
            temp++;
            prefixes.put(Q[i], temp);
        }
    }

    for (Map.Entry<Integer, Integer> entry : prefixes.entrySet()) {
        int value = entry.getValue();
        result += value * (value - 1) / 2;
    }

    return result;
}
英文:

Heres a Java solution with prefix sums, also using feedback from this thread.

import java.util.*;

        public static int returnSubsequenceCount(int[] X, int S)
    {
        HashMap&lt;Integer, Integer&gt; prefixes = new HashMap&lt;Integer, Integer&gt;();
        int result = 0;
        int[] P = new int[X.length + 1];
        prefixes.put(0, 1);

        int[] Q = new int[X.length + 1];
        P[0] = 0;
        Q[0] = 0;

        for (int i = 1; i &lt; X.length + 1; i++)
        {
            P[i] = P[i - 1] + X[i - 1];
            Q[i] = P[i] - S * i;

            if (!prefixes.containsKey(Q[i]))
            {
                prefixes.put(Q[i], 1);
            }
            else
            {
                Integer temp=prefixes.get(Q[i]);
                temp++;
                prefixes.put(Q[i],temp);

            }

        }

        for (Map.Entry&lt;Integer, Integer&gt; entry : prefixes.entrySet())
        {
            int value = entry.getValue();
            result += value * (value - 1) / 2;
        }

        return result;
    }

答案4

得分: 1

我发现Cătălin Frâncu的解释非常清晰简明。以下是带有测试的代码(Scala):

object T3 {
  def numOfSubArraysWithMean(a: Array[Int], s: Int): Int =
    a.lazyZip(LazyList from 1)
      .foldLeft((0, List(0))) { case ((pi, q), (ai, i)) =>
        val pi2 = pi + ai
        val qi = pi2 - s * i
        (pi2, qi :: q)
      }._2
      .groupMapReduce(identity)(_=>1)(_+_)
      .withFilter { case (_, v) => v > 1 }
      .map { case (_, v) => v * (v - 1) / 2 }
      .sum
}

class T3Spec extends AnyFunSpec with Matchers {
  import T3._
  def a[A: ClassTag](aa: A*): Array[A] = aa.toArray
  it("a") {
    val data = Seq(
      (a(5,3,6,2,5,5,2),4) -> 8,
      (a(5,3,6,2,5),4) -> 4,
      (a(5,3,6,2,5,3),4) -> 7,
      (a(5,3,6,2),4) -> 3,
      (a(5,3,6,2,4),4) -> 6,
      (a(2,1,3,7,2,2,1,3),2) -> 9,
      (a(0,8,1,7,2,6,3,5),4) -> 10,
      (a(2,2,3,4,1,1),2) -> 4,
      (a(2,2,2,2,2,2),2) -> 21,
      (a(2,2,2,2),2) -> 10,
    )
    for {
      ((a, s), r) <- data
    } numOfSubArraysWithMean(a, s) shouldEqual r
  }
}

在大多数情况下,NNlogN之间的区别并不重要。但是N^2是一个问题。

英文:

I found explanation by Cătălin Frâncu extremely clear and concise. Here is the code with tests (Scala):

object T3 {
def numOfSubArraysWithMean(a: Array[Int], s: Int): Int =
a.lazyZip(LazyList from 1)
.foldLeft((0, List(0))) { case ((pi, q), (ai, i)) =&gt;
val pi2 = pi + ai
val qi = pi2 - s * i
(pi2, qi :: q)
}._2
.groupMapReduce(identity)(_=&gt;1)(_+_)
.withFilter { case (_, v) =&gt; v &gt; 1 }
.map { case (_, v) =&gt; v * (v - 1) / 2 }
.sum
}
class T3Spec extends AnyFunSpec with Matchers {
import T3._
def a[A: ClassTag](aa: A*): Array[A] = aa.toArray
it(&quot;a&quot;) {
val data = Seq(
(a(5,3,6,2,5,5,2),4) -&gt; 8,
(a(5,3,6,2,5),4) -&gt; 4,
(a(5,3,6,2,5,3),4) -&gt; 7,
(a(5,3,6,2),4) -&gt; 3,
(a(5,3,6,2,4),4) -&gt; 6,
(a(2,1,3,7,2,2,1,3),2) -&gt; 9,
(a(0,8,1,7,2,6,3,5),4) -&gt; 10,
(a(2,2,3,4,1,1),2) -&gt; 4,
(a(2,2,2,2,2,2),2) -&gt; 21,
(a(2,2,2,2),2) -&gt; 10,
)
for {
((a, s), r) &lt;- data
} numOfSubArraysWithMean(a, s) shouldEqual r
}
}

In most cases, N vs NlogN doesn't matter. But N^2 is a problem

答案5

得分: 0

Kotlin 版本

    fun solution(A: IntArray, S: Int): Int {
        return A.asSequence()
            .runningFold(0) { acc, i -> acc + i }
            .map { ((it % S) + S) % S }
            .groupingBy { it }
            .eachCount()
            .values
            .sumOf { it * (it - 1) / 2 }
    }
英文:

Kotlin version

fun solution(A: IntArray, S: Int): Int {
return A.asSequence()
.runningFold(0) { acc, i -&gt; acc + i }
.map { ((it % S) + S) % S }
.groupingBy { it }
.eachCount()
.values
.sumOf { it * (it - 1) / 2 }
}

答案6

得分: -1

这是如何在Python中实现的,关键是生成幂集

def mean_subsequence(array: list, mean: int) -> int:
    result = 0
    powerset = [[]]

    for num in array:
        len_ = len(powerset)
        for i in range(len_):
            subsequent = powerset[i]
            # 检查均值
            tmp = [num] + subsequent
            if tmp != [] and sum(tmp) / len(tmp) == mean:
                result += 1
            # 更新幂集
            powerset.append([num] + subsequent)

    return result
英文:

This is how to do it in python, the key is to produce Powerset

def mean_subsequence(array: list, mean: int) -&gt; int:
result = 0
powerset = [[]]
for num in array:
len_ = len(powerset)
for i in range(len_):
subsequent = powerset[i]
#check the mean
tmp = [num] + subsequent
if tmp != [] and sum(tmp) / len(tmp) == mean:
result += 1
#update power set
powerset.appened([num] + subsequent)
return result

huangapple
  • 本文由 发表于 2020年7月24日 15:13:00
  • 转载请务必保留本文链接:https://go.coder-hub.com/63068610.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定