Getting average of top 1/3 values

Question

I have a 1-D array with over 100 values, and the values change from case to case. I need an easy way to find the highest one-third of the values and then calculate their average. For example, if my array were [1,2,3,4,5,6,7,8,9], the highest third of the values would be [7,8,9] and the average would be 8. I have too many values, and the number of values changes every time, so I cannot do this manually. Any suggestions would be greatly appreciated.

Answer 1

Score: 2

q = np.quantile(arr, 2/3)
arr[arr > q].mean()

This is faster than sorting because it does not need to fully sort the values, although the timings are within the same order of magnitude.

For example, for an array of size 1,000,000, the sort-based solution takes 120 ms on my PC, while this solution takes 36 ms.

I must admit, though, that for 100 values there is no need to optimize; the cost is probably mostly overhead.
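
A minimal, self-contained sketch of the quantile approach on the question's example array. The helper name top_third_mean_quantile is made up for illustration. Because the comparison is a strict >, the mask can select fewer values than a third when several values tie at the cutoff, so treat this as a sketch rather than an exact "top third" selector.

```python
import numpy as np

def top_third_mean_quantile(arr):
    """Mean of the values strictly above the 2/3 quantile (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    q = np.quantile(arr, 2 / 3)   # default linear interpolation between data points
    return arr[arr > q].mean()    # boolean mask keeps only values above the cutoff

example = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
result = top_third_mean_quantile(example)
print(result)
```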

Answer 2

Score: 2

I think Quickselect is probably the way to go.

In numpy you can use np.partition to implement it.

k = int(np.ceil(len(arr) * 2 / 3))
mean = np.partition(arr, k)[k:].mean()

On my computer, this is slightly faster than the naive np.sort(arr)[k:].mean() method when len(arr) is 100.

When len(arr) is 1_000_000, however, it is significantly faster than both the np.quantile approach proposed by @chrslg (which likely uses quickselect as well) and the naive sort approach.

Benchmarking

len(arr) == 100 (average of 10_000 trials)
sort: 0.00131 ms
quantile: 0.00923 ms
partition: 0.00117 ms

len(arr) == 1_000_000 (average of 100 trials)
sort: 5.794 ms
quantile: 1.358 ms
partition: 0.792 ms
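
The answer does not show its benchmark code, so the following is only a rough sketch of how such a comparison could be reproduced with timeit. The array size, the seed, the repeat count and the function names are all arbitrary choices made here, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, an arbitrary choice
arr = rng.random(100_000)         # smaller than 1_000_000 to keep the run short
k = int(np.ceil(len(arr) * 2 / 3))

def by_sort():
    # Full sort, then average everything from index k onward
    return np.sort(arr)[k:].mean()

def by_quantile():
    # Threshold at the 2/3 quantile, then average the values above it
    return arr[arr > np.quantile(arr, 2 / 3)].mean()

def by_partition():
    # Partial ordering only: elements from index k onward are the largest ones
    return np.partition(arr, k)[k:].mean()

for fn in (by_sort, by_quantile, by_partition):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms")
```

The sort and partition variants select exactly the same elements, so their means should agree up to floating-point rounding; the quantile variant may include or exclude a value right at the cutoff.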

Answer 3

Score: 0

It's not the most beautiful solution, but it solves your problem quickly.

In this solution I round down when len(vect)/3 is not an integer, which happens whenever the length of your vector is not divisible by 3.
If you want to round up instead (for example, len(vect)/3 = 5.6 and you want 6), replace math.floor with math.ceil.

import math
import numpy as np

vect = [1, 7, 3, 4, 9, 6, 2, 8, 5]

np.mean(np.sort(vect)[len(vect) - math.floor(len(vect) / 3):])

# Returns:
# 8.0, the mean of [7, 8, 9]
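
To make the floor versus ceil choice concrete, here is a small sketch on a 10-element list, where 10/3 is not an integer. The helper top_fraction_mean and its round_up flag are hypothetical names introduced only for this illustration.

```python
import math
import numpy as np

def top_fraction_mean(values, round_up=False):
    """Mean of the largest third of values (hypothetical helper)."""
    n = len(values)
    count = math.ceil(n / 3) if round_up else math.floor(n / 3)
    if count == 0:
        raise ValueError("not enough values to take a top third")
    # Sort ascending and average the last `count` elements
    return float(np.mean(np.sort(values)[n - count:]))

vect = [1, 7, 3, 4, 9, 6, 2, 8, 5, 10]
mean_floor = top_fraction_mean(vect)                 # averages [8, 9, 10]
mean_ceil = top_fraction_mean(vect, round_up=True)   # averages [7, 8, 9, 10]
print(mean_floor, mean_ceil)
```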

Answer 4

Score: 0

First, sort the list so that the last third of it holds the highest values.

The method used to sort a list in place in Python is <list_object>.sort()

Since I don't have your values, I'll generate a list of random values:

from random import randint
values = [randint(1, 100) for _ in range(100)]
values.sort()

This gives a list in ascending order.

Now we compute how many values make up one third:

third = round(len(values) / 3)

The result is the length of the list divided by three, rounded to the nearest integer.

Now we average the last third of the list:

one_third = sum(values[-third:]) / len(values[-third:])

This is what you need, in a Pythonic way.
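
The same idea can be wrapped in a small standard-library function. This is a sketch rather than the answer's own code: it swaps the full sort for heapq.nlargest, and the name top_third_mean is made up here. The max(1, ...) guard avoids a count of 0, because a slice such as values[-0:] returns the whole list rather than an empty one.

```python
import heapq
from statistics import fmean

def top_third_mean(values):
    """Average of the largest third of values (hypothetical helper)."""
    # round() goes to the nearest integer; keep at least one value
    count = max(1, round(len(values) / 3))
    # nlargest returns the `count` biggest items without sorting the whole list
    return fmean(heapq.nlargest(count, values))

print(top_third_mean([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```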

Answer 5

Score: 0

I have a solution similar to the previous one.

from statistics import mean
from math import floor

vect = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # must already be sorted in ascending order

cut = floor(len(vect) / 3)

print(max(vect[:cut], vect[cut:2*cut], vect[2*cut:], key=mean))

Instead of printing, you can return the list holding the highest third of the values. I used an additional variable, cut, for clarity, and the key argument of max to choose the chunk with the largest mean. Note that the chunks are taken by position, so this only yields the top third when vect is sorted; the last chunk also absorbs any leftover elements when the length is not divisible by 3.
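
Because the three chunks are cut by position, the input order matters. The sketch below (the function name highest_mean_chunk is hypothetical) runs the same logic on an unsorted list and on its sorted copy to show the difference.

```python
from math import floor
from statistics import mean

def highest_mean_chunk(vect):
    """Split vect into three positional chunks and return the one with the largest mean."""
    cut = floor(len(vect) / 3)
    return max(vect[:cut], vect[cut:2 * cut], vect[2 * cut:], key=mean)

unsorted_vect = [9, 1, 2, 3, 8, 4, 5, 6, 7]
print(highest_mean_chunk(unsorted_vect))          # [5, 6, 7], not the top third
print(highest_mean_chunk(sorted(unsorted_vect)))  # [7, 8, 9]
```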

Answer 6

Score: 0

I took the source array (a NumPy array) as:

arr = np.array([1, 7, 3, 4, 9, 6, 2, 8, 5, 1])

My first assumption is that it is enough to round down one third of the
size of the source array, so the number of elements to take is:

arr.size // 3

The second assumption is that you don't need to perform "full" sort.
It is enough to call np.partition, where:

  • The first argument is the source array.
  • The second argument is the "threshold": Initial n elements are
    "smaller" and the rest are "bigger".

So the proper threshold, leaving 2/3 of smaller elements should be:

n = arr.size - arr.size // 3

getting 7.

And the partitioning should be called as:

np.partition(arr, n)

To take the "bigger" elements (last 1/3), we can use indexing as:

[n:]

so the code to get the "bigger" part is:

np.partition(arr, n)[n:]

getting the values 7, 8 and 9 (np.partition does not guarantee their order within this part).

And to get the mean, we can call np.mean, so the whole code is:

result = np.mean(np.partition(arr, n)[n:])
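
Putting the steps above together, a reusable version might look like the sketch below. The function name top_third_mean_partition and the explicit check for fewer than three values are additions made here, not part of the answer; without the check, n would equal arr.size and np.partition would reject it as out of bounds.

```python
import numpy as np

def top_third_mean_partition(arr):
    """Mean of the largest arr.size // 3 values (hypothetical helper)."""
    arr = np.asarray(arr)
    count = arr.size // 3            # size of the top third, rounded down
    if count == 0:
        raise ValueError("need at least 3 values")
    n = arr.size - count             # index where the "bigger" part starts
    # Elements from index n onward are the `count` largest, in no particular order
    return float(np.partition(arr, n)[n:].mean())

arr = np.array([1, 7, 3, 4, 9, 6, 2, 8, 5, 1])
result = top_third_mean_partition(arr)
print(result)
```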

huangapple
  • This article was posted on February 10, 2023 at 04:22:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75404044.html