What is the difference between percentileDisc and percentileCount in Apache AGE aggregation functions?

Question

Difference between percentileDisc and percentileCount aggregation functions in Apache AGE.

I am unable to understand the difference between the percentileDisc and percentileCount aggregation functions. The documentation says that percentileDisc calculates "the nearest value to the percentile. For interpolated values, see percentileCont." Could someone please explain the difference between the two with an example? What results would these two queries give?

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileCont(n.age, 0.4)
$$) as (percentile_cont_age agtype);

and

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileDisc(n.age, 0.5)
$$) as (percentile_disc_age agtype);

Thank you in advance!

Answer 1

Score: 2

Both percentileCont and percentileDisc functions in Apache AGE are used to calculate percentiles, but they do so in slightly different ways. Here's how:

  • percentileDisc: This function calculates the discrete percentile.
    A discrete percentile does not interpolate values and will always
    return a value that is present in the given dataset. It returns the
    value at or below which the given percentage of the data falls. For
    example, if you want to find the 50th percentile (median) of ages in
    your data, percentileDisc will return the age at the exact middle of
    your dataset (when sorted in ascending order). If there is an even
    number of data points, it will return the lower of the two middle
    values.

  • percentileCont: This function calculates the continuous percentile.
    Unlike percentileDisc, percentileCont may interpolate between values
    in the dataset when the requested percentile lies between two data
    points. This results in a more "continuous" measure that can provide
    a more accurate picture when the dataset is large.

Let's consider an example dataset of ages: 10, 20, 30, 40, 50.

If you execute percentileDisc(n.age, 0.5), it will return 30 because 30 is the exact middle value of this dataset.

But if you execute percentileCont(n.age, 0.4), it will interpolate between 20 and 30 because the 40th percentile is not exactly on a specific data point. This results in a return value of 26 (0.6 of the way between 20 and 30).

So, percentileDisc is usually used when you want to find an actual data point in your dataset that represents the Nth percentile, while percentileCont is used when you want to calculate the Nth percentile based on the continuous distribution of your data, even if the resulting value is not an actual data point in your dataset.
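The example above can be reproduced with a short Python sketch. The helper names `percentile_disc` and `percentile_cont` are hypothetical, and the rules they use (nearest rank for the discrete variant, linear interpolation over positions 0 to n - 1 for the continuous one) are assumed to mirror PostgreSQL's ordered-set aggregates; they are not taken from the AGE source.

```python
import math

def percentile_disc(values, p):
    """Discrete percentile: always returns an element of `values`.

    Assumption: nearest-rank rule, i.e. the sorted value whose
    1-based rank is ceil(p * n).
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

def percentile_cont(values, p):
    """Continuous percentile: interpolates between neighbouring values."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)   # fractional 0-based position
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo                # how far pos lies past ordered[lo]
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

ages = [10, 20, 30, 40, 50]
print(percentile_disc(ages, 0.5))            # 30
print(round(percentile_cont(ages, 0.4), 6))  # 26.0
```

The discrete helper can only ever return one of the input ages, while the continuous one may return a value that no Person actually has.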

Answer 2

Score: -1

There is some confusion here, but the percentileDisc and percentileCount aggregation functions are different.

Let's assume that ages is a set of values with [10, 20, 35, 50, 60, 70].

  • Calculate percentileDisc(ages, 0.5):
    If we use percentileDisc with a percentile of 0.5 (50%), it will give us the value closest to the 50th percentile.
    In this case, the 50th percentile would be the median. Since we have an even number of ages, the median is (35 + 50) / 2 = 42.5. So, using percentileDisc with a percentile of 0.5 would return 42.5.

  • Calculate percentileCount(ages, 0.4): percentileCount will count the number of values that are less than or equal to the given (here the 40th) percentile.
    To calculate the 40th percentile in this case, 40% of 6 data points is 2.4. Since we need to find a value greater than or equal to 2.4, we look for the third value in the sorted list, which is 35. So, using percentileCount with a percentile of 0.4 would return the count of values less than or equal to 35, which is 3.

Answer 3

Score: -1

percentileDisc (Percentile Discrete):
The percentileDisc function returns the value at the specified percentile. It works by sorting the values in ascending order and selecting the value at the desired percentile. This means that the returned value might not be present in the original dataset.

percentileCount (Percentile Count):
The percentileCount function returns the count of values less than or equal to the specified percentile. It calculates the cumulative distribution function (CDF) and returns the number of values that fall within the percentile range.

Answer 4

Score: -1

According to the documentation, percentileDisc calculates "the nearest value to the percentile." It returns the exact value of the nth percentile. For example, if we have a set of numbers {1, 2, 3, 4, 5} and we want to find the 50th percentile, percentileDisc will return the value 3.
On the other hand, percentileCont calculates the percentile using linear interpolation between adjacent values. For example, if we have a set of numbers {1, 2, 3, 4, 5} and we want to find the 40th percentile, percentileCont will return the value 2.6, which is interpolated between 2 and 3 (the position is 0.4 * (5 - 1) = 1.6, that is, 0.6 of the way from 2 to 3).

In the given example queries, the first query uses percentileCont to find the 40th percentile of the age of all Person nodes in the graph, while the second query uses percentileDisc to find the 50th percentile of the age of all Person nodes in the graph. The first query will return an interpolated value between two adjacent values, while the second query will return the exact value of the 50th percentile.

Answer 5

Score: -1

Using the data provided by the AGE documentation as an example,

SELECT create_graph('graph_name');

SELECT * FROM cypher('graph_name', $$
    CREATE (a:Person {name: 'A', age: 13}),
    (b:Person {name: 'B', age: 33, eyes: "blue"}),
    (c:Person {name: 'C', age: 44, eyes: "blue"}),
    (d1:Person {name: 'D', eyes: "brown"}),
    (d2:Person {name: 'D'}),
    (a)-[:KNOWS]->(b),
    (a)-[:KNOWS]->(c),
    (a)-[:KNOWS]->(d1),
    (b)-[:KNOWS]->(d2),
    (c)-[:KNOWS]->(d2)
$$) as (a agtype);

percentileCont

Running the percentileCont() function will produce an output:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileCont(n.age, 0.4)
$$) as (percentile_cont_age agtype);

 percentile_cont_age
---------------------
 29.0
(1 row)

Looking at how percentileCont() is calculated in the 'agtype.c' file, the result is obtained by linear interpolation, where

result = y1 + [(x - x1) * (y2 - y1)] / (x2 - x1)

x = percentile * (number_of_rows - 1)
x1 = floor(percentile * (number_of_rows - 1))
x2 = ceil(percentile * (number_of_rows - 1))
y1 = value_of_x1
y2 = value_of_x2

If x is a whole number, x1 = x2 and the result is simply y1.

In this example, percentile = 0.4 and number_of_rows = 3 (ages 13, 33, and 44; the two Person nodes without an age are not counted), which gives:

x = 0.4 * (3 - 1) = 0.8
x1 = floor(0.4 * (3 - 1)) = floor(0.8) = 0
x2 = ceil(0.4 * (3 - 1)) = ceil(0.8) = 1
y1 = value_of_x1 = 13
y2 = value_of_x2 = 33

result = 13 + [(0.8 - 0) * (33 - 13)] / (1 - 0) = 29

Which is exactly what we got when using the percentileCont() function.
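The arithmetic above can be checked with a small Python sketch. It follows the interpolation formula as written in this answer, not the actual C code in agtype.c. The function name `interpolate_percentile` is hypothetical, and dropping missing ages is an assumption made so that number_of_rows comes out as 3 for the five Person nodes.

```python
import math

def interpolate_percentile(values, percentile):
    # Assumption: entries without a value are ignored, like the two
    # Person nodes that have no age property.
    rows = sorted(v for v in values if v is not None)
    if not rows:
        return None
    x = percentile * (len(rows) - 1)   # fractional row index
    x1, x2 = math.floor(x), math.ceil(x)
    y1, y2 = rows[x1], rows[x2]
    if x1 == x2:                       # x falls exactly on a row
        return float(y1)
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

ages = [13, 33, 44, None, None]        # A, B, C and the two D nodes
print(round(interpolate_percentile(ages, 0.4), 6))  # 29.0
```

The explicit `x1 == x2` branch avoids a division by zero when the requested percentile lands exactly on a row.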

percentileDisc

Running the percentileDisc() function will produce an output:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileDisc(n.age, 0.5)
$$) as (percentile_disc_age agtype);

 percentile_disc_age
---------------------
 33.0
(1 row)

This function uses a simpler method of calculation, using a rounding method and calculating the nearest value to the percentile.

result = round_to_nearest_val(percentile * (max_val - min_val) + min_val)

In this example, percentile = 0.5, max_val = 44, and min_val = 13 (with ages 13, 33, and 44), which gives:

result = round_to_nearest_val(0.5 * (44 - 13) + 13) = round_to_nearest_val(28.5) = 33

Which is exactly what we got when using the percentileDisc() function.
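As a cross-check, the rounding formula above can be written out in Python next to a nearest-rank rule. Both helper names are hypothetical, and the nearest-rank rule is only an assumption borrowed from how PostgreSQL defines percentile_disc; neither function is taken from the AGE source. On the sample ages the two agree on 33, but they can differ on other data, so the actual behavior is worth confirming against agtype.c.

```python
import math

def disc_by_rounding(values, percentile):
    """The formula from this answer: scale into [min, max], then snap
    to the closest value that actually occurs in the data."""
    target = percentile * (max(values) - min(values)) + min(values)
    return min(values, key=lambda v: abs(v - target))

def disc_by_nearest_rank(values, percentile):
    """Assumed alternative: take the ceil(percentile * n)-th sorted value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered)))
    return ordered[rank - 1]

ages = [13, 33, 44]
print(disc_by_rounding(ages, 0.5))        # 33
print(disc_by_nearest_rank(ages, 0.5))    # 33

skewed = [1, 2, 3, 100]                   # one large outlier
print(disc_by_rounding(skewed, 0.5))      # 3
print(disc_by_nearest_rank(skewed, 0.5))  # 2
```

Either way the result is a value that exists in the data, which is the property that separates percentileDisc from percentileCont.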

Hope this helps!

Answer 6

Score: -1

percentileCont: calculates the percentile of a given value over a group, with a percentile from 0.0 to 1.0.

  • This uses linear interpolation if the percentile doesn't align with a specific data point.

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileCont(n.age, 0.4)
$$) as (percentile_cont_age agtype);

percentileDisc: also calculates the percentile of a given value over a group. However, it uses a rounding method to calculate the nearest value to the percentile if the percentile does not fall on a specific data point.

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileDisc(n.age, 0.5)
$$) as (percentile_disc_age agtype);

Visit the official Apache AGE documentation for more on this.

Answer 7

Score: -1

Basically, percentileCont uses linear interpolation between adjacent values, while percentileDisc returns the nearest value to the percentile without interpolation.

Answer 8

Score: -1

We use percentileDisc and percentileCount to find percentiles in a dataset. PercentileDisc produces an exact number from the dataset that corresponds to the specified percentile, whereas percentileCount gives an approximate count of values that fall below or equal the supplied percentile.

Answer 9

Score: -1

In AGE, percentileDisc and percentileCount are two aggregate functions used for calculating percentiles.

percentileDisc: percentileDisc stands for percentile discrete and returns the nearest value to the specified percentile. It goes through the dataset and returns the suitable percentile value. If there is no exact match it will return the closest value.

percentileCount: percentileCount is a function that returns the count of values below the specified percentile.

In short, percentileDisc returns the value at a certain percentile or closest to the certain percentile and percentileCount returns the count of values below the specified percentile.

Answer 10

Score: -1

According to the documentation, "percentileDisc() returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a rounding method and calculates the nearest value to the percentile". This means that it returns the value at, or very close to, the percentile passed into the function, whether the group has an odd or an even number of values.

However, percentileCont() is used for interpolated values. This means that in the case of a group with an even number of values, percentileCont() considers the two values closest to the specified percentile and returns the weighted average of these two values. For a group with an odd number of values, it simply returns the exact value at that percentile, just like percentileDisc().

Answer 11

Score: -1

The percentileDisc function calculates the value that represents the specified percentile in the dataset.
It selects the value at the position closest to the specified percentile rank. If there are multiple values at the same rank, the function chooses the smallest value.

The percentileCont function calculates the value at the specified percentile using linear interpolation between adjacent values.
It returns a value that lies between two data points, based on the specified percentile. This provides a more precise result when compared to percentileDisc.

Query using percentileCont:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileCont(n.age, 0.4)
$$) as (percentile_cont_age agtype);

This query calculates the value at the 40th percentile using linear interpolation. The result may lie between two data points rather than being a value from the dataset.

Query using percentileDisc:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileDisc(n.age, 0.5)
$$) as (percentile_disc_age agtype);

This query calculates the value that represents the 50th percentile using the nearest-rank method. The result will be a specific value that separates the lower 50% of ages from the upper 50% (the median).

Answer 12

Score: -1

The main difference between the two is that the percentileCont() function returns the average of two values if the sought percentile lies between them, while the percentileDisc() function returns the value that is less than or equal to the sought percentile; in other words, it rounds to the nearest value to the percentile.

Answer 13

Score: -1

percentileDisc is a function that is used to find the value in a dataset that corresponds to a specific percentile.

For example, given the dataset [10,20,30,40,50], suppose we want to find the 60th percentile using the percentileDisc function. The result would be 30 because it corresponds to the 60th percentile in the dataset.

percentileCount is a function that is used to count the values that are below or equal to a specific percentile in the dataset.

For example, you have a dataset [10,20,30,40,50] and we want to calculate its 60th percentileCount. It will be [10,20,30] because they are less than or equal to 60th percentile.

Answer 14

Score: -1

Here is what I understood about the difference between them.

Disc stands for discrete values.

Cont stands for continuous values.


percentileDisc: It uses a rounding method and calculates the nearest value to the percentile.

For example: percentileDisc(ages, 0.5) will compute exactly the 50th percentile (that is, the median) of an expression.

percentileDisc calculates the percentile based on a discrete distribution of the column values.


percentileCont: It uses a linear interpolation method, calculating a weighted average between two values if the desired percentile lies between them.

For example: percentileCont(ages, 0.5) will compute the weighted average of an expression.

percentileCont calculates the percentile based on a continuous distribution of the column values.


Check out the official documentation for a better understanding.

Answer 15

Score: -2

percentileDisc helps you find the value at a particular percentile in a dataset. It gives you the actual value.

percentileCount helps you count the number of values in a dataset that are less than or equal to a specific percentile. It gives you a count, not the actual value.

Answer 16

Score: -2

Both of these are aggregation functions in Apache AGE. According to the documentation, percentileDisc() returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a rounding method and calculates the value closest to the percentile. percentileCont() also returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0, but if the desired percentile falls between two values, it uses linear interpolation to calculate a weighted average between them. For the examples stated above:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileDisc(n.age, 0.5)
$$) as (percentile_disc_age agtype);

The 50th percentile of the age property will be the answer in this case.

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileCont(n.age, 0.4)
$$) as (percentile_cont_age agtype);

A weighted average is used to get the 40th percentile of the values in the age property. The argument 0.4 selects the 40th percentile, not the median.

Answer 17

Score: -2

The value at the provided percentile is returned by percentileDisc, whereas the count of values below the specified percentile is returned by percentileCount. percentileDisc provides a precise computation, whereas percentileCount is a faster estimation method.

huangapple
  • Posted on June 1, 2023, 22:07:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76382780.html