如何在Snowflake列中进行MinMax缩放,同时仍然保持列的总和?

huangapple go评论65阅读模式
英文:

How to do a MinMax Scale in Snowflake column and still maintain overall sum of column?

问题

I currently have a challenge with Snowflake, where I have a PRICE column like the bellow, the goal is to "scale" this values but keep the original sum intact, like, I do not need to respect the proportions on the scale, but lowest value should continue to be the lowest and so on.
Also I guess this can be a SQL problem.

I tried running a script using WIDTH_BUCKET but there is no option to maintain the total sum.

SELECT 
    sale_date, 
    price,
    WIDTH_BUCKET(price, 200000, 600000, 5) AS "SALES GROUP"
  FROM home_sales
  ORDER BY sale_date;

The output(SALES GROUP column) I am really looking for is like this:
By this example the total SUM of column price is 180.

+------------+-----------+-------------+
| SALE_DATE  |     PRICE | SALES GROUP |
|------------+-----------+-------------|
| 2013-08-01 | 10        |          12 |
| 2014-02-01 | 20        |          24 |
| 2015-04-01 | 30        |          28 |
| 2016-04-01 | 10        |          12 |
| 2017-04-01 | 50        |          47 |
| 2018-04-01 | 60        |          57 |
+------------+-----------+-------------+

If we sum the SALES GROUP column, it still has a total of 180.
*OBS: I know this is not exactly scaling, but I am new to snowflake and did not find the most correct term.

英文:

I currently have a challenge with Snowflake, where I have a PRICE column like the bellow, the goal is to "scale" this values but keep the original sum intact, like, I do not need to respect the proportions on the scale, but lowest value should continue to be the lowest and so on.
Also I guess this can be a SQL problem.

I tried running a script using WIDTH_BUCKET but there is no option to maintain the total sum.

`SELECT 
    sale_date, 
    price,
    WIDTH_BUCKET(price, 200000, 600000, 5) AS "SALES GROUP"
  FROM home_sales
  ORDER BY sale_date;`

`+------------+-----------+-------------+
| SALE_DATE  |     PRICE | SALES GROUP |
|------------+-----------+-------------|
| 2013-08-01 | 10        |           1 |
| 2014-02-01 | 20        |           2 |
| 2015-04-01 | 30        |           3 |
| 2016-04-01 | 10        |           1 |
| 2017-04-01 | 50        |           4 |
| 2018-04-01 | 60        |           5 |
+------------+-----------+-------------+`

The output(SALES GROUP column) I am really looking for is like this:
By this example the total SUM of column price is 180.

`+------------+-----------+-------------+
| SALE_DATE  |     PRICE | SALES GROUP |
|------------+-----------+-------------|
| 2013-08-01 | 10        |          12 |
| 2014-02-01 | 20        |          24 |
| 2015-04-01 | 30        |          28 |
| 2016-04-01 | 10        |          12 |
| 2017-04-01 | 50        |          47 |
| 2018-04-01 | 60        |          57 |
+------------+-----------+-------------+`

If we sum the SALES GROUP column, it still has a total of 180.
*OBS: I know this is not exactly scaling, but I am new to snowflake and did not find the most correct term.

答案1

得分: 0

如何将自然对数放大以匹配总和?

select *, (ln(price) * sum(price) over() / sum(ln(price)) over())::int as scaled_price
from t;
英文:

How about scaling up the natural log to match the sum?

select *, (ln(price) * sum(price) over() / sum(ln(price)) over())::int as scaled_price
from t;

huangapple
  • 本文由 发表于 2023年7月17日 18:41:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/76703664.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定