Presto/Trino/Athena - Wrong subtraction of varchar cast to double

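Question

The computed difference shows so many decimal places because the query casts weight from a string (varchar) to double. A double is a binary floating-point number that cannot represent most decimal fractions exactly, so subtracting two such values exposes a tiny representation error. This is expected floating-point behavior, not a bug in the query. If you want fewer decimal places in the result, you can use the ROUND function, or cast to a type with a fixed number of decimal places, for example:

SELECT t1.user_id, t1.sample_id, t1.weight,
  ROUND(cast(t1.weight as double) - cast(t2.weight as double), 2) AS weight_loss
FROM my_table t1
JOIN my_table t2 ON t1.user_id = t2.user_id AND t1.sample_id - 1 = t2.sample_id
ORDER BY t1.user_id, t1.sample_id

In the example above, ROUND limits the result to two decimal places. Adjust the number of decimal places as needed.

Original question: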

I am using AWS Athena and trying to calculate the weight loss of each user between two samples.
My weight column is varchar, so I cast it into double and then subtract them.
I am using the following query:

SELECT t1.user_id, t1.sample_id, t1.weight, 
  cast(t1.weight AS double) - cast(t2.weight AS double) AS weight_loss
FROM my_table t1
JOIN my_table t2 ON t1.user_id = t2.user_id AND t1.sample_id - 1 = t2.sample_id
ORDER BY t1.user_id, t1.sample_id

and I get the following result:
[image: query result]

Why does the calculated weight loss look like this, with so many decimal places?

Answer 1

Score: 1

The DECIMAL data type in Presto can solve your problem. DECIMAL stores base-10 digits exactly, so the subtraction does not pick up binary floating-point error.

See the following code as an example:

SELECT t1.user_id, t1.sample_id, t1.weight,
  cast(t1.weight AS DECIMAL(10,1)) - cast(t2.weight AS DECIMAL(10,1)) AS weight_loss
FROM my_table t1
JOIN my_table t2 ON t1.user_id = t2.user_id AND t1.sample_id - 1 = t2.sample_id
ORDER BY t1.user_id, t1.sample_id
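
The difference between the two types can be reproduced outside Athena, since Python's float is the same IEEE 754 double-precision format as Presto's DOUBLE. The following is only a minimal sketch: the weights 80.3 and 79.1 are made-up values, not taken from the question's data, and Python's Decimal is used as a stand-in for DECIMAL(10,1).

```python
from decimal import Decimal

# Hypothetical weights stored as strings, as they would be in a varchar column.
current_weight = "80.3"
previous_weight = "79.1"

# float is an IEEE 754 double: neither 80.3 nor 79.1 has an exact binary
# representation, so the difference is not exactly the double nearest to 1.2.
double_diff = float(current_weight) - float(previous_weight)

# Decimal keeps the base-10 digits exactly, analogous to DECIMAL in Presto.
decimal_diff = Decimal(current_weight) - Decimal(previous_weight)

print(double_diff == 1.2)              # False
print(decimal_diff == Decimal("1.2"))  # True
print(round(double_diff, 2) == 1.2)    # True: rounding hides the error for display
```

Rounding only cleans up the displayed value, while a decimal type avoids the representation error in the first place.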

Answer 2

Score: 1

First of all, as mentioned previously, you can use a more precise data type, i.e. decimal. I would also recommend looking into window functions, especially lag, because there is no need to actually perform a join (which can be quite costly if there is a lot of data, and I'm not sure that Presto/Trino will be able to optimize it). Something along these lines:

select user_id, 
    sample_id,
    weight,
    decimal_weight - lag(decimal_weight) over (partition by user_id order by sample_id) AS weight_loss
from (
    SELECT user_id, 
        sample_id,
        weight,
        cast(weight as decimal(10,1)) decimal_weight
    FROM my_table)
ORDER BY user_id, sample_id;
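
For readers unfamiliar with lag, what the window expression computes can be emulated in plain Python. This is a rough sketch under stated assumptions: the rows and the helper name weight_loss are hypothetical, and Decimal again stands in for decimal(10,1).

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (user_id, sample_id, weight stored as a string).
rows = [
    (1, 1, "80.3"),
    (1, 2, "79.1"),
    (1, 3, "78.6"),
    (2, 1, "65.0"),
    (2, 2, "65.4"),
]

def weight_loss(rows):
    """Emulate weight - lag(weight) OVER (PARTITION BY user_id ORDER BY sample_id)."""
    result = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for user_id, samples in groupby(ordered, key=itemgetter(0)):
        previous = None  # lag() yields NULL for the first row of each partition
        for _, sample_id, weight in samples:
            current = Decimal(weight)
            diff = None if previous is None else current - previous
            result.append((user_id, sample_id, weight, diff))
            previous = current
    return result

for row in weight_loss(rows):
    print(row)
```

Unlike the inner join, this keeps each user's first sample, with an empty weight_loss. The join pairs a row with sample_id - 1, whereas lag pairs it with the previous row in sample_id order, so the two approaches agree only when a user's sample_id values are consecutive.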

huangapple
  • Posted on February 23, 2023 at 23:17:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75546779.html