Percentage & GROUP BY


Question


<details>
<summary>English:</summary>

I'm currently working with a collisions dataset which provides all cases that occurred on a given day. The dataset is a SQLite database.


My first instinct was to get the totals for a given day, where the output looked something like:

|collision_date|SUM(severe_injury_count)|SUM(injured_victims)|
|-------------|----------|--------|
|2001-02-20|19|785|
|2001-02-20|12|697|
|2001-02-20|28|823|
|2001-02-20|29|871|

The above example is the output of the below query:

    SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims)
    FROM collisions c
    GROUP BY collision_date
    LIMIT 50,100;

I wanted to calculate severe_injury_count/injured_victims as a percentage. I thought it would be straightforward, so I ran the query below (it includes a few variations of the calculation, which I added once I noticed it wasn't giving me what I intended):



    SELECT 
       collision_date, 
       SUM(severe_injury_count/injured_victims) AS chance_being_sever_injured,
       SUM(severe_injury_count),
       SUM(injured_victims),
       (severe_injury_count/injured_victims)*100,
       (SUM(severe_injury_count)/SUM(injured_victims))*100
    FROM collisions c 
    GROUP BY collision_date;

But the output does not do the calculation as I expected; it gives me results like:


|collision_date|chance_being_sever_injured|SUM(severe_injury_count)|SUM(injured_victims)|(severe_injury_count/injured_victims)*100|(SUM(severe_injury_count)/SUM(injured_victims))*100|
|-----------|-----------|-----------|-----------|-----------|-----------|
|2001-02-20|13|19|785|NULL|0|
|2001-02-20|5|12|697|NULL|0|
|2001-02-20|17|28|823|0|0|
|2001-02-20|18|29|871|NULL|0|


I checked the variable types and they are all integers and not strings, so I would have expected to have the actual percentages calculated.

Given the output, I believe I'm missing something fundamental about this type of operation.

I also tried using FORMAT(), but the output was all zeros as well:

> FORMAT((SUM(severe_injury_count)/SUM(injured_victims))*100,2)

Any insight would be much appreciated.

Thank you for your time and feedback.

______________________________________

**Implementing suggestions, hence extending initial post:**

I tried the following as well:

    SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(severe_injury_count)/SUM(injured_victims) AS DECIMAL)
    FROM collisions c
    GROUP BY collision_date
    LIMIT 50,100;


I also tried to exclude possible NULLs with:

    SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(severe_injury_count)/SUM(injured_victims) AS DECIMAL)
    FROM collisions c WHERE severe_injury_count IS NOT NULL OR  injured_victims IS NOT NULL
    GROUP BY collision_date
    LIMIT 50,100;
    
    SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(severe_injury_count)/SUM(injured_victims) AS DECIMAL)
    FROM collisions c WHERE severe_injury_count > 0 OR  injured_victims > 0 
    GROUP BY collision_date
    LIMIT 50,100;

All the above alternatives give me 0 for the "percentage" column I'm trying to calculate.


I also attempted to coerce the column types, as suggested by @easleyfixed, like so:

    SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(CAST(severe_injury_count AS INT))/SUM(CAST(injured_victims AS INT)) AS DECIMAL)
    FROM collisions c WHERE severe_injury_count > 0 OR  injured_victims > 0 
    GROUP BY collision_date;


------------------------------------------


**Expanding on @nnichols & @easleyfixed suggestions**

To better illustrate the data, running:

    SELECT collision_date,COUNT(*)
    FROM collisions c 
    GROUP BY collision_date;

Gives me (represents the number of records for a given date):


|collision_date|COUNT(*)|
|-------------|---------|
|2001-01-01|1000|
|2001-01-02|1330|
|2001-01-03|1329|
|2001-01-04|1346|
|2001-01-05|1457|
|etc|etc|

I therefore expanded the query to try to include what I'm trying to assess:


    SELECT collision_date,COUNT(*),SUM(severe_injury_count),SUM(injured_victims),
    SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured
    FROM collisions c 
    GROUP BY collision_date;

Outputs:

|collision_date|COUNT(*)|SUM(severe_injury_count)|SUM(injured_victims)|SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured|
|-------------|---------|---------|---------|---------|
|2001-01-01|1000|37|676|0|
|2001-01-02|1330|30|797|0|
|2001-01-03|1329|28|793|0|
|2001-01-04|1346|23|758|0|
|2001-01-05|1457|30|836|0|
|etc|etc|etc|etc|etc|


I double-checked the column types: the count columns are INT, but collision_date is actually stored as "TEXT".

Just for fun, I also tried:

    SELECT CAST(collision_date AS DATE),COUNT(*),SUM(severe_injury_count),SUM(injured_victims),
    SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured
    FROM collisions c 
    GROUP BY collision_date;



|CAST(collision_date AS DATE)|COUNT(*)|SUM(severe_injury_count)|SUM(injured_victims)|SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured|
|-------------|---------|---------|---------|---------|
|2,001|1000|37|676|0|
|2,001|1330|30|797|0|
|2,001|1329|28|793|0|
|2,001|1346|23|758|0|
|2,001|1457|30|836|0|
|etc|etc|etc|etc|etc|



I also tried coercing NULLs to 0, as suggested:

    SELECT collision_date ,COUNT(*),SUM(IFNULL(severe_injury_count,0)),SUM(IFNULL(injured_victims,0)),
    SUM(IFNULL(severe_injury_count,0))/SUM(IFNULL(injured_victims,0)) AS chance_being_sever_injured
    FROM collisions c 
    GROUP BY collision_date;

Outputs:

|collision_date|COUNT(*)|SUM(severe_injury_count)|SUM(injured_victims)|SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured|
|-------------|---------|---------|---------|---------|
|2001-01-01|1000|37|676|0|
|2001-01-02|1330|30|797|0|
|2001-01-03|1329|28|793|0|
|2001-01-04|1346|23|758|0|
|2001-01-05|1457|30|836|0|
|etc|etc|etc|etc|etc|


I'm truly baffled...


------ EDIT ----

When I initially composed the question, I was under the impression that SQLite functions were compatible with MySQL, hence my confusion about why my attempts gave "strange" results.

Once I figured out that the functions don't work the same way, using the correct syntax gave the desired outcome.

</details>
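One detail in the question's second output is worth illustrating: `SUM(severe_injury_count/injured_victims)` was non-zero while `SUM(severe_injury_count)/SUM(injured_victims)` was 0. The sketch below is only an illustration run through Python's `sqlite3` module; the rows in `PER_COLLISION_ROWS` and the function name `compare_ratio_forms` are made up, not taken from the real dataset. It shows that the first form divides row by row (each result truncated to an integer, with division by zero giving NULL, which `SUM()` skips), while the second divides the two totals once.

```python
import sqlite3

# Hypothetical per-collision rows for one date (not from the real dataset):
# (collision_date, severe_injury_count, injured_victims)
PER_COLLISION_ROWS = [
    ("2001-02-20", 1, 1),  # 1 / 1 -> 1
    ("2001-02-20", 0, 3),  # 0 / 3 -> 0
    ("2001-02-20", 1, 2),  # 1 / 2 -> 0 (truncated toward zero)
    ("2001-02-20", 0, 0),  # 0 / 0 -> NULL, which SUM() ignores
    ("2001-02-20", 2, 2),  # 2 / 2 -> 1
]


def compare_ratio_forms(rows):
    """Return (date, sum_of_row_ratios, ratio_of_sums) for each date."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE collisions ("
            "collision_date TEXT, severe_injury_count INT, injured_victims INT)"
        )
        conn.executemany("INSERT INTO collisions VALUES (?, ?, ?)", rows)
        return conn.execute(
            """
            SELECT collision_date,
                   SUM(severe_injury_count / injured_victims) AS sum_of_row_ratios,
                   SUM(severe_injury_count) / SUM(injured_victims) AS ratio_of_sums
            FROM collisions
            GROUP BY collision_date
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(compare_ratio_forms(PER_COLLISION_ROWS))
```

With these rows the per-row form adds up to 2 (it effectively counts rows where the severe count is at least the victim count), and the ratio of the totals, 4 / 8, truncates to 0. Neither is the percentage the question is after.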


# Answer 1
**Score**: 1

MySQL and SQLite are definitely not the same thing! I have updated the tag on your question.

> Integer divide yields an integer result, truncated toward zero. [docs](https://www.sqlite.org/lang_expr.html#:~:text=Integer%20divide%20yields%20an%20integer%20result%2C%20truncated%20toward%20zero.)

In SQLite, you need to cast the values to `REAL` or `FLOAT` before dividing; otherwise the division is done on integers and the fractional part is lost:

```sql
SELECT
    collision_date,
    SUM(severe_injury_count),
    SUM(injured_victims),
    ROUND(CAST(SUM(severe_injury_count) AS REAL) / CAST(SUM(injured_victims) AS REAL) * 100, 2)
FROM collisions
GROUP BY collision_date;
```

The NULLs observed in one of your tests were the result of division by 0 (zero).
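To see both effects side by side, here is a minimal sketch that runs the integer and the `REAL` versions of the division against an in-memory SQLite database through Python's `sqlite3` module. The rows in `SAMPLE_ROWS` are made up for illustration (the first date is chosen so that its totals are 37 and 676, as in one row of the question's output), and the names `severe_injury_rates`, `int_pct` and `real_pct` are assumptions of this example, not part of the original query.

```python
import sqlite3

# Hypothetical sample rows: (collision_date, severe_injury_count, injured_victims).
# The first date sums to 37 and 676; the second has no injured victims,
# so its divisor is 0.
SAMPLE_ROWS = [
    ("2001-01-01", 30, 600),
    ("2001-01-01", 7, 76),
    ("2001-01-02", 0, 0),
]

QUERY = """
SELECT
    collision_date,
    -- INTEGER / INTEGER is truncated toward zero, so 37 / 676 is 0
    SUM(severe_injury_count) / SUM(injured_victims) * 100 AS int_pct,
    -- casting one operand to REAL makes the division floating-point
    ROUND(CAST(SUM(severe_injury_count) AS REAL) / SUM(injured_victims) * 100, 2) AS real_pct
FROM collisions
GROUP BY collision_date
ORDER BY collision_date
"""


def severe_injury_rates(rows):
    """Load rows into an in-memory table and return (date, int_pct, real_pct) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE collisions ("
            "collision_date TEXT, severe_injury_count INT, injured_victims INT)"
        )
        conn.executemany("INSERT INTO collisions VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in severe_injury_rates(SAMPLE_ROWS):
        print(row)
```

For the first date the integer version returns 0 and the `REAL` version returns 5.47. For the second date both columns come back as NULL (`None` in Python), because SQLite returns NULL for division by zero rather than raising an error.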

huangapple
  • Published on February 10, 2023, 04:09:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75403927.html