Percentage & GROUP BY
Question
<details>
<summary>English:</summary>
I'm currently working with a collisions dataset that records every case occurring on a given day. The dataset is a SQLite database.
My first instinct was to get the totals for a given day, where the output looked something like:
|collision_date|SUM(severe_injury_count)|SUM(injured_victims)|
|-------------|----------|--------|
|2001-02-20|19|785|
|2001-02-20|12|697|
|2001-02-20|28|823|
|2001-02-20|29|871|
The above example is the output of the below query:
```sql
SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims)
FROM collisions c
GROUP BY collision_date
LIMIT 50,100;
```
I wanted to calculate `severe_injury_count / injured_victims` as a percentage. I thought it would be straightforward, so I ran the query below (it includes a few variations of the calculation, added once I noticed it wasn't giving me what I intended):
```sql
SELECT
    collision_date,
    SUM(severe_injury_count/injured_victims) AS chance_being_sever_injured,
    SUM(severe_injury_count),
    SUM(injured_victims),
    (severe_injury_count/injured_victims)*100,
    (SUM(severe_injury_count)/SUM(injured_victims))*100
FROM collisions c
GROUP BY collision_date;
```
But the output does not do the calculation as I expected, giving me results like:
|collision_date|chance_being_sever_injured|SUM(severe_injury_count)|SUM(injured_victims)|(severe_injury_count/injured_victims)*100|(SUM(severe_injury_count)/SUM(injured_victims))*100|
|-----------|-----------|-----------|-----------|-----------|-----------|
|2001-02-20|13|19|785|NULL|0|
|2001-02-20|5|12|697|NULL|0|
|2001-02-20|17|28|823|0|0|
|2001-02-20|18|29|871|NULL|0|
I checked the variable types and they are all integers and not strings, so I would have expected to have the actual percentages calculated.
Given the output results, I believe that I'm missing something fundamental when doing this type of operation.
I also tried using `FORMAT()`, but the output was all zeros as well:
> FORMAT((SUM(severe_injury_count)/SUM(injured_victims))*100,2)
Any insight would be much appreciated.
Thank you for your time and feedback.
______________________________________
**Implementing suggestions, hence extending initial post:**
I tried the following as well:
```sql
SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(severe_injury_count)/SUM(injured_victims) AS DECIMAL)
FROM collisions c
GROUP BY collision_date
LIMIT 50,100;
```
I also tried to exclude possible NULLs:
```sql
SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(severe_injury_count)/SUM(injured_victims) AS DECIMAL)
FROM collisions c
WHERE severe_injury_count IS NOT NULL OR injured_victims IS NOT NULL
GROUP BY collision_date
LIMIT 50,100;

SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(severe_injury_count)/SUM(injured_victims) AS DECIMAL)
FROM collisions c
WHERE severe_injury_count > 0 OR injured_victims > 0
GROUP BY collision_date
LIMIT 50,100;
```
All the above alternatives give me 0 as values for the "percentage" column I'm trying to calculate.
I also attempted to coerce the column types, as suggested by @easleyfixed:
```sql
SELECT collision_date, SUM(severe_injury_count),SUM(injured_victims),CAST(SUM(CAST(severe_injury_count AS INT))/SUM(CAST(injured_victims AS INT)) AS DECIMAL)
FROM collisions c
WHERE severe_injury_count > 0 OR injured_victims > 0
GROUP BY collision_date;
```
------------------------------------------
**Expanding on @nnichols & @easleyfixed suggestions**
To better illustrate the data, running:
```sql
SELECT collision_date,COUNT(*)
FROM collisions c
GROUP BY collision_date;
```
Gives me (represents the number of records for a given date):
|collision_date|COUNT(*)|
|-------------|---------|
|2001-01-01|1000|
|2001-01-02|1330|
|2001-01-03|1329|
|2001-01-04|1346|
|2001-01-05|1457|
|etc|etc|
I therefore expanded the query to try and include what I'm trying to assess.
```sql
SELECT collision_date,COUNT(*),SUM(severe_injury_count),SUM(injured_victims),
    SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured
FROM collisions c
GROUP BY collision_date;
```
Outputs:
|collision_date|COUNT(*)|SUM(severe_injury_count)|SUM(injured_victims)|SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured|
|-------------|---------|---------|---------|---------|
|2001-01-01|1000|37|676|0|
|2001-01-02|1330|30|797|0|
|2001-01-03|1329|28|793|0|
|2001-01-04|1346|23|758|0|
|2001-01-05|1457|30|836|0|
|etc|etc|etc|etc|etc|
I double-checked the column types: the numeric columns are INT, but collision_date is actually declared as TEXT.
Just to see what would happen, I tried:
```sql
SELECT CAST(collision_date AS DATE),COUNT(*),SUM(severe_injury_count),SUM(injured_victims),
    SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured
FROM collisions c
GROUP BY collision_date;
```
|CAST(collision_date AS DATE)|COUNT(*)|SUM(severe_injury_count)|SUM(injured_victims)|SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured|
|-------------|---------|---------|---------|---------|
|2,001|1000|37|676|0|
|2,001|1330|30|797|0|
|2,001|1329|28|793|0|
|2,001|1346|23|758|0|
|2,001|1457|30|836|0|
|etc|etc|etc|etc|etc|
I also attempted to coerce NULLs into 0, as was also suggested:
```sql
SELECT collision_date ,COUNT(*),SUM(IFNULL(severe_injury_count,0)),SUM(IFNULL(injured_victims,0)),
    SUM(IFNULL(severe_injury_count,0))/SUM(IFNULL(injured_victims,0)) AS chance_being_sever_injured
FROM collisions c
GROUP BY collision_date;
```
Outputs:
|collision_date|COUNT(*)|SUM(severe_injury_count)|SUM(injured_victims)|SUM(severe_injury_count)/SUM(injured_victims) AS chance_being_sever_injured|
|-------------|---------|---------|---------|---------|
|2001-01-01|1000|37|676|0|
|2001-01-02|1330|30|797|0|
|2001-01-03|1329|28|793|0|
|2001-01-04|1346|23|758|0|
|2001-01-05|1457|30|836|0|
|etc|etc|etc|etc|etc|
I'm truly baffled...
------ EDIT ----
When I initially composed the question, I was under the impression that SQLite functions were compatible with MySQL, which is why my attempts gave "strange" results.
Once I figured out that the functions don't work the same way, using the correct syntax gave the desired outcome.
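
A minimal sketch of the mismatch described in this edit, assuming Python's built-in `sqlite3` module and an in-memory database (neither appears in the original post). It replays two of the MySQL-style casts tried above with the literal values from the first output row (37 and 676). The behaviour shown is what SQLite's type-affinity rules would be expected to give: `DECIMAL` and `DATE` are both treated as numeric type names, so the first cast runs only after the integer division has already truncated, and the second keeps just the leading digits of the date text.

```python
import sqlite3

# In-memory database; no table is needed for these literal expressions.
conn = sqlite3.connect(":memory:")

# 37 / 676 is integer division in SQLite, so it is already 0 by the time
# the CAST ... AS DECIMAL is applied.
after_division = conn.execute(
    "SELECT CAST(37 / 676 AS DECIMAL)"
).fetchone()[0]

# DATE is handled as a numeric type name, so the cast converts the text
# to a number and only the leading "2001" survives.
date_cast = conn.execute(
    "SELECT CAST('2001-01-01' AS DATE)"
).fetchone()[0]

print(after_division, date_cast)
```

This is consistent with the `0` ratios and the `2,001` values in the tables above (the thousands separator presumably comes from the client displaying the result).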
</details>
# Answer 1
**Score**: 1
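MySQL and SQLite are definitely not the same thing! I have updated the tag on your question.
> Integer divide yields an integer result, truncated toward zero. [docs](https://www.sqlite.org/lang_expr.html#:~:text=Integer%20divide%20yields%20an%20integer%20result%2C%20truncated%20toward%20zero.)

Both `SUM()` results are integers here, so any ratio below 1 is truncated to 0. You need to cast to `REAL` or `FLOAT` for the division to work on SQLite:
```sql
SELECT
    collision_date,
    SUM(severe_injury_count),
    SUM(injured_victims),
    ROUND(CAST(SUM(severe_injury_count) AS REAL) / CAST(SUM(injured_victims) AS REAL) * 100, 2)
FROM collisions
GROUP BY collision_date;
```
The NULLs observed in one of your tests were the result of division by 0 (zero).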
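
To make the difference concrete, here is a small sketch that runs both forms of the division side by side. It assumes Python's built-in `sqlite3` module, an in-memory database, and three made-up rows standing in for the real `collisions` table; the column aliases `int_ratio` and `pct` are illustrative names, not part of the original query.

```python
import sqlite3

# In-memory database with a tiny, made-up stand-in for the collisions table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collisions ("
    "collision_date TEXT, severe_injury_count INT, injured_victims INT)"
)
conn.executemany(
    "INSERT INTO collisions VALUES (?, ?, ?)",
    [
        ("2001-01-01", 1, 3),
        ("2001-01-01", 0, 1),
        ("2001-01-02", 0, 0),  # a day with no injured victims
    ],
)

rows = conn.execute(
    """
    SELECT
        collision_date,
        -- integer / integer: truncated toward zero
        SUM(severe_injury_count) / SUM(injured_victims) AS int_ratio,
        -- casting one operand to REAL keeps the fraction
        ROUND(CAST(SUM(severe_injury_count) AS REAL)
              / SUM(injured_victims) * 100, 2) AS pct
    FROM collisions
    GROUP BY collision_date
    ORDER BY collision_date
    """
).fetchall()

for row in rows:
    print(row)
# ('2001-01-01', 0, 25.0)
# ('2001-01-02', None, None)
```

The first date shows the integer ratio collapsing to 0 while the cast version gives 25.0. The second date has a zero denominator, and SQLite returns NULL (`None` in Python) rather than raising an error, which matches the NULLs seen in the question.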