英文:
SQL: Learning How to Use The Percentile Functions in SQL
问题
I understand that you want a translation of the code and problem statement into Chinese. Here's the translation:
我正在使用 **Netezza SQL**。
我有以下表格:
CREATE TABLE MY_TABLE (
country VARCHAR(50),
gender CHAR(1),
age INTEGER,
height FLOAT,
owns_bicycle VARCHAR(3)
);
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'M', 25, 175.99, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 63, 163.65, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 83, 166.01, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 94, 178.92, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'M', 63, 173.24, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 87, 156.27, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 89, 159.26, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 61, 179.14, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 49, 167.55, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 53, 172.82, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 62, 161.18, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 31, 173.08, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 33, 166.13, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 50, 177.42, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 57, 155.56, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'M', 57, 158.47, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 27, 152.85, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 89, 156.31, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 99, 179.31, 'No');
**我的问题:**
- 首先,我想按身高值将身高分为3个相等大小的组(例如0%-33%,33%-66%,66%-99%)。
- 接下来,我想按年龄值将年龄分为5个相等大小的组(例如0%-20%,20%-40%等)。
- 然后,对于每个独特的国家、性别、年龄组和身高组的组合,我想找出拥有自行车的人的比例。
最终结果应该如下所示(即每个人只能分配到一个最终子集):
country gender height_group age_group bicycle_proportion counts
<chr> <chr> <fct> <fct> <dbl> <int>
1 Canada F 150.84 - 158.49 18 - 31 0 2
2 Canada F 150.84 - 158.49 31 - 45.2 0.333 3
3 Canada F 150.84 - 158.49 62.4 - 78.4 0 2
4 Canada F 150.84 - 158.49 78.4 - 99 0 1
5 Canada F 158.49 - 169.33 18 - 31 0 1
6 Canada F 158.49 - 169.33 31 - 45.2 1 1
**我知道如何使用 R 编程语言来做到这一点:**
library(dplyr)
set.seed(123)
n <- 100
country <- sample(c("USA", "Canada", "UK"), n, replace = TRUE)
gender <- sample(c("M", "F"), n, replace = TRUE)
age <- sample(18:100, n, replace = TRUE)
height <- runif(n, min = 150, max = 180)
owns_bicycle <- sample(c("Yes", "No"), n, replace = TRUE)
df <- data.frame(country, gender, age, height, owns_bicycle)
height_breaks <- quantile(df$height, probs = seq(0, 1, by = 1/3))
age_breaks <- quantile(df$age, probs = seq(0, 1, by = 1/5))
height_breaks <- round(height_breaks, 2)
height_labels <- paste0(height_breaks[-length(height_breaks)], " - ", height_breaks[-1])
age_labels <- paste0(age_breaks[-length(age_breaks)], " - ", age_breaks[-1])
df$height_group <- cut(df$height, breaks = height_breaks, labels = height_labels, include.lowest = TRUE)
df$age_group <- cut(df$age, breaks = age_breaks, labels = age_labels, include.lowest = TRUE)
final = df %>%
group_by(country, gender, height_group, age_group) %>%
summarise(bicycle_proportion = mean(owns_bicycle == "Yes"),
counts = n())
**现在,我正在尝试将其转换
<details>
<summary>英文:</summary>
I am working with **Netezza SQL.**
I have the following table:
CREATE TABLE MY_TABLE (
country VARCHAR(50),
gender CHAR(1),
age INTEGER,
height FLOAT,
owns_bicycle VARCHAR(3)
);
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'M', 25, 175.99, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 63, 163.65, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 83, 166.01, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 94, 178.92, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'M', 63, 173.24, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 87, 156.27, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 89, 159.26, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 61, 179.14, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 49, 167.55, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 53, 172.82, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 62, 161.18, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 31, 173.08, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 33, 166.13, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 50, 177.42, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 57, 155.56, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'M', 57, 158.47, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 27, 152.85, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 89, 156.31, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 99, 179.31, 'No');
**My Problem:**
- First, I want to break height into 3 equal sized groups by value of their height (e.g. 0%-33%, 33%-66%,66%-99%).
- Next, I want to break age into 5 equal sized groups by value of their age (e.g. 0%-20%, 20%-40%, etc.)
- Then, for each unique combination of country, gender, age_group and height_group, I want to find out the percent of who own a bicycle.
The final result should look something like this (i.e. each person should only be assigned to one final subset):
country gender height_group age_group bicycle_proportion counts
<chr> <chr> <fct> <fct> <dbl> <int>
1 Canada F 150.84 - 158.49 18 - 31 0 2
2 Canada F 150.84 - 158.49 31 - 45.2 0.333 3
3 Canada F 150.84 - 158.49 62.4 - 78.4 0 2
4 Canada F 150.84 - 158.49 78.4 - 99 0 1
5 Canada F 158.49 - 169.33 18 - 31 0 1
6 Canada F 158.49 - 169.33 31 - 45.2 1 1
**I know how to do this using the R programming language:**
library(dplyr)
set.seed(123)
n <- 100
country <- sample(c("USA", "Canada", "UK"), n, replace = TRUE)
gender <- sample(c("M", "F"), n, replace = TRUE)
age <- sample(18:100, n, replace = TRUE)
height <- runif(n, min = 150, max = 180)
owns_bicycle <- sample(c("Yes", "No"), n, replace = TRUE)
df <- data.frame(country, gender, age, height, owns_bicycle)
height_breaks <- quantile(df$height, probs = seq(0, 1, by = 1/3))
age_breaks <- quantile(df$age, probs = seq(0, 1, by = 1/5))
height_breaks <- round(height_breaks, 2)
height_labels <- paste0(height_breaks[-length(height_breaks)], " - ", height_breaks[-1])
age_labels <- paste0(age_breaks[-length(age_breaks)], " - ", age_breaks[-1])
df$height_group <- cut(df$height, breaks = height_breaks, labels = height_labels, include.lowest = TRUE)
df$age_group <- cut(df$age, breaks = age_breaks, labels = age_labels, include.lowest = TRUE)
final = df %>%
group_by(country, gender, height_group, age_group) %>%
summarise(bicycle_proportion = mean(owns_bicycle == "Yes"),
counts = n())
**Now, I am trying to convert this into Netezza SQL.**
I am not sure how to do this:
CREATE TABLE height_groups AS
SELECT
NTILE(3) OVER (ORDER BY height) AS height_group,
MIN(height) AS min_height,
MAX(height) AS max_height
FROM MY_TABLE;
CREATE TABLE age_groups AS
SELECT
NTILE(5) OVER (ORDER BY age) AS age_group,
MIN(age) AS min_age,
MAX(age) AS max_age
FROM MY_TABLE;
**But I don't think I am doing this correctly.
Can someone please show me how to do this?**
Thanks!
</details>
# 答案1
**得分**: 2
这是一段代码示例,用于实现与您的R代码等效的功能。它使用`percentile_cont`来计算分位数之间的两个边界值,然后可以在条件表达式中使用这些值来确定身高或年龄标签。"交叉连接"只是将这些值添加到每个表行,以便在条件表达式中更容易使用。(这在下面引用的fiddle中进一步显示。)
SELECT
t.country
, t.gender
, CASE WHEN t.height <= height_quantiles.q1 THEN 'short'
WHEN t.height <= height_quantiles.q2 THEN 'medium'
ELSE 'tall'
END AS height_group
, CASE WHEN t.age <= age_quantiles.q1 THEN 'young'
WHEN t.age <= age_quantiles.q2 THEN 'middle_age'
ELSE 'old'
END AS age_group
, AVG(CASE WHEN t.owns_bicycle = 'Yes' THEN 1 ELSE 0 END) AS bicycle_proportion
, COUNT(*) AS counts
FROM MY_TABLE t
CROSS JOIN (
SELECT
PERCENTILE_CONT(0.33) WITHIN GROUP (ORDER BY height) AS q1
, PERCENTILE_CONT(0.67) WITHIN GROUP (ORDER BY height) AS q2
FROM MY_TABLE t
) height_quantiles
CROSS JOIN (
SELECT
PERCENTILE_CONT(0.2) WITHIN GROUP (ORDER BY age) AS q1
, PERCENTILE_CONT(0.4) WITHIN GROUP (ORDER BY age) AS q2
FROM MY_TABLE t
) age_quantiles
GROUP BY
t.country
, t.gender
, height_group
, age_group
ORDER BY
t.country
, t.gender
, height_group
, age_group
这是一个SQL查询,用于根据身高和年龄计算分组,并且还包括其他统计信息。如果您需要更多帮助或有其他问题,请告诉我。
<details>
<summary>英文:</summary>
Here is an example use of some functions to arrive at the equivalent of your R code. It uses `percentile_cont` to arrive at the 2 boundary values between quantiles which can then be used in case expressions to arrive at height or age labels. The "cross join" simply adds these values to each table row for ease of use in the case expressions. (This is displayed further in the referenced fiddle below.)
SELECT
t.country
, t.gender
, CASE WHEN t.height <= height_quantiles.q1 THEN 'short'
WHEN t.height <= height_quantiles.q2 THEN 'medium'
ELSE 'tall'
END AS height_group
, CASE WHEN t.age <= age_quantiles.q1 THEN 'young'
WHEN t.age <= age_quantiles.q2 THEN 'middle_age'
ELSE 'old'
END AS age_group
, AVG(CASE WHEN t.owns_bicycle = 'Yes' THEN 1 ELSE 0 END) AS bicycle_proportion
, COUNT(*) AS counts
FROM MY_TABLE t
CROSS JOIN (
SELECT
PERCENTILE_CONT(0.33) WITHIN GROUP (ORDER BY height) AS q1
, PERCENTILE_CONT(0.67) WITHIN GROUP (ORDER BY height) AS q2
FROM MY_TABLE t
) height_quantiles
CROSS JOIN (
SELECT
PERCENTILE_CONT(0.2) WITHIN GROUP (ORDER BY age) AS q1
, PERCENTILE_CONT(0.4) WITHIN GROUP (ORDER BY age) AS q2
FROM MY_TABLE t
) age_quantiles
GROUP BY
t.country
, t.gender
, height_group
, age_group
ORDER BY
t.country
, t.gender
, height_group
, age_group
| country | gender | height\_group | age\_group | bicycle\_proportion | counts |
|:--------|:-------|:-------------|:----------|-------------------:|-------:|
| Canada | F | medium | old | 0.00000000000000000000 | 1 |
| Canada | F | medium | young | 0.00000000000000000000 | 1 |
| Canada | F | short | old | 0.00000000000000000000 | 1 |
| Canada | F | tall | middle\_age | 0.00000000000000000000 | 1 |
| Canada | M | short | old | 1.00000000000000000000 | 1 |
| Canada | M | tall | old | 1.00000000000000000000 | 2 |
| UK | F | medium | middle\_age | 0.00000000000000000000 | 1 |
| UK | F | medium | old | 0.50000000000000000000 | 2 |
| UK | F | short | middle\_age | 0.00000000000000000000 | 1 |
| UK | F | short | old | 1.00000000000000000000 | 1 |
| UK | F | short | young | 0.00000000000000000000 | 1 |
| UK | M | tall | old | 1.00000000000000000000 | 1 |
| UK | M | tall | young | 0.00000000000000000000 | 1 |
| USA | F | medium | middle\_age | 0.00000000000000000000 | 1 |
| USA | F | medium | young | 1.00000000000000000000 | 1 |
| USA | F | tall | old | 0.00000000000000000000 | 1 |
| USA | M | short | middle\_age | 1.00000000000000000000 | 1 |
[fiddle](https://dbfiddle.uk/zimobgtF)
Re: An "adaptable" variant. SQL is not a programming language, it deals with sets of data, so mimicking the R code may be possible if you use a "procedural extension" to SQL (e.g. plsql in Postgres) but as I'm not able to run anything in Netazza the best I can suggest is something along these lines.
with CTE as (
SELECT
*
, NTILE(4) OVER (ORDER BY height) AS height_quantile
, NTILE(5) OVER (ORDER BY age) AS age_quantile
FROM my_table
)
, height_labels as (
select
height_quantile hq
, concat(floor(min(round(height::decimal,2))) , ' to '
, max(ceiling(round(height::decimal,2)) ) ) height_label
from CTE
group by
height_quantile
)
, age_labels as (
select
age_quantile aq
, concat(min(age ) , ' to '
, max(age ) ) age_label
from CTE
group by
age_quantile
)
select
*
from CTE
inner join height_labels h on cte.height_quantile=h.hq
inner join age_labels a on cte.age_quantile=a.aq
order by
height, age
| country | gender | age | height | owns\_bicycle | height\_quantile | age\_quantile | hq | height\_label | aq | age\_label |
|:--------|:-------|----:|:-------|:-------------|----------------:|-------------:|---:|:-------------|---:|:----------|
| UK | F | 27 | 152.85 | No | 1 | 1 | 1 | 152 to 159 | 1 | 25 to 33 |
| UK | F | 57 | 155.56 | No | 1 | 3 | 1 | 152 to 159 | 3 | 57 to 63 |
| Canada | M | 87 | 156.27 | Yes | 1 | 4 | 1 | 152 to 159 | 4 | 63 to 89 |
| UK | F | 89 | 156.31 | Yes | 1 | 5 | 1 | 152 to 159 | 5 | 89 to 99 |
| USA | M | 57 | 158.47 | Yes | 1 | 2 | 1 | 152 to 159 | 2 | 49 to 57 |
| Canada | F | 89 | 159.26 | No | 2 | 4 | 2 | 159 to 167 | 4 | 63 to 89 |
| Canada | F | 62 | 161.18 | No | 2 | 3 | 2 | 159 to 167 | 3 | 57 to 63 |
| UK | F | 63 | 163.65 | No | 2 | 3 | 2 | 159 to 167 | 3 | 57 to 63 |
| UK | F | 83 | 166.01 | Yes | 2 | 4 | 2 | 159 to 167 | 4 | 63 to 89 |
| USA | F | 33 | 166.13 | Yes | 2 | 1 | 2 | 159 to 167 | 1 | 25 to 33 |
| UK | F | 49 | 167.55 | No | 3 | 2 | 3 | 167 to 176 | 2 | 49 to 57 |
| USA | F | 53 | 172.82 | No | 3 | 2 | 3 | 167 to 176 | 2 | 49 to 57 |
| Canada | F | 31 | 173.08 | No | 3 | 1 | 3 | 167 to 176 | 1 | 25 to 33 |
| UK | M | 63 | 173.24 | Yes | 3 | 4 | 3 | 167 to 176 | 4 | 63 to 89 |
| UK | M | 25 | 175.99 | No | 3 | 1 | 3 | 167 to 176 | 1 | 25 to 33 |
| Canada | F | 50 | 177.42 | No | 4 | 2 | 4 | 177 to 180 | 2 | 49 to 57 |
| Canada | M | 94 | 178.92 | Yes | 4 | 5 | 4 | 177 to 180 | 5 | 89 to 99 |
| Canada | M | 61 | 179.14 | Yes | 4 | 3 | 4 | 177 to 180 | 3 | 57 to 63 |
| USA | F | 99 | 179.31 | No | 4 | 5 | 4 | 177 to 180 | 5 | 89 to 99 |
[The third fiddle](https://dbfiddle.uk/lnk4XzdJ)
In this approach the labels are generated from the min/max pairs of each NTILE, and then those used in the final output. To simplify the label creation I introduced the functions floor and ceiling.
</details>
# 答案2
**得分**: 1
以下是您要翻译的内容:
这是一个简单的示例
创建表格学生(
ID INT,
名字 VARCHAR(255),
标记 INT
);
插入学生值
(1,'约翰',90),
(2,'简',80),
(3,'迈克',70),
(4,'彼得',60);
选择
ID,
名字,
标记,
PERCENTILE_CONT(标记,0.25)作为第25百分位数,
PERCENTILE_CONT(标记,0.50)作为第50百分位数,
PERCENTILE_CONT(标记,0.75)作为第75百分位数
从学生;
<details>
<summary>英文:</summary>
Heres a simple example
CREATE TABLE students (
id INT,
name VARCHAR(255),
marks INT
);
INSERT INTO students VALUES
(1, 'John', 90),
(2, 'Jane', 80),
(3, 'Mike', 70),
(4, 'Peter', 60);
SELECT
id,
name,
marks,
PERCENTILE_CONT(marks, 0.25) AS 25th_percentile,
PERCENTILE_CONT(marks, 0.50) AS 50th_percentile,
PERCENTILE_CONT(marks, 0.75) AS 75th_percentile
FROM students;
</details>
# 答案3
**得分**: 0
以下是翻译好的部分:
OP在这里 - 这是我自己尝试解决自己问题的方法。
我认为可以分为三个步骤来完成:
- 步骤1:首先,根据身高和年龄计算必要的百分位类别(例如,3组,5组)
- 步骤2:接下来,将原始表中的身高和年龄的值替换为这些类别
- 步骤3:最后,创建一个包含每个类别的最小值/最大值的图例/查找表
我不确定这是否正确(尽管看起来是正确的?)。
**能有人提供对此的意见吗?**
创建了一个名为MY_TABLE的表,其中包括以下列:
- 国家(country)VARCHAR(50)
- 性别(gender)CHAR(1)
- 年龄(age)INTEGER
- 身高(height)FLOAT
- 拥有自行车(owns_bicycle)VARCHAR(3)
插入了一些数据行到MY_TABLE表中。
创建了一个名为height_groups的表,其中包括身高分组信息,如下:
- 国家(country)
- 性别(gender)
- 身高分组(height_group)
- 最小身高(min_height)
- 最大身高(max_height)
创建了一个名为age_groups的表,其中包括年龄分组信息,如下:
- 国家(country)
- 性别(gender)
- 年龄分组(age_group)
- 最小年龄(min_age)
- 最大年龄(max_age)
创建了一个名为final的表,其中包括了以下信息:
- 国家(country)
- 性别(gender)
- 身高分组(height_group)
- 年龄分组(age_group)
- 拥有自行车的比例(bicycle_proportion)
- 计数(counts)
这个表的数据是从height_groups、age_groups和MY_TABLE表中联接而来的,条件是国家、性别、身高和年龄都匹配,并且年龄在指定范围内。bicycle_proportion列计算了拥有自行车的比例。
最后,创建了一个名为lookup的表,其中包括了以下信息:
- 国家(country)
- 性别(gender)
- 身高分组(height_group)
- 年龄分组(age_group)
- 最小身高(min_height)
- 最大身高(max_height)
- 最小年龄(min_age)
- 最大年龄(max_age)
以上是对你的问题的翻译,不包含代码部分。
<details>
<summary>英文:</summary>
OP here - here is my own attempt at solving my own problem.
I thought I can do this in 3 steps:
- Step 1: First, calculate the necessary percentile categories (e.g. 3 groups, 5 groups) based on height and age
- Step 2: Next, replace the values of height and age in the original table with these categories
- Step 3: Finally, make a legend/lookup table that contains the min/max of each category
I am not sure if this is correct (although it looks correct?).
**Can someone please provide opinions on this?**
CREATE TABLE MY_TABLE (
country VARCHAR(50),
gender CHAR(1),
age INTEGER,
height FLOAT,
owns_bicycle VARCHAR(3)
);
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'M', 25, 175.99, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 63, 163.65, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 83, 166.01, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 94, 178.92, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'M', 63, 173.24, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 87, 156.27, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 89, 159.26, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 61, 179.14, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 49, 167.55, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 53, 172.82, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 62, 161.18, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 31, 173.08, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 33, 166.13, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 50, 177.42, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 57, 155.56, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'M', 57, 158.47, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 27, 152.85, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 89, 156.31, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 99, 179.31, 'No');
CREATE TABLE height_groups AS
SELECT
country,
gender,
NTILE(3) OVER (ORDER BY height) AS height_group,
MIN(height) AS min_height,
MAX(height) AS max_height
FROM MY_TABLE
GROUP BY country, gender, height;
CREATE TABLE age_groups AS
SELECT
country,
gender,
NTILE(5) OVER (ORDER BY age) AS age_group,
MIN(age) AS min_age,
MAX(age) AS max_age
FROM MY_TABLE
GROUP BY country, gender, age;
CREATE TABLE final AS
SELECT
h.country,
h.gender,
h.height_group,
a.age_group,
AVG(CASE WHEN t.owns_bicycle = 'Yes' THEN 1 ELSE 0 END) AS bicycle_proportion,
COUNT(*) AS counts
FROM height_groups h
JOIN age_groups a ON h.country = a.country AND h.gender = a.gender
JOIN MY_TABLE t ON h.country = t.country AND h.gender = t.gender AND t.height BETWEEN h.min_height AND h.max_height AND t.age BETWEEN a.min_age AND a.max_age
GROUP BY h.country, h.gender, h.height_group, a.age_group;
CREATE TABLE lookup AS
SELECT
h.country,
h.gender,
h.height_group,
a.age_group,
h.min_height,
h.max_height,
a.min_age,
a.max_age
FROM height_groups h
JOIN age_groups a ON h.country = a.country AND h.gender = a.gender;
**Is what I have done correct?**
</details>
# 答案4
**得分**: 0
这是你的代码的翻译:
```sql
OP 再次提问 - 我意识到我的先前答案中年龄组和身高组不一致(例如,有时age_group = 1的最小/最大身高不同)。
以下是我的第二次尝试 - 更加手动的尝试(这次是为5个年龄组和5个身高组):
CREATE TABLE MY_TABLE (
country VARCHAR(50),
gender CHAR(1),
age INTEGER,
height FLOAT,
owns_bicycle VARCHAR(3)
);
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'M', 25, 175.99, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 63, 163.65, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 83, 166.01, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 94, 178.92, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'M', 63, 173.24, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 87, 156.27, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 89, 159.26, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 61, 179.14, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 49, 167.55, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 53, 172.82, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 62, 161.18, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 31, 173.08, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 33, 166.13, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 50, 177.42, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 57, 155.56, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'M', 57, 158.47, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 27, 152.85, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 89, 156.31, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 99, 179.31, 'No');
CREATE TABLE age_groups AS
SELECT country, gender, age, height, owns_bicycle,
NTILE(5) OVER (ORDER BY age) AS age_group
FROM MY_TABLE;
CREATE TABLE height_groups AS
SELECT country, gender, age, height, owns_bicycle,
NTILE(5) OVER (ORDER BY height) AS height_group
FROM MY_TABLE;
CREATE TABLE age_height_group_replacements AS
SELECT
a.country,
a.gender,
a.age,
a.height,
a.owns_bicycle,
CASE
WHEN a.age BETWEEN (SELECT MIN(age) FROM age_groups WHERE age_group = 1) AND (SELECT MAX(age) FROM age_groups WHERE age_group = 1) THEN 'Group 1'
WHEN a.age BETWEEN (SELECT MIN(age) FROM age_groups WHERE age_group = 2) AND (SELECT MAX(age) FROM age_groups WHERE age_group = 2) THEN 'Group 2'
WHEN a.age BETWEEN (SELECT MIN(age) FROM age_groups WHERE age_group = 3) AND (SELECT MAX(age) FROM age_groups WHERE age_group = 3) THEN 'Group 3'
WHEN a.age BETWEEN (SELECT MIN(age) FROM age_groups WHERE age_group = 4) AND (SELECT MAX(age) FROM age_groups WHERE age_group = 4) THEN 'Group 4'
ELSE 'Group 5'
END AS age_group_replacement,
CASE
WHEN a.height BETWEEN (SELECT MIN(height) FROM height_groups WHERE height_group = 1) AND (SELECT MAX(height) FROM height_groups WHERE height_group = 1) THEN 'Group 1'
WHEN a.height BETWEEN (SELECT MIN(height) FROM height_groups WHERE height_group = 2) AND (SELECT MAX(height) FROM height_groups WHERE height_group = 2) THEN 'Group 2'
WHEN a.height BETWEEN (SELECT MIN(height) FROM height_groups WHERE height_group = 3) AND (SELECT MAX(height) FROM height_groups WHERE height_group = 3) THEN 'Group 3'
WHEN a.height BETWEEN (SELECT MIN(height) FROM height_groups WHERE height_group = 4) AND (SELECT MAX(height) FROM height_groups WHERE height_group = 4) THEN 'Group 4'
ELSE 'Group 5'
END AS height_group_replacement,
(SELECT MIN(age) FROM age_groups WHERE age_group = (SELECT age_group FROM age_groups WHERE age = a.age)) AS min_age,
(SELECT MAX(age) FROM age_groups WHERE age_group = (SELECT age_group FROM age_groups WHERE age = a.age)) AS max_age,
(SELECT MIN(height) FROM height_groups WHERE height_group = (SELECT height_group FROM height_groups WHERE height = a.height)) AS min_height,
(SELECT MAX(height) FROM height_groups WHERE height_group = (SELECT height_group FROM height_groups WHERE height = a.height)) AS max_height
FROM
MY_TABLE a;
CREATE TABLE BIKE_OWNERSHIP AS SELECT
min_age,
max_age,
min_height,
max_height,
country,
gender,
COUNT(CASE WHEN owns_bicycle = 'Yes' THEN 1 END) * 100.0 / COUNT(*) AS percentage_owns_bicycle
FROM
age_height_group_replacements
GROUP BY
min_age,
max_age,
min_height,
max_height,
country,
gender;
有关此代码的任何评论吗?
谢谢!
请注意,这
英文:
OP again - I realized the age groups and height groups are not consistent in my previous answer (e.g. sometimes age_group = 1 will have different min/max heights).
Here is my second attempt - a more manual attempt (this time for 5 age groups and 5 height groups):
CREATE TABLE MY_TABLE (
country VARCHAR(50),
gender CHAR(1),
age INTEGER,
height FLOAT,
owns_bicycle VARCHAR(3)
);
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'M', 25, 175.99, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 63, 163.65, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 83, 166.01, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 94, 178.92, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'M', 63, 173.24, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 87, 156.27, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 89, 159.26, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'M', 61, 179.14, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 49, 167.55, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 53, 172.82, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 62, 161.18, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 31, 173.08, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 33, 166.13, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('Canada', 'F', 50, 177.42, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 57, 155.56, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'M', 57, 158.47, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 27, 152.85, 'No');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('UK', 'F', 89, 156.31, 'Yes');
INSERT INTO MY_TABLE (country, gender, age, height, owns_bicycle)
VALUES ('USA', 'F', 99, 179.31, 'No');
CREATE TABLE age_groups AS
SELECT country, gender, age, height, owns_bicycle,
NTILE(5) OVER (ORDER BY age) AS age_group
FROM MY_TABLE;
CREATE TABLE height_groups AS
SELECT country, gender, age, height, owns_bicycle,
NTILE(5) OVER (ORDER BY height) AS height_group
FROM MY_TABLE;
CREATE TABLE age_height_group_replacements AS
SELECT
a.country,
a.gender,
a.age,
a.height,
a.owns_bicycle,
CASE
WHEN a.age BETWEEN (SELECT MIN(age) FROM age_groups WHERE age_group = 1) AND (SELECT MAX(age) FROM age_groups WHERE age_group = 1) THEN 'Group 1'
WHEN a.age BETWEEN (SELECT MIN(age) FROM age_groups WHERE age_group = 2) AND (SELECT MAX(age) FROM age_groups WHERE age_group = 2) THEN 'Group 2'
WHEN a.age BETWEEN (SELECT MIN(age) FROM age_groups WHERE age_group = 3) AND (SELECT MAX(age) FROM age_groups WHERE age_group = 3) THEN 'Group 3'
WHEN a.age BETWEEN (SELECT MIN(age) FROM age_groups WHERE age_group = 4) AND (SELECT MAX(age) FROM age_groups WHERE age_group = 4) THEN 'Group 4'
ELSE 'Group 5'
END AS age_group_replacement,
CASE
WHEN a.height BETWEEN (SELECT MIN(height) FROM height_groups WHERE height_group = 1) AND (SELECT MAX(height) FROM height_groups WHERE height_group = 1) THEN 'Group 1'
WHEN a.height BETWEEN (SELECT MIN(height) FROM height_groups WHERE height_group = 2) AND (SELECT MAX(height) FROM height_groups WHERE height_group = 2) THEN 'Group 2'
WHEN a.height BETWEEN (SELECT MIN(height) FROM height_groups WHERE height_group = 3) AND (SELECT MAX(height) FROM height_groups WHERE height_group = 3) THEN 'Group 3'
WHEN a.height BETWEEN (SELECT MIN(height) FROM height_groups WHERE height_group = 4) AND (SELECT MAX(height) FROM height_groups WHERE height_group = 4) THEN 'Group 4'
ELSE 'Group 5'
END AS height_group_replacement,
(SELECT MIN(age) FROM age_groups WHERE age_group = (SELECT age_group FROM age_groups WHERE age = a.age)) AS min_age,
(SELECT MAX(age) FROM age_groups WHERE age_group = (SELECT age_group FROM age_groups WHERE age = a.age)) AS max_age,
(SELECT MIN(height) FROM height_groups WHERE height_group = (SELECT height_group FROM height_groups WHERE height = a.height)) AS min_height,
(SELECT MAX(height) FROM height_groups WHERE height_group = (SELECT height_group FROM height_groups WHERE height = a.height)) AS max_height
FROM
MY_TABLE a;
CREATE TABLE BIKE_OWNERSHIP AS SELECT
min_age,
max_age,
min_height,
max_height,
country,
gender,
COUNT(CASE WHEN owns_bicycle = 'Yes' THEN 1 END) * 100.0 / COUNT(*) AS percentage_owns_bicycle
FROM
age_height_group_replacements
GROUP BY
min_age,
max_age,
min_height,
max_height,
country,
gender;
Anyone have any comments on this?
Thanks!
Note: you can try to execute this SQL code here: https://sqliteonline.com/
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论