英文:
How to get right proportions based on a subset of the data set in R
问题
我要计算数据集中关于plan
与school type
之间的比例。问题在于,我必须首先多次对数据集进行子集化。例如,在这种情况下,我需要对数据集进行子集化,以仅获取提供从level 0
到level 04
的学校。我该如何做?
- 编辑:
下面的方法有效,但我没有筛选出所有提供从level 0
到level 04
的学校,我得到的是有其中之一的学校。对此有什么想法将不胜感激。
- 数据描述:
# SCHOOL = 学校名称
# Q9 = 学校类型
# Q11 = 学校提供的级别
# Q40 = 与每个级别相关的学校提供的计划类型
注意:所有计划都与学校级别相关。因此,如果我不按每所学校提供的级别筛选,只计算这些列的比例,我的结果将具有误导性,因为每所学校可以提供不同的级别选项(所有级别、1级、2级等)。
我的尝试:
### D) 筛选只提供至Level 04的学校 ###
##################################### Quest40_2 ##########################
### 检查向量的名称:
unique(quest40_2$Q11)
unwanted_4_2 <- quest40_2 %>%
filter(Q11 %in% c('level05','level06')) ### 筛选不想要的学校级别
### 创建一个带有不想要的学校名称的向量:
filter_vec_4_2 <- unique(unwanted_4_2$SCHOOL) ### 获取唯一的名称(只有一所学校有一个名称)
### 将原始数据框赋给一个虚拟数据框:
filtered_df_4_2 <- quest40_2
### 循环遍历不想要的学校名称以删除它们:
for (i in 1 :length(filter_vec_4_2)) {
filtered_df_4_2 <- filtered_df_4_2[!filtered_df_4_2$SCHOOL == filter_vec_4_2[i],]
}
- 我需要计算每种
plan
按每种school type
出现的次数(并获得其比例)。问题是,每所学校可以有多个plan
类型。因此,如果我们要按每种school type
计算plan
类型的比例,我们不能按n来除
b <- filtered_df_4_2 %>% drop_na(Q40) %>%
count(Q40, Q9) %>%
group_by(Q9)
- 这使我想到了:
### 按每种'curriculo'计算唯一的学校数:
b2 <- b2 %>%
select(SCHOOL, Q9) %>%
unique() %>% count(Q9)
## 合并数据框并获取每种'planejamento'类型的学校比例
## 在课程类型内
c <- b %>% full_join(b2, by = 'Q9') %>%
mutate(prop = round((n.x/n.y *100), digits = 2)) %>%
select(-n.x, -n.y)
-
Q1 = 我认为我没有正确筛选,因为我不是专门筛选提供所有级别从01到04的学校。我想我在使用“或”而不是“与”。
-
Q2 = 有没有一种方法可以避免使用
tidyverse
中的循环?提前致以感谢。 -
最终,我正在尝试开始=>学校是否为他们提供的级别制定计划或不制定计划,以及学校类型如何影响此事?(以后可以建模,但现在我只需要百分比)
-
数据:
dput(quest40_2)
structure(list(SCHOOL = structure(c("School1", "School1", "School1",
"School1", "School1", "School1", "School1", "School2", "School2",
"School2", "School2", "School2", "School2", "School2", "School3",
"School3", "School3", "School3", "School3", "School3", "School3",
"School3", "School3", "School4", "School4", "School4", "School4",
"School4", "School4", "School5", "School5", "School5", "School5",
"School5", "School5", "School6", "School6", "School6", "School6",
"School6", "School7", "School7", "School7", "School7", "School7",
"School7", "School8", "School8", "School8", "School8", "School9",
"School9", "School9", "School9", "School9", "School10", "School10",
"School10", "School10", "School10", "School11", "School11", "School11",
"School11", "School11", "School11", "School11", "School11", "School11",
"School12", "School12", "School12", "School12", "School12", "School12",
"School12", "School13", "School13", "School13", "School13", "School13",
"School13", "School13", "School13", "School13", "School13", "School14",
"School14", "School14", "School15", "School15", "School15", "School15",
"School15", "School15", "School16", "School16", "School16", "School16",
"School16", "School16", "School16", "School17", "School17", "School17",
"School17", "School17", "School18", "School18", "School18", "School18",
"School18", "School19", "School19", "School19", "School19", "School19",
"School19", "School20", "School20", "School20", "School21", "School
<details>
<summary>英文:</summary>
I want to calculate the proportion of ```plan``` with respect to ```school type``` in the dataset below. Thing is, I have to first subset the dataset a couple of times. In this case, for example, I need to subset the dataset so that I only get schools which offer from ```level 0``` up to ```level 04```. **How can I do that?**
* edit:
The way below *is working*, but I'm not filtering all schools that have all levels from ```level 0``` up to ```level 04``` , I'm getting the schools that have either of them. Ideas to that would be much appreciated.
* data description:
SCHOOL = name of the school
Q9 = type of school
Q11 = levels that the schools offer
Q40 = types of planning that the school offers related to each level.
Note: All plannings are related to a school level. Hence, If I don't filter by the levels that each school offers and just calculate this columns' prop, my results will be misleading because each school can offer different level options (all of them, 1, 2 of them, etc)
my attempt:
D) filter schools which only offer UP TO Level 04
##################################### Quest40_2 ##########################
check vector's names:
unique(quest40_2$Q11)
unwanted_4_2 <- quest40_2 %>%
filter(Q11 %in% c('level05','level06')) ### FILTER UNWANTED SCHOOL LEVELS
create a vector with UNWANTED SCHOOL NAMES:
filter_vec_4_2 <- unique(unwanted_4_2$SCHOOL) ### get unique names (only school has 1 name)
assign the original dataframe to a dummy data frame:
filtered_df_4_2 <- quest40_2
loop over unwanted schools' names to remove them:
for (i in 1 :length(filter_vec_4_2)) {
filtered_df_4_2 <- filtered_df_4_2[!filtered_df_4_2$SCHOOL == filter_vec_4_2[i],]
}
* I need to count how many times EACH 'plan' occurs by each type of 'school' (and get its proportion). Problem = each school can have more than one 'plan' type. Since each school can have more than one ```plan type```, if we want to count the proportion of
```plan``` types according to each ```school type```, we cannot divide by n
b <- filtered_df_4_2 %>% drop_na(Q40) %>%
count(Q40, Q9) %>%
group_by(Q9)
* Which leaves me to:
counting the unique schools by each type of 'curriculo':
b2 <- b2 %>%
select(SCHOOL, Q9) %>%
unique() %>% count(Q9)
join dataframes and get the proportion of schools that have each type of 'planejamento'
within the curriculum types
c <- b %>% full_join(b2, by = 'Q9') %>%
mutate(prop = round((n.x/n.y *100), digits = 2)) %>%
select(-n.x, -n.y)
* *Q1* = I don't think I'm filtering it correctely since I'm not exclusively filtering the schools that offer alll levels up to 01 to 04. I guess I'm doing an 'or' not an 'and'
* *Q2* = Is there a way to avoid the loop with ```tidyverse``` ? Thanks in adv.
* Ultimately, I'm trying to get the starts to => **'do the schools plan their agenda for the levels they offer or not and how does the type of school impact this?'** (later this can be modeled, but now I just need percentages)
* data:
dput(quest40_2)
structure(list(SCHOOL = structure(c("School1", "School1", "School1",
"School1", "School1", "School1", "School1", "School2", "School2",
"School2", "School2", "School2", "School2", "School2", "School3",
"School3", "School3", "School3", "School3", "School3", "School3",
"School3", "School3", "School4", "School4", "School4", "School4",
"School4", "School4", "School5", "School5", "School5", "School5",
"School5", "School5", "School6", "School6", "School6", "School6",
"School6", "School7", "School7", "School7", "School7", "School7",
"School7", "School8", "School8", "School8", "School8", "School9",
"School9", "School9", "School9", "School9", "School10", "School10",
"School10", "School10", "School10", "School11", "School11", "School11",
"School11", "School11", "School11", "School11", "School11", "School11",
"School12", "School12", "School12", "School12", "School12", "School12",
"School12", "School13", "School13", "School13", "School13", "School13",
"School13", "School13", "School13", "School13", "School13", "School14",
"School14", "School14", "School15", "School15", "School15", "School15",
"School15", "School15", "School16", "School16", "School16", "School16",
"School16", "School16", "School16", "School17", "School17", "School17",
"School17", "School17", "School18", "School18", "School18", "School18",
"School18", "School19", "School19", "School19", "School19", "School19",
"School19", "School20", "School20", "School20", "School21", "School21",
"School21", "School21", "School21", "School22", "School22", "School22",
"School22", "School22", "School23", "School23", "School23", "School23",
"School23", "School23", "School24", "School24", "School24", "School24",
"School24", "School24", "School25", "School25", "School25", "School25",
"School25", "School26", "School26", "School26", "School26", "School26",
"School26", "School26", "School26", "School26", "School27", "School27",
"School27", "School27", "School27", "School27", "School27", "School28",
"School28", "School28", "School28", "School28", "School28", "School28",
"School28", "School29", "School29", "School29", "School29", "School29",
"School29", "School30", "School30", "School30", "School30", "School30",
"School30", "School30", "School30", "School30", "School30", "School30",
"School31", "School31", "School31", "School31", "School31", "School31",
"School31", "School31", "School31", "School31", "School31", "School32",
"School32", "School32", "School32", "School32", "School32", "School32",
"School32", "School32", "School32", "School32", "School33", "School33",
"School33", "School33", "School33", "School33", "School33", "School34",
"School34", "School34", "School34", "School34", "School34", "School34",
"School34", "School35", "School35", "School35", "School35", "School36",
"School36", "School36", "School36", "School36", "School37", "School37",
"School37", "School37", "School37", "School37", "School37", "School37",
"School38", "School38", "School38", "School38", "School39", "School39",
"School39", "School39", "School39", "School39"), class = c("glue",
"character")), Q9 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
3L, 3L, 3L, 3L, 3L), .Label = c("typeA", "typeB", "typeC", "typeD"
), class = "factor"), Q11 = structure(c(3L, 7L, 4L, 2L, NA, NA,
NA, 7L, 4L, 2L, 1L, NA, NA, NA, 7L, 4L, 2L, 1L, 5L, NA, NA, NA,
NA, 4L, 2L, 1L, NA, NA, NA, 7L, 4L, 2L, 1L, 5L, NA, 7L, 4L, 2L,
1L, NA, 4L, 2L, 1L, 5L, 6L, NA, 4L, 2L, 1L, NA, 4L, 2L, 1L, 5L,
NA, 4L, 2L, 1L, 5L, NA, 4L, 2L, 1L, 5L, NA, NA, NA, NA, NA, 7L,
4L, 2L, 1L, NA, NA, NA, 7L, 4L, 2L, 1L, 5L, 6L, NA, NA, NA, NA,
5L, 6L, NA, 4L, 2L, 1L, 5L, 6L, NA, 4L, 2L, 1L, 5L, NA, NA, NA,
7L, 4L, 2L, 1L, NA, 7L, 4L, 2L, 1L, NA, 4L, 2L, 1L, NA, NA, NA,
2L, 1L, NA, 4L, 2L, 1L, 5L, NA, 4L, 2L, 1L, 5L, NA, 7L, 4L, 2L,
1L, NA, NA, 4L, 2L, 1L, NA, NA, NA, 4L, 2L, 1L, 5L, NA, 7L, 4L,
2L, 1L, 5L, NA, NA, NA, NA, 2L, 1L, 5L, NA, NA, NA, NA, 3L, 7L,
4L, 2L, 1L, 5L, 6L, NA, 4L, 2L, 1L, 5L, NA, NA, 7L, 4L, 2L, 1L,
5L, 6L, NA, NA, NA, NA, NA, 7L, 4L, 2L, 1L, 5L, 6L, NA, NA, NA,
NA, NA, 7L, 4L, 2L, 1L, 5L, 6L, NA, NA, NA, NA, NA, 7L, 4L, 2L,
1L, 5L, 6L, NA, 7L, 4L, 2L, 1L, 5L, NA, NA, NA, 3L, 7L, 4L, NA,
7L, 4L, 2L, NA, NA, 4L, 2L, 1L, 5L, NA, NA, NA, NA, 4L, 2L, 1L,
NA, 7L, 4L, 2L, 1L, 5L, NA), .Label = c("level04", "level03",
"level0", "level02", "level05", "level06", "level01"), class = "factor"),
Q40 = structure(c(NA, NA, NA, NA, 2L, 6L, 5L, NA, NA, NA,
NA, 2L, 6L, 5L, NA, NA, NA, NA, NA, 2L, 6L, 5L, 4L, NA, NA,
NA, 2L, 6L, 5L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 1L,
NA, NA, NA, NA, NA, 1L, NA, NA, NA, 1L, NA, NA, NA, NA, 1L,
NA, NA, NA, NA, 1L, NA, NA, NA, NA, 2L, 6L, 5L, 4L, 1L, NA,
NA, NA, NA, 2L, 6L, 5L, NA, NA, NA, NA, NA, NA, 2L, 6L, 4L,
3L, NA, NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 2L,
6L, 5L, NA, NA, NA, NA, 2L, NA, NA, NA, NA, 1L, NA, NA, NA,
2L, 6L, 5L, NA, NA, 1L, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
1L, NA, NA, NA, NA, 2L, 1L, NA, NA, NA, 2L, 6L, 5L, NA, NA,
NA, NA, 1L, NA, NA, NA, NA, NA, 2L, 6L, 5L, 4L, NA, NA, NA,
2L, 6L, 5L, 4L, NA, NA, NA, NA, NA, NA, NA, 2L, NA, NA, NA,
NA, 2L, 1L, NA, NA, NA, NA, NA, NA, 2L, 6L, 5L, 4L, 3L, NA,
NA, NA, NA, NA, NA, 2L, 6L, 5L, 4L, 3L, NA, NA, NA, NA, NA,
NA, 2L, 6L, 5L, 4L, 3L, NA, NA, NA, NA, NA, NA, 1L, NA, NA,
NA, NA, NA, 2L, 6L, 5L, NA, NA, NA, 2L, NA, NA, NA, 2L, 6L,
NA, NA, NA, NA, 2L, 6L, 5L, 4L, NA, NA, NA, 1L, NA, NA, NA,
NA, NA, 1L), .Label = c("none", "plan1_level0upto02", "plan5_level_05",
"plan4_level_05", "plan3_level_04", "plan2_level_03"), class = "factor")), row.names = c(NA,
-253L), class = c("tbl_df", "tbl", "data.frame"))
</details>
# 答案1
**得分**: 1
以下是您需要的内容的中文翻译:
这可能是您需要的或为您提供一个良好的起点。它显示了每种类型的计划在级别0-4之间的数量,还显示了百分比。
```R
left_join(na.omit(quest40_2[, c("SCHOOL", "Q9", "Q11")]),
na.omit(quest40_2[, c("SCHOOL", "Q40")]), c("SCHOOL"),
multiple = "all") %>%
group_by(Q9, Q40) %>%
filter(sub(".*(\\d+)$", "\", Q11) <= 4) %>%
summarize(n = n(), .groups = "drop") %>%
mutate(percentage = n / sum(n) * 100) %>%
print(n = Inf)
# 一个数据表:23 × 4
Q9 Q40 n percentage
<fct> <fct> <int> <dbl>
1 typeA none 3 0.974
2 typeA plan1_level0upto02 6 1.95
3 typeA plan5_level_05 4 1.30
4 typeA plan4_level_05 6 1.95
5 typeA plan3_level_04 6 1.95
6 typeA plan2_level_03 6 1.95
7 typeB none 21 6.82
8 typeB plan1_level0upto02 47 15.3
9 typeB plan5_level_05 4 1.30
10 typeB plan4_level_05 14 4.55
11 typeB plan3_level_04 31 10.1
12 typeB plan2_level_03 31 10.1
13 typeC none 32 10.4
14 typeC plan1_level0upto02 21 6.82
15 typeC plan4_level_05 4 1.30
16 typeC plan3_level_04 15 4.87
17 typeC plan2_level_03 18 5.84
18 typeD none 3 0.974
19 typeD plan1_level0upto02 8 2.60
20 typeD plan5_level_05 8 2.60
21 typeD plan4_level_05 8 2.60
22 typeD plan3_level_04 4 1.30
23 typeD plan2_level_03 8 2.60
请注意,代码部分已被保留为原文。
英文:
This might be what you need or give you a good start. It shows the number of plans per type with levels 0-4, also shows the percentages.
left_join(na.omit(quest40_2[, c("SCHOOL", "Q9", "Q11")]),
na.omit(quest40_2[, c("SCHOOL", "Q40")]), c("SCHOOL"),
multiple = "all") %>%
group_by(Q9, Q40) %>%
filter(sub(".*(\\d+)$", "\", Q11) <= 4) %>%
summarize(n = n(), .groups = "drop") %>%
mutate(percentage = n / sum(n) * 100) %>%
print(n = Inf)
# A tibble: 23 × 4
Q9 Q40 n percentage
<fct> <fct> <int> <dbl>
1 typeA none 3 0.974
2 typeA plan1_level0upto02 6 1.95
3 typeA plan5_level_05 4 1.30
4 typeA plan4_level_05 6 1.95
5 typeA plan3_level_04 6 1.95
6 typeA plan2_level_03 6 1.95
7 typeB none 21 6.82
8 typeB plan1_level0upto02 47 15.3
9 typeB plan5_level_05 4 1.30
10 typeB plan4_level_05 14 4.55
11 typeB plan3_level_04 31 10.1
12 typeB plan2_level_03 31 10.1
13 typeC none 32 10.4
14 typeC plan1_level0upto02 21 6.82
15 typeC plan4_level_05 4 1.30
16 typeC plan3_level_04 15 4.87
17 typeC plan2_level_03 18 5.84
18 typeD none 3 0.974
19 typeD plan1_level0upto02 8 2.60
20 typeD plan5_level_05 8 2.60
21 typeD plan4_level_05 8 2.60
22 typeD plan3_level_04 4 1.30
23 typeD plan2_level_03 8 2.60
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论