Why isn't the data frame output grouped by school_code?
Question
I am having trouble outputting summarized values in a table grouped by the column "school_code", and I am not sure what went wrong. I simply want to calculate the number of occurrences (n) and the percentage (%) and compile them in a data frame/table. If there is an easier way to do this, let me know; any help is welcome! For reference, I am using R.
Here is the data I am working with, called "study_progress_raw":
school | interview_status | grade |
---|---|---|
1 | 1 | 2 |
10 | 1 | 1 |
15 | 1 | 2 |
1 | 1 | 1 |
My code:
    interviews_completed <- study_progress_raw %>%
      filter(ma_interview_status == 1) %>%
      group_by(school_code) %>%
      mutate(num_row = n()) %>%
      mutate(percent = (num_row / sum(num_row)) * 100) %>%
      summarise(
        Interviews_Completed = paste0(num_row, " (", percent, ")")
      )
What I want:
school_code | Interviews_Completed |
---|---|
1 | 2 (50) |
10 | 1 (100) |
15 | 1 (100) |
What I got:
school_code | Interviews_Completed |
---|---|
1 | 2 (50) |
1 | 2 (50) |
10 | 1 (100) |
15 | 1 (100) |
Can someone tell me what's wrong with my code? I grouped by school_code so I assumed it would just aggregate by that column.
Answer 1
Score: 2
We could use count():

    library(dplyr)
    study_progress_raw %>%
      filter(interview_status == 1) %>%
      count(school) %>%
      mutate(n = paste0(n, " (", 100/n, ")"))

      school       n
    1      1  2 (50)
    2     10 1 (100)
    3     15 1 (100)
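Note that 100/n happens to reproduce the numbers in the question for this sample, but it is not a share of anything. If the percentage is meant to be each school's share of all completed interviews (an assumption, since the question does not define it), a minimal sketch could look like the following. The data frame is rebuilt from the question's table, and the `percent` and `Interviews_Completed` columns are only illustrative names.

```r
library(dplyr)

# Toy data rebuilt from the table in the question.
study_progress_raw <- data.frame(
  school = c(1, 10, 15, 1),
  interview_status = c(1, 1, 1, 1),
  grade = c(2, 1, 2, 1)
)

result <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  count(school) %>%                      # one row per school, count in column n
  mutate(
    percent = n / sum(n) * 100,          # share of all completed interviews (assumed meaning)
    Interviews_Completed = paste0(n, " (", percent, ")")
  )

print(result)
```

With this definition the percentages add up to 100 across schools, which is usually what a "n (%)" column is expected to show.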
Answer 2
Score: 1
You need to put everything inside the summarise() function. mutate() returns a new column that is as long as the data frame, whereas summarise() returns a data frame that is only as long as the number of groups.
    interviews_completed <- study_progress_raw %>%
      filter(interview_status == 1) %>%
      group_by(school) %>%
      summarise(
        Interviews_Completed = paste0(n(), " (", (n()/sum(n())*100), ")")
      )
EDIT: If you do want to keep the extra columns you created:
    interviews_completed <- study_progress_raw %>%
      filter(interview_status == 1) %>%
      group_by(school) %>%
      summarise(
        num_row = n(),
        percent = (num_row / sum(num_row)) * 100,
        Interviews_Completed = paste0(num_row, " (", percent, ")")
      )
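To make the mutate()/summarise() difference concrete, here is a small sketch on toy data rebuilt from the question's table (the object names `per_row` and `per_group` are just for illustration). It also shows a caveat: inside summarise(), n() is a single value per group, so num_row / sum(num_row) is always 1 and the percentage comes out as 100 for every school.

```r
library(dplyr)

# Toy data rebuilt from the table in the question.
study_progress_raw <- data.frame(
  school = c(1, 10, 15, 1),
  interview_status = c(1, 1, 1, 1),
  grade = c(2, 1, 2, 1)
)

# mutate() keeps one row per input row, repeating the group count.
per_row <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school) %>%
  mutate(num_row = n()) %>%
  ungroup()

# summarise() collapses to one row per group.
per_group <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school) %>%
  summarise(
    num_row = n(),
    percent = num_row / sum(num_row) * 100,  # num_row is one value here, so this is 100
    .groups = "drop"
  )

nrow(per_row)      # as many rows as the filtered data
nrow(per_group)    # as many rows as there are schools
per_group$percent  # 100 for every group
```

If a share of the overall total is wanted instead, the division has to happen after the data has been collapsed and ungrouped.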
Answer 3
Score: 1
This is the behavior of summarise() after dplyr 1.0.0; see this article for more information. I believe that after dplyr 1.1.0, this usage of summarise() will signal a warning:
Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
You could try this:
    interviews_completed <- study_progress_raw %>%
      filter(ma_interview_status == 1) %>%
      group_by(school_code) %>%
      mutate(num_row = n()) %>%
      mutate(percent = (num_row / sum(num_row)) * 100) %>%
      summarise(
        Interviews_Completed = paste0(unique(num_row), " (", unique(percent), ")")
      )
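As an alternative sketch that avoids returning several rows per group from summarise() altogether, the mutate() result can be deduplicated with distinct(). This assumes the column names from the question's table (school, interview_status) rather than the ones used in the question's code, and it reproduces the numbers the asker listed under "What I want". The 50 for school 1 arises because num_row is repeated on both of that school's rows, so sum(num_row) within the group is 4.

```r
library(dplyr)

# Toy data rebuilt from the table in the question.
study_progress_raw <- data.frame(
  school = c(1, 10, 15, 1),
  interview_status = c(1, 1, 1, 1),
  grade = c(2, 1, 2, 1)
)

interviews_completed <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school) %>%
  mutate(
    num_row = n(),                             # group size, repeated on every row
    percent = num_row / sum(num_row) * 100     # same per-group formula as in the question
  ) %>%
  ungroup() %>%
  distinct(school, num_row, percent) %>%       # keep one row per school
  mutate(Interviews_Completed = paste0(num_row, " (", percent, ")"))

print(interviews_completed)
```

Because the rows are collapsed with distinct() rather than summarise(), the dplyr 1.1.0 deprecation warning is not triggered.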