Why isn't the data frame output grouped by school_code?

Question

I am having trouble outputting summarized values in a table grouped by the column "school_code", and I am not sure what went wrong. I simply want to calculate the number of occurrences (n) and the percentage (%) and compile them in a data frame/table. If there is an easier way to do this, let me know; any help is welcome! For reference, I am using R.

Here is the data I am working with, called "study_progress_raw":

school  interview_status  grade
1       1                 2
10      1                 1
15      1                 2
1       1                 1

My code:

interviews_completed <- study_progress_raw %>%
  filter(ma_interview_status == 1) %>%
  group_by(school_code) %>%
  mutate(num_row = n()) %>%
  mutate(percent = (num_row / sum(num_row)) * 100) %>%
  summarise(
    Interviews_Completed = paste0(num_row, " (", percent, ")")
  )

What I want:

school_code  Interviews_Completed
1            2 (50)
10           1 (100)
15           1 (100)

What I got:

school_code  Interviews_Completed
1            2 (50)
1            2 (50)
10           1 (100)
15           1 (100)

Can someone tell me what's wrong with my code? I grouped by school_code so I assumed it would just aggregate by that column.

Answer 1

Score: 2

We could use count(), which returns one row per school with the number of matching rows in a column named n:

library(dplyr)

study_progress_raw %>%
  filter(interview_status == 1) %>%
  count(school) %>%
  mutate(n = paste0(n, " (", 100/n, ")"))

  school       n
1      1  2 (50)
2     10 1 (100)
3     15 1 (100)

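As a minimal sketch of why 100/n matches the numbers in the question: in the original code, every row of a group of size n carries num_row = n, so sum(num_row) is n * n and num_row / sum(num_row) * 100 reduces to 100 / n. The data frame below is rebuilt from the table in the question; keeping the count and the percentage in separate columns (percent, Interviews_Completed) is an assumption made for illustration, not part of the answer.

```r
library(dplyr)

# Sample data copied from the table in the question
study_progress_raw <- data.frame(
  school = c(1, 10, 15, 1),
  interview_status = c(1, 1, 1, 1),
  grade = c(2, 1, 2, 1)
)

result <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  count(school) %>%                 # one row per school, count stored in n
  mutate(
    percent = 100 / n,              # same value as num_row / sum(num_row) * 100 in the question
    Interviews_Completed = paste0(n, " (", percent, ")")
  )

print(result)
```

Keeping n numeric and writing the label to its own column means the count can still be used in later calculations.
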
Answer 2

Score: 1

You need to put everything inside the summarise() call. mutate() returns a new column with one value per row of the data frame, whereas summarise() returns a data frame with only one row per group.

interviews_completed <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school) %>%
  summarise(
    Interviews_Completed = paste0(n(), " (", (n()/sum(n())*100), ")")
  )

EDIT: If you do want to keep the extra columns you created:

interviews_completed <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school) %>%
  summarise(
    num_row = n(),
    percent = (num_row / sum(num_row)) * 100,
    Interviews_Completed = paste0(num_row, " (", percent, ")")
  )

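The difference in row counts can be checked directly. The sketch below rebuilds the sample data from the question's table (an assumption, since the real data set is not shown) and compares the two verbs on the same grouped input. It also suggests one caveat: inside summarise(), n() is a single value per group, so n() / sum(n()) is always 1 and the percentage comes out as 100 for every school, rather than the 50 shown for school 1 in the question.

```r
library(dplyr)

# Sample data copied from the table in the question
study_progress_raw <- data.frame(
  school = c(1, 10, 15, 1),
  interview_status = c(1, 1, 1, 1),
  grade = c(2, 1, 2, 1)
)

grouped <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school)

# mutate() keeps every input row and repeats the group size on each one
with_mutate <- grouped %>% mutate(num_row = n())

# summarise() collapses each group to a single row
with_summarise <- grouped %>%
  summarise(
    num_row = n(),
    percent = num_row / sum(num_row) * 100  # num_row is one value here, so this is 100
  )

nrow(with_mutate)     # 4 rows, one per input row
nrow(with_summarise)  # 3 rows, one per school
print(with_summarise)
```

If the 50 for school 1 is the intended value, the percentage has to be computed where num_row still has one entry per row (as in the question's mutate() step) or written as 100 / n().
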
Answer 3

Score: 1

This is the behavior of summarise() after dplyr 1.0.0, see this article for more information. And I believe after dplyr 1.1.0, this usage of summarise() will signal a warning:

Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated. 

You could try this:

interviews_completed <- study_progress_raw %>%
  filter(ma_interview_status == 1) %>%
  group_by(school_code) %>%
  mutate(num_row = n()) %>%
  mutate(percent = (num_row / sum(num_row)) * 100) %>%
  summarise(
    Interviews_Completed = paste0(unique(num_row), " (", unique(percent), ")")
  )

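A small sketch of the two behaviours, assuming dplyr 1.1.0 or later (needed for reframe()) and using the column names from the question's data table (school, interview_status) rather than school_code and ma_interview_status. With reframe(), the expression from the question returns one row per input row without a deprecation warning; wrapping the values in unique() reduces each group to a single row.

```r
library(dplyr)

# Sample data copied from the table in the question
study_progress_raw <- data.frame(
  school = c(1, 10, 15, 1),
  interview_status = c(1, 1, 1, 1),
  grade = c(2, 1, 2, 1)
)

with_counts <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school) %>%
  mutate(
    num_row = n(),
    percent = num_row / sum(num_row) * 100
  )

# reframe() accepts any number of rows per group; school 1 appears twice
all_rows <- with_counts %>%
  reframe(Interviews_Completed = paste0(num_row, " (", percent, ")"))

# unique() leaves one value per group, so summarise() returns one row per school
one_per_group <- with_counts %>%
  summarise(
    Interviews_Completed = paste0(unique(num_row), " (", unique(percent), ")")
  )

print(all_rows)
print(one_per_group)
```
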
huangapple
  • Posted on July 11, 2023 at 10:38:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76658403.html