Why isn't the data frame output grouped by school_code?

Question

I am having trouble outputting summarized values in a table grouped by the column "school_code", and I am not sure what went wrong. I simply want to calculate the number of occurrences (n) and the percentage (%) and compile them in a data frame/table. If there is an easier way to do this, let me know; any help is welcome! For reference, I am using R.

Here is the data I am working with, called "study_progress_raw":

    school  interview_status  grade
    1       1                 2
    10      1                 1
    15      1                 2
    1       1                 1

My code:

    interviews_completed <- study_progress_raw %>%
      filter(ma_interview_status == 1) %>%
      group_by(school_code) %>%
      mutate(num_row = n()) %>%
      mutate(percent = (num_row / sum(num_row)) * 100) %>%
      summarise(
        Interviews_Completed = paste0(num_row, " (", percent, ")")
      )

What I want:

    school_code  Interviews_Completed
    1            2 (50)
    10           1 (100)
    15           1 (100)

What I got:

    school_code  Interviews_Completed
    1            2 (50)
    1            2 (50)
    10           1 (100)
    15           1 (100)

Can someone tell me what's wrong with my code? I grouped by school_code so I assumed it would just aggregate by that column.

Answer 1

Score: 2

We could use count():

    library(dplyr)

    study_progress_raw %>%
      filter(interview_status == 1) %>%
      # count() tallies the rows per school into a column named n
      count(school) %>%
      mutate(n = paste0(n, " (", 100 / n, ")"))

      school       n
    1      1  2 (50)
    2     10 1 (100)
    3     15 1 (100)
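
If the percentage is meant to be each school's share of all completed interviews, a variant of the count() approach might look like the sketch below. That reading is an assumption, since the expected output in the question does not pin it down. The data frame is re-typed from the question using the column names in its table header, and `share_of_total` and `percent` are illustrative names. Because count() returns an ungrouped result, sum(n) is the total across all schools.

```r
library(dplyr)

# Sample data re-typed from the question (column names follow its table header)
study_progress_raw <- data.frame(
  school = c(1, 10, 15, 1),
  interview_status = c(1, 1, 1, 1),
  grade = c(2, 1, 2, 1)
)

share_of_total <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  count(school) %>%                 # one row per school, result is ungrouped
  mutate(
    percent = n / sum(n) * 100,     # sum(n) is the total over all schools
    Interviews_Completed = paste0(n, " (", percent, ")")
  )

share_of_total
```

With the four sample rows this gives 50 for school 1 and 25 for each of schools 10 and 15, which differs from the 100 shown in the question's expected output.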

Answer 2

Score: 1

You need to put everything inside summarise(). mutate() returns a new column that is as long as the data frame, whereas summarise() returns a data frame with only one row per group.

    interviews_completed <- study_progress_raw %>%
      filter(interview_status == 1) %>%
      group_by(school) %>%
      summarise(
        # n() is the group size; sum(n()) is also evaluated per group,
        # so the ratio below is always 1 (i.e. 100)
        Interviews_Completed = paste0(n(), " (", (n() / sum(n()) * 100), ")")
      )

EDIT: If you do want to keep the extra columns you created:

    interviews_completed <- study_progress_raw %>%
      filter(interview_status == 1) %>%
      group_by(school) %>%
      summarise(
        num_row = n(),
        # num_row is a single value per group here, so percent is always 100
        percent = (num_row / sum(num_row)) * 100,
        Interviews_Completed = paste0(num_row, " (", percent, ")")
      )
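
To make the difference between mutate() and summarise() concrete, here is a minimal sketch that compares row counts on the sample data. The data frame is re-typed from the question, and the names `grouped`, `with_mutate`, and `with_summarise` are only illustrative.

```r
library(dplyr)

# Sample data re-typed from the question (column names follow its table header)
study_progress_raw <- data.frame(
  school = c(1, 10, 15, 1),
  interview_status = c(1, 1, 1, 1)
)

grouped <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school)

# mutate() keeps every input row and repeats the group size on each of them
with_mutate <- grouped %>%
  mutate(num_row = n()) %>%
  ungroup()

# summarise() collapses each group to a single row
with_summarise <- grouped %>%
  summarise(num_row = n())

nrow(with_mutate)     # 4 rows, same as the input
nrow(with_summarise)  # 3 rows, one per school
```

School 1 occupies two rows after mutate(), which is where the duplicated line in the question's output comes from.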

Answer 3

Score: 1

This is how summarise() has behaved since dplyr 1.0.0: it can return more than one row per group (see this article for more information). I believe that from dplyr 1.1.0 onward, this usage of summarise() signals a warning:

    Warning message:
    Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
    Please use `reframe()` instead.
    When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.
    Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

You could try this:

    interviews_completed <- study_progress_raw %>%
      filter(ma_interview_status == 1) %>%
      group_by(school_code) %>%
      mutate(num_row = n()) %>%
      mutate(percent = (num_row / sum(num_row)) * 100) %>%
      summarise(
        # unique() reduces the repeated per-row values to one value per group
        Interviews_Completed = paste0(unique(num_row), " (", unique(percent), ")")
      )
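
As a rough illustration of the reframe() route mentioned in the warning, the sketch below reproduces the multi-row result explicitly and then drops the repeats with distinct(). It assumes dplyr 1.1.0 or later and a data frame with the column names used in the code above (`school_code`, `ma_interview_status`), which differ from the table header in the question. The names `per_row` and `per_school` are illustrative.

```r
library(dplyr)  # reframe() requires dplyr >= 1.1.0

# Sample data re-typed from the question; column names are assumed to match
# the code above rather than the question's table header
study_progress_raw <- data.frame(
  school_code = c(1, 10, 15, 1),
  ma_interview_status = c(1, 1, 1, 1)
)

per_row <- study_progress_raw %>%
  filter(ma_interview_status == 1) %>%
  group_by(school_code) %>%
  mutate(
    num_row = n(),
    percent = num_row / sum(num_row) * 100
  ) %>%
  # reframe() accepts any number of rows per group and returns ungrouped data
  reframe(Interviews_Completed = paste0(num_row, " (", percent, ")"))

nrow(per_row)     # 4: school 1 still appears twice

# distinct() removes the repeated row, leaving one row per school
per_school <- distinct(per_row)
nrow(per_school)  # 3
```

Whether to deduplicate with unique() inside summarise() or with distinct() afterwards is largely a matter of taste; both leave one row per school for this data.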

huangapple
  • Published on July 11, 2023 at 10:38:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76658403.html