July 11, 2023 10:38:14

Why isn't the data frame output grouped by school_code?

Question

I am having trouble outputting summarized values in a table grouped by the column "school_code", and I am not sure what went wrong. I simply want to calculate the number of occurrences (n) and the percentage (%) and compile them in a data frame/table. If there is an easier way to do this, let me know; any help is welcome! For reference, I am using R.

Here is the data I am working with, called "study_progress_raw":

school	interview_status	grade
1	1	2
10	1	1
15	1	2
1	1	1

My code:

interviews_completed <- study_progress_raw %>%
  filter(ma_interview_status == 1) %>%
  group_by(school_code) %>%
  mutate(num_row = n()) %>%
  mutate(percent = (num_row / sum(num_row)) * 100) %>%
  summarise(
    Interviews_Completed = paste0(num_row, " (", percent, ")")
  )

What I want:

school_code	Interviews_Completed
1	2 (50)
10	1 (100)
15	1 (100)

What I got:

school_code	Interviews_Completed
1	2 (50)
1	2 (50)
10	1 (100)
15	1 (100)

Can someone tell me what's wrong with my code? I grouped by school_code so I assumed it would just aggregate by that column.

Answer 1

Score: 2

We could use count():

library(dplyr)
study_progress_raw %>%
  filter(interview_status == 1) %>%
  count(school) %>%
  mutate(n = paste0(n, " (", 100/n, ")"))

 school       n
1      1  2 (50)
2     10 1 (100)
3     15 1 (100)
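
The snippet above uses 100/n, which reproduces the numbers in the question's desired output. If the percentage is instead meant to be each school's share of all completed interviews (an assumption; the question does not say so), a minimal sketch could look like the following. The sample data frame is rebuilt from the question's table, and the percent column name is only illustrative.

```r
library(dplyr)

# Sample data copied from the table in the question
study_progress_raw <- data.frame(
  school = c(1, 10, 15, 1),
  interview_status = c(1, 1, 1, 1),
  grade = c(2, 1, 2, 1)
)

result <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  count(school) %>%                  # one row per school; the result is ungrouped
  mutate(
    percent = n / sum(n) * 100,      # sum(n) is the total across all schools
    Interviews_Completed = paste0(n, " (", percent, ")")
  ) %>%
  select(school, Interviews_Completed)

result
```

Because count() returns an ungrouped data frame, sum(n) here is the overall total (4), so the schools come out as 2 (50), 1 (25) and 1 (25).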

Answer 2

Score: 1

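You need to put everything inside summarise(). mutate() returns a new column with one value per row of the data frame, whereas summarise() is meant to return a data frame with one row per group. Because num_row and percent come from mutate(), they still hold two values for school 1, so paste0() produces two strings and summarise() emits two rows for that school.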

interviews_completed <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school) %>%
  summarise(
    Interviews_Completed = paste0(n(), " (", (n()/sum(n())*100), ")")
  )

EDIT: If you do want to keep the extra columns you created:

interviews_completed <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school) %>%
  summarise(
    num_row = n(),
    percent = (num_row / sum(num_row)) * 100,
    Interviews_Completed = paste0(num_row, " (", percent, ")")
  )
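
To make the mutate()/summarise() difference concrete, here is a small sketch that rebuilds the question's sample data (column names taken from the table shown there) and compares row counts. It also evaluates the n() / sum(n()) ratio per group, as an illustration of what that expression returns inside summarise().

```r
library(dplyr)

# Sample data copied from the table in the question
study_progress_raw <- data.frame(
  school = c(1, 10, 15, 1),
  interview_status = c(1, 1, 1, 1),
  grade = c(2, 1, 2, 1)
)

completed <- study_progress_raw %>%
  filter(interview_status == 1) %>%
  group_by(school)

# mutate() keeps every input row: 4 rows in, 4 rows out
with_counts <- completed %>% mutate(num_row = n())
nrow(with_counts)

# summarise() collapses each group to one row: 3 schools, 3 rows
per_school <- completed %>% summarise(num_row = n())
nrow(per_school)

# Inside summarise(), n() is a single number per group,
# so sum(n()) equals n() and the ratio is 100 for every school
pct <- completed %>% summarise(percent = n() / sum(n()) * 100)
pct$percent
```

Since the per-group ratio is always 100, a percentage relative to the overall total has to be computed after the data has been summarised and ungrouped.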

Answer 3

Score: 1

This is the behavior of summarise() since dplyr 1.0.0; see this article for more information. I believe that from dplyr 1.1.0 onward, this usage of summarise() signals a warning:

Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

You could try this:

interviews_completed <- study_progress_raw %>%
  filter(ma_interview_status == 1) %>%
  group_by(school_code) %>%
  mutate(num_row = n()) %>%
  mutate(percent = (num_row / sum(num_row)) * 100) %>%
  summarise(
    Interviews_Completed = paste0(unique(num_row), " (", unique(percent), ")")
  )
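
Why unique() helps can be seen without dplyr. The sketch below uses plain vectors holding the values that mutate() would produce for the two rows of school 1 in the question's data; the variable names repeated and collapsed are only illustrative.

```r
# Values mutate() attaches to the two rows of school 1
num_row <- c(2, 2)
percent <- c(50, 50)

# paste0() is vectorised: two inputs give two strings, hence two output rows
repeated <- paste0(num_row, " (", percent, ")")
length(repeated)

# unique() collapses the repeated values first, so a single string remains
collapsed <- paste0(unique(num_row), " (", unique(percent), ")")
collapsed
```

This only works because every row in a group carries the same num_row and percent; if the values differed within a group, unique() would still return more than one element.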
