March 10, 2023, 00:35:15

Rolling mean per group in tidyverse

Question

I aggregate data per group and calculate means per group to ease visualization. Unfortunately, some of my groups are very large and some are nearly empty. I would like a rolling mean calculation to smooth the picture further. Here is similar data:

# load packages
library(haven)
library(dplyr)
library(ggplot2)

# read dta file from github
soep <- read_dta("https://github.com/MarcoKuehne/marcokuehne.github.io/blob/main/data/SOEP/soep_lebensz_en/soep_lebensz_en.dta?raw=true")

# mean satisfaction per education level and sex, plotted by sex
soep %>%
  group_by(education, sex) %>%
  summarise(across(satisf_org, mean, na.rm = TRUE),
            n = n()) %>%
  ggplot(aes(x = education, y = satisf_org, col = as.factor(sex))) +
  geom_point() +
  labs(title = "Mean Satisfaction per Education Level by Gender",
       x = "Education", y = "Mean Satisfaction", color = "Gender")

The mean satisfaction at education 8.5 for females looks like an outlier. I assume that people at neighbouring education levels are similar enough to be pooled, i.e. I want to calculate the mean satisfaction of all people at education 7, 8.5 and 9 (grouped by sex) and store it as the rolling mean at 8.5 (grouped by sex).

Starting from standard grouped means:

soep %>%
  group_by(education, sex) %>%
  summarise(across(satisf_org, mean, na.rm = TRUE),
            n = n())
# A tibble: 28 × 4
# Groups:   education [14]
   education sex        satisf_org     n
       <dbl> <dbl+lbl>       <dbl> <int>
 1       7   0 [male]         6.16    73
 2       7   1 [female]       6.59   113
 3       8.5 0 [male]         7.16    37
 4       8.5 1 [female]       8.56    18
 5       9   0 [male]         6.88   430
 6       9   1 [female]       7.00   633
 7      10   0 [male]         7.19   144
 8      10   1 [female]       7.36   221
 9      10.5 0 [male]         6.96  1538
10      10.5 1 [female]       7.02  1493
# … with 18 more rows
# ℹ Use `print(n = ...)` to see more rows

Here are the numbers that I expect:

soep %>%
  filter(sex == 1) %>%  # only looks at females
  filter(education %in% c(7, 8.5, 9)) %>%  # take education level before and after
  summarise(mean(satisf_org)) # calculate the "rolling mean" per group
# A tibble: 1 × 1
  `mean(satisf_org)`
               <dbl>
1               6.97

This is the kind of rolling mean per group that I expect per value, i.e. 6.97 instead of 8.56.

PS: In my real data, I investigate age in years and I usually have at least some people at all ages. So the rolling window can be -1 to +1 (numeric) instead of lead / lag neighbors.

Answer 1

Score: 2

You can group by sex and compute a rolling average within each group:

library(dplyr)
library(slider)
soep %>%
  group_by(education, sex) %>%
  summarise(across(satisf_org, mean, na.rm = TRUE),
            n = n()) %>%
  group_by(sex) %>%
  mutate(rolling_mean = slide_dbl(satisf_org, mean, .before = 1, .after = 1))

Output:

# A tibble: 28 × 5
# Groups:   sex [2]
   education sex        satisf_org     n rolling_mean
       <dbl> <dbl+lbl>       <dbl> <int>        <dbl>
 1       7   0 [male]         6.16    73         6.66
 2       7   1 [female]       6.59   113         7.57
 3       8.5 0 [male]         7.16    37         6.73
 4       8.5 1 [female]       8.56    18         7.38
 5       9   0 [male]         6.88   430         7.08
 6       9   1 [female]       7.00   633         7.64
 7      10   0 [male]         7.19   144         7.01
 8      10   1 [female]       7.36   221         7.13
 9      10.5 0 [male]         6.96  1538         7.14
10      10.5 1 [female]       7.02  1493         7.20
# … with 18 more rows
# ℹ Use `print(n = ...)` to see more rows
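
Note that `rolling_mean` above averages the neighbouring group means with equal weight, which is why the female value at 8.5 comes out as 7.38 rather than the 6.97 the question asks for. One possible way to get the pooled mean is to weight each group mean by its `n`. The following is only a sketch on a toy tibble built from the first rows printed in the question (rounded values, plain numeric `sex`). The names `grouped`, `pooled`, `total` and `rolling_mean_window` are made up here, and the sketch assumes that `n` counts non-missing `satisf_org` values. The second column uses `slide_index_dbl()` for the numeric -1 to +1 window mentioned in the PS.

```r
library(dplyr)
library(slider)

# Toy data: first six rows of the grouped means printed in the question
# (rounded as displayed; sex is plain numeric here, not a labelled vector).
grouped <- tibble(
  education  = c(7, 7, 8.5, 8.5, 9, 9),
  sex        = c(0, 1, 0, 1, 0, 1),
  satisf_org = c(6.16, 6.59, 7.16, 8.56, 6.88, 7.00),
  n          = c(73, 113, 37, 18, 430, 633)
)

pooled <- grouped %>%
  arrange(sex, education) %>%   # rows must be ordered within each sex
  group_by(sex) %>%
  mutate(
    # group total = group mean * group size (assumes n has no missing values)
    total = satisf_org * n,
    # pooled mean over the previous, current and next education level
    rolling_mean = slide_dbl(total, sum, .before = 1, .after = 1) /
      slide_dbl(n, sum, .before = 1, .after = 1),
    # pooled mean over all levels within -1 to +1 of the current value
    rolling_mean_window =
      slide_index_dbl(total, education, sum, .before = 1, .after = 1) /
      slide_index_dbl(n, education, sum, .before = 1, .after = 1)
  ) %>%
  ungroup()

pooled
```

With these rounded inputs the female row at education 8.5 gets a `rolling_mean` of about 6.98, close to the 6.97 computed from the raw data. On the real data, `n = sum(!is.na(satisf_org))` in the `summarise()` step would keep the weights consistent with `na.rm = TRUE`.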
