July 18, 2023 00:26:11

R select the two max values in dataframe grouped by factor

Question

I have the following data frame:

library(dplyr)   # provides bind_rows()
library(random)  # assumed source of randomNumbers(); it queries random.org

v <- c(1, 2, 3)
df <- data.frame(V1 = randomNumbers(n = 18, min = 0, max = 1, col = 1),
                 factor_col = rep("A", 18),
                 sessions = rep(v, each = 6))

v <- c(1, 2, 3, 4, 5, 6, 7, 8)
df2 <- data.frame(V1 = randomNumbers(n = 24, min = 0, max = 1, col = 1),
                  factor_col = rep("B", 24),
                  sessions = rep(v, each = 3))

v <- c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12)
df3 <- data.frame(V1 = randomNumbers(n = 33, min = 0, max = 1, col = 1),
                  factor_col = rep("C", 33),
                  sessions = rep(v, each = 3))

# Stack the three groups into one data frame
Table <- bind_rows(df, df2, df3)

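How do I filter for the two max values of sessions within each level of factor_col and calculate the average of V1 across those last two sessions, for each factor_col?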

Answer 1

Score: 1

IIUC:

# Keep the two highest sessions per factor_col, then average V1 per session
Table %>% distinct(factor_col, sessions) %>% group_by(factor_col) %>%
  slice_max(n = 2, order_by = sessions) %>%
  left_join(Table, by = c("factor_col", "sessions")) %>%
  group_by(sessions, factor_col) %>% summarise(v1_mean = mean(V1))

#   sessions factor_col v1_mean
#      <dbl> <fct>        <dbl>
# 1        2 A            0.5
# 2        3 A            0.333
# 3        7 B            0.667
# 4        8 B            0
# 5       11 C            0.667
# 6       12 C            0.667
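
The answer above reports one mean per session. If a single average of V1 over the two highest sessions is wanted for each factor_col, as the wording of the question suggests, a minimal sketch might look like the following. It assumes dplyr 1.0.0 or later (for slice_max), and it swaps randomNumbers() for base R's sample() with a fixed seed so that the example runs offline and reproducibly. The helper make_group and the column n_rows are hypothetical names introduced only for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the 0/1 draws are reproducible

# Hypothetical helper: build one group's rows with random 0/1 values in V1
make_group <- function(label, sessions, each) {
  data.frame(V1 = sample(0:1, length(sessions) * each, replace = TRUE),
             factor_col = label,
             sessions = rep(sessions, each = each))
}

# Same shape as the question's data: 18, 24 and 33 rows for A, B and C
Table <- bind_rows(
  make_group("A", 1:3, 6),
  make_group("B", 1:8, 3),
  make_group("C", c(1:8, 10:12), 3)
)

# The two highest session numbers within each factor_col
last_two <- Table %>%
  distinct(factor_col, sessions) %>%          # one row per (factor, session)
  group_by(factor_col) %>%
  slice_max(order_by = sessions, n = 2) %>%
  ungroup()

# One mean of V1 per factor_col, pooled over those two sessions
result <- Table %>%
  semi_join(last_two, by = c("factor_col", "sessions")) %>%
  group_by(factor_col) %>%
  summarise(v1_mean = mean(V1), n_rows = n(), .groups = "drop")

print(result)
```

Here semi_join keeps only the rows of Table whose (factor_col, sessions) pair appears in last_two, so no columns are duplicated by the join. The n_rows column is included only to show how many observations feed each mean.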
