R: select the two max values in a dataframe, grouped by factor

Question

I have the following dataframe:

library(dplyr)   # bind_rows()
library(random)  # randomNumbers(); assumed to be the 'random' package, which queries random.org

v = c(1, 2, 3)
df <- data.frame(V1 = randomNumbers(n = 18, min = 0, max = 1, col = 1),
                 factor_col = c(rep("A", 18)),
                 sessions = rep(v, each = 6))

v = c(1, 2, 3, 4, 5, 6, 7, 8)
df2 <- data.frame(V1 = randomNumbers(n = 24, min = 0, max = 1, col = 1),
                  factor_col = c(rep("B", 24)),
                  sessions = rep(v, each = 3))

v = c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12)
df3 <- data.frame(V1 = randomNumbers(n = 33, min = 0, max = 1, col = 1),
                  factor_col = c(rep("C", 33)),
                  sessions = rep(v, each = 3))

Table = bind_rows(df, df2)
Table = bind_rows(Table, df3)

How do I filter for the two largest values of sessions within each level of factor_col, and then calculate the average of V1 across those last two sessions, for each factor_col?

Thanks!

Answer 1

Score: 1

IIUC:

Table %>% distinct(factor_col, sessions) %>% group_by(factor_col) %>%
  slice_max(n = 2, order_by = sessions) %>% left_join(Table) %>%
  group_by(sessions, factor_col) %>% summarise(v1_mean = mean(V1))

#   sessions factor_col v1_mean
#      <dbl> <fct>        <dbl>
# 1        2 A            0.5
# 2        3 A            0.333
# 3        7 B            0.667
# 4        8 B            0
# 5       11 C            0.667
# 6       12 C            0.667
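
The pipeline above reports one mean per session. If the question is instead read as asking for a single mean per factor_col, pooled over both of its last two sessions, a minimal sketch could look like the one below. The make_group() helper, the fixed seed, and the use of sample() in place of randomNumbers() are assumptions made only so the example runs offline; they are not part of the original post.

```r
library(dplyr)

# Hypothetical stand-in for the question's data: sample() with a fixed seed
# replaces randomNumbers(), so no network access is needed.
set.seed(1)
make_group <- function(label, sessions, each) {
  s <- rep(sessions, each = each)
  data.frame(V1 = sample(0:1, length(s), replace = TRUE),
             factor_col = label,
             sessions = s)
}

Table <- bind_rows(
  make_group("A", 1:3, 6),
  make_group("B", 1:8, 3),
  make_group("C", c(1:8, 10:12), 3)
)

# Within each factor_col, keep the rows belonging to its two highest
# session numbers, then average V1 over all of those rows at once.
result <- Table %>%
  group_by(factor_col) %>%
  filter(sessions %in% head(sort(unique(sessions), decreasing = TRUE), 2)) %>%
  summarise(v1_mean = mean(V1),
            sessions_used = paste(sort(unique(sessions)), collapse = ", "),
            .groups = "drop")

print(result)
```

The sessions_used column is there only to make it easy to check which sessions went into each mean. Because every session in this data has the same number of rows within a group, the pooled mean equals the average of the two per-session means; with unequal session sizes the two would differ, so the choice depends on which one is actually wanted.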

huangapple
  • Posted on July 18, 2023, 00:26:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76706425.html