June 1, 2023, 06:13:02

Get mean values per sample, arranged by another column of IDs

Question

Here is a problem that I would like to solve using the tidyverse. It's difficult to explain but the example lays it out.

I have a dataset with several names. Each name is represented by one or more IDs (between 1 and 5 in the real data). Each ID has 5 values, ordered by the draw number.

library(tidyverse)
set.seed(190)
dplyr::tibble(nms = c(rep("A", 5), rep("B", 10), rep("C", 10)),
              draw = c(rep(1:5, times = 5)),
              id = c(rep(c("A1", "B1", "B2", "C1", "C2"), each = 5)),
              val = c(sample(1:4, replace = TRUE, size = 25))) %>% print(n = 25)
#> # A tibble: 25 × 4
#>    nms    draw id      val
#>    <chr> <int> <chr> <int>
#>  1 A         1 A1        3
#>  2 A         2 A1        4
#>  3 A         3 A1        2
#>  4 A         4 A1        4
#>  5 A         5 A1        1
#>  6 B         1 B1        2
#>  7 B         2 B1        1
#>  8 B         3 B1        3
#>  9 B         4 B1        3
#> 10 B         5 B1        2
#> 11 B         1 B2        1
#> 12 B         2 B2        2
#> 13 B         3 B2        2
#> 14 B         4 B2        1
#> 15 B         5 B2        2
#> 16 C         1 C1        2
#> 17 C         2 C1        1
#> 18 C         3 C1        1
#> 19 C         4 C1        3
#> 20 C         5 C1        4
#> 21 C         1 C2        1
#> 22 C         2 C2        4
#> 23 C         3 C2        4
#> 24 C         4 C2        2
#> 25 C         5 C2        4

Created on 2023-05-31 with reprex v2.0.2

I would like to get the average of each draw for each name (nms). That is, for each draw, I want the mean of val across all the ids that belong to the same name. Some names, like A, have only one id, so mean_val will equal val. B and C have multiple ids, so their mean_val will be the average across those ids for each draw. I would like the result to look like this:

# A tibble: 15 × 3
#   nms    draw mean_val
#   <chr> <int>    <dbl>
# 1 A         1      3
# 2 A         2      4
# 3 A         3      2
# 4 A         4      4
# 5 A         5      1
# 6 B         1      1.5
# 7 B         2      1.5
# 8 B         3      2.5
# 9 B         4      2
#10 B         5      2
#11 C         1      1.5
#12 C         2      2.5
#13 C         3      2.5
#14 C         4      2.5
#15 C         5      4
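
As a hand check of the target table, the B rows can be reproduced by averaging the B1 and B2 values draw by draw. This is only a sketch: the values are copied from the printed data, and the names b1, b2 and b_mean are made up for illustration.

```r
# val for ids B1 and B2, draws 1 to 5, copied from the printed tibble
b1 <- c(2, 1, 3, 3, 2)
b2 <- c(1, 2, 2, 1, 2)

# element-wise mean across the two ids, giving one value per draw
b_mean <- (b1 + b2) / 2
print(b_mean)
```

The five values correspond to rows 6 to 10 of the desired result.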

Answer 1

Score: 0

I think this will give you what you want or at least get you close.

output <- df %>%
  group_by(nms, draw) %>%
  summarize(mean_val = mean(val))

# A tibble: 15 × 3
# Groups:   nms [3]
   nms    draw mean_val
   <chr> <int>    <dbl>
 1 A         1      3  
 2 A         2      4  
 3 A         3      2  
 4 A         4      4  
 5 A         5      1  
 6 B         1      1.5
 7 B         2      1.5
 8 B         3      2.5
 9 B         4      2  
10 B         5      2  
11 C         1      1.5
12 C         2      2.5
13 C         3      2.5
14 C         4      2.5
15 C         5      4
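
The answer refers to df, which the question prints but never assigns. Below is a self-contained sketch that makes it runnable, assuming the data are typed in by hand from the printed tibble (so the result does not depend on set.seed() or the R version). The `.groups = "drop"` argument is an addition not present in the answer; it returns an ungrouped tibble instead of one still grouped by nms.

```r
library(dplyr)

# Data copied by hand from the printed tibble in the question (an assumption
# made for reproducibility; the question generates val with sample()).
df <- tibble(
  nms  = rep(c("A", "B", "C"), times = c(5, 10, 10)),
  draw = rep(1:5, times = 5),
  id   = rep(c("A1", "B1", "B2", "C1", "C2"), each = 5),
  val  = c(3, 4, 2, 4, 1,   # A1
           2, 1, 3, 3, 2,   # B1
           1, 2, 2, 1, 2,   # B2
           2, 1, 1, 3, 4,   # C1
           1, 4, 4, 2, 4)   # C2
)

# One row per (nms, draw) pair; mean() averages val over every id in that pair
output <- df %>%
  group_by(nms, draw) %>%
  summarize(mean_val = mean(val), .groups = "drop")

print(output, n = 15)
```

Grouping by both nms and draw is what collapses the id column: every id sharing a name and draw number lands in the same group, so names with a single id keep their original val.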
