March 8, 2023, 18:43:43

Replace values of a variable with the most frequent value, or the highest value in R

Question

I can't seem to find an example of this problem:

I would like to replace the values of a categorical variable (V1) with the most frequently occurring value per ID, storing the result in a new variable (V2). If there is no single most frequent value in V1 (a tie), I would like to use the highest value instead.

Here is an example of my data:

my_data <- data.frame(ID = c("2", "2", "2", "2", "4", "4", "4", "4"),
                      V1 = c("2", "1", "2", "1", "3", "1", "4", "3"))

This is what I would like it to look like:

Any help hugely appreciated!!

Answer 1

Score: 2

One approach is to count how often each value occurs per ID (n), then sort by n and, secondarily, by V1, both in descending order. The first row for each ID is then the most frequent value, with ties broken by the highest V1; join that row back to the original data. (I added two extra rows to the data as an example of an ID with no single most frequent value.)

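library(dplyr) # v1.1.0

# my_data here includes the two extra rows (ID 5 with V1 values 6 and 7)
my_data %>%
  count(ID, V1) %>%                   # n = frequency of each V1 within each ID
  arrange(ID, desc(n), desc(V1)) %>%  # most frequent first; ties broken by highest V1
  slice(1, .by = ID) %>%              # keep the top row per ID
  rename(V2 = V1) %>%
  right_join(my_data, by = "ID", multiple = "all") %>%  # attach V2 to every original row
  select(ID, V1, V2)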

Output

   ID V1 V2
1   2  2  2
2   2  1  2
3   2  2  2
4   2  1  2
5   4  3  3
6   4  1  3
7   4  4  3
8   4  3  3
9   5  6  7
10  5  7  7
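
For comparison, the same rule (most frequent value, ties broken by the highest value) can be sketched in base R without dplyr. This is only an illustrative alternative, not part of the answer above; the helper name mode_or_max is made up for this example, and the data includes the two extra rows for ID 5 shown in the output.

```r
# Hypothetical helper: most frequent value, ties broken by the largest value.
mode_or_max <- function(x) {
  counts <- table(x)                                     # frequency of each distinct value
  tied <- names(counts)[as.vector(counts) == max(counts)]  # values sharing the top count
  max(tied)                                              # character max (string ordering)
}

# Question data plus the two extra rows (ID 5) used in the answer
my_data <- data.frame(
  ID = c("2", "2", "2", "2", "4", "4", "4", "4", "5", "5"),
  V1 = c("2", "1", "2", "1", "3", "1", "4", "3", "6", "7")
)

# ave() applies the helper within each ID and recycles the result to every row
my_data$V2 <- ave(my_data$V1, my_data$ID, FUN = mode_or_max)
my_data
```

Because ave() returns a vector the same length as its input, no join step is needed here.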

Answer 2

Score: 1

Update (added max() and removed 1):

We could do it this way, using add_count:

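library(dplyr)

my_data %>%                                # the question's data frame
  group_by(ID) %>%
  add_count(V1) %>%                        # n = frequency of each V1 within its ID
  mutate(V2 = max(V1[n == max(n)])) %>%    # highest V1 among the most frequent values
  ungroup() %>%
  select(-n)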

  ID    V1    V2   
  <chr> <chr> <chr>
1 2     2     2    
2 2     1     2    
3 2     2     2    
4 2     1     2    
5 4     3     3    
6 4     1     3    
7 4     4     3    
8 4     3     3
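
One thing worth keeping in mind with the approaches on this page: V1 is stored as character, so max() and desc() compare strings rather than numbers. That gives the expected result for single-digit codes like those in the question, but it may not once values reach two digits. A small base R illustration, using made-up tied values that are not from the question's data:

```r
tied <- c("9", "10")                  # hypothetical tied values, stored as character

chr_max <- max(tied)                  # string comparison: "9" sorts after "10"
num_max <- max(as.numeric(tied))      # numeric comparison picks 10

chr_max
num_max
```

If the categories are really numeric codes, converting V1 with as.numeric() before taking the maximum avoids this.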
