Replace values of a variable with the most frequent value, or the highest value in R

Question

I can't seem to find an example of this problem:

I would like to replace the values of a categorical variable (V1) with the most frequently occurring value per ID, stored in a new variable (V2). If no single value in V1 is the most frequent for an ID, I would like to use the highest value instead.

Here is an example of my data:

my_data <- data.frame(ID = c("2", "2", "2", "2", "4", "4", "4", "4"),
                      V1 = c("2", "1", "2", "1", "3", "1", "4", "3"))

This is what I would like it to look like:

  ID V1 V2
1  2  2  2
2  2  1  2
3  2  2  2
4  2  1  2
5  4  3  3
6  4  1  3
7  4  4  3
8  4  3  3

Any help hugely appreciated!!

Answer 1

Score: 2

One approach is to count how often each V1 value occurs within each ID (n), then arrange by n in descending order, using V1 in descending order as the tie-breaker. The first row for each ID then holds the most frequent value or, when there is a tie, the highest of the tied values; join that row back to the original data. (I added two extra rows to the data, for ID 5, as an example where no value is the most frequent for a given ID.)

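library(dplyr) # v1.1.0 (needed for .by and multiple)

my_data %>%
  count(ID, V1) %>%                    # n = frequency of each V1 value per ID
  arrange(ID, desc(n), desc(V1)) %>%   # most frequent first, ties broken by highest V1
  slice(1, .by = ID) %>%               # keep the top row for each ID
  rename(V2 = V1) %>%
  right_join(my_data, by = "ID", multiple = "all") %>%  # join back on ID explicitly
  select(ID, V1, V2)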

Output

   ID V1 V2
1   2  2  2
2   2  1  2
3   2  2  2
4   2  1  2
5   4  3  3
6   4  1  3
7   4  4  3
8   4  3  3
9   5  6  7
10  5  7  7
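
The answer does not show how the two extra rows were added. Here is a minimal sketch, assuming from the output above that they are ID 5 with V1 values 6 and 7. The names my_data_ext and top_per_id are made up for illustration. This variant joins from the original data with left_join(), which is a swap from the answer's right_join(): it keeps the original row order and avoids the arguments that need dplyr 1.1.0.

```r
library(dplyr)

my_data <- data.frame(ID = c("2", "2", "2", "2", "4", "4", "4", "4"),
                      V1 = c("2", "1", "2", "1", "3", "1", "4", "3"))

# Assumed extra rows: ID 5 has two distinct values, so none is most frequent
my_data_ext <- rbind(my_data,
                     data.frame(ID = c("5", "5"), V1 = c("6", "7")))

# One row per ID: the most frequent V1, ties broken by the highest V1
top_per_id <- my_data_ext %>%
  count(ID, V1) %>%
  arrange(ID, desc(n), desc(V1)) %>%
  group_by(ID) %>%
  slice(1) %>%
  ungroup() %>%
  select(ID, V2 = V1)

# Each original row matches exactly one row of top_per_id
result <- left_join(my_data_ext, top_per_id, by = "ID")
```

Because every row of my_data_ext matches a single row of top_per_id, the join cannot duplicate rows.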

Answer 2

Score: 1

Update (added max() and removed 1):

We could do it this way, using add_count():

library(dplyr)

my_data %>%
  group_by(ID) %>%
  add_count(V1) %>%                        # n = frequency of each V1 value within the ID
  mutate(V2 = max(V1[n == max(n)])) %>%    # highest V1 among the most frequent values
  ungroup() %>%
  select(-n)

  ID    V1    V2   
  <chr> <chr> <chr>
1 2     2     2    
2 2     1     2    
3 2     2     2    
4 2     1     2    
5 4     3     3    
6 4     1     3    
7 4     4     3    
8 4     3     3    
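
One caveat may be worth noting: V1 is stored as character in the example data, so max() and desc() compare strings rather than numbers. That is harmless for single-digit values, but as strings "10" sorts before "9". The sketch below converts to numeric first, assuming every V1 value parses as a number. The data frame d and the helper column V1_num are made up for illustration and are not part of the original question.

```r
library(dplyr)

# Hypothetical data where string and numeric ordering disagree
d <- data.frame(ID = c("1", "1", "1", "1"),
                V1 = c("9", "10", "9", "10"))

string_max  <- max(d$V1)              # compared as text
numeric_max <- max(as.numeric(d$V1))  # compared as numbers

result <- d %>%
  mutate(V1_num = as.numeric(V1)) %>%          # numeric copy used for counting and max()
  group_by(ID) %>%
  add_count(V1_num) %>%                        # n = frequency of each value within the ID
  mutate(V2 = max(V1_num[n == max(n)])) %>%    # highest value among the most frequent
  ungroup() %>%
  select(ID, V1, V2)
```

Here both values occur twice, so the tie is broken by the highest number, and V2 comes back as a numeric column. Wrap it in as.character() if V2 should match the type of V1.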

huangapple
  • Posted on March 8, 2023, 18:43:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75671997.html