Replace values of a variable with the most frequent value, or the highest value in R

Question

I can't seem to find an example of this problem:

I would like to create a new variable (V2) in which the values of a categorical variable (V1) are replaced with the most frequently occurring value per ID. If there is no single most frequent value in V1 for an ID (a tie), I would like to use the highest value instead.

Here is an example of my data:

    my_data <- data.frame(ID = c("2", "2", "2", "2", "4", "4", "4", "4"),
                          V1 = c("2", "1", "2", "1", "3", "1", "4", "3"))

This is what I would like it to look like:

[Image: table of the desired output, with columns ID, V1 and V2]

Any help hugely appreciated!!

Answer 1

Score: 2

One approach is to count how often each V1 value occurs within each ID (n), then arrange by ID, by n in descending order, and by V1 in descending order, so that ties in frequency are broken by the highest V1. The first row for each ID then holds the value you want; rename it to V2 and join it back to the original data. (I added two extra rows to the data to show a case where an ID has no single most frequent value.)

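    library(dplyr) # v1.1.0

    my_data %>%
      count(ID, V1) %>%                   # n = frequency of each V1 value per ID
      arrange(ID, desc(n), desc(V1)) %>%  # most frequent first; ties broken by highest V1
      slice(1, .by = ID) %>%              # keep the top row for each ID
      rename(V2 = V1) %>%
      # attach V2 to every original row; by = "ID" makes the join key explicit
      right_join(my_data, by = "ID", multiple = "all") %>%
      select(ID, V1, V2)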

Output

       ID V1 V2
    1   2  2  2
    2   2  1  2
    3   2  2  2
    4   2  1  2
    5   4  3  3
    6   4  1  3
    7   4  4  3
    8   4  3  3
    9   5  6  7
    10  5  7  7
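
For comparison, the same rule (most frequent value per ID, ties broken by the highest value) can be sketched in base R without dplyr. This is a minimal sketch, not part of the answer above: `mode_or_max` is a hypothetical helper name, and the two rows for ID "5" are an assumption inferred from the output shown, since the answer does not list the modified data.

```r
# Example data from the question, plus two rows for ID "5" inferred from
# the answer's output (an assumption: the answer does not show them).
my_data <- data.frame(
  ID = c("2", "2", "2", "2", "4", "4", "4", "4", "5", "5"),
  V1 = c("2", "1", "2", "1", "3", "1", "4", "3", "6", "7")
)

# Hypothetical helper: most frequent value, ties broken by the largest value.
mode_or_max <- function(x) {
  counts <- table(x)                            # frequency of each distinct value
  tied <- names(counts)[counts == max(counts)]  # every value with the top count
  max(tied)                                     # V1 is character, so this is a string comparison
}

# ave() applies the helper within each ID and repeats the result on every row.
my_data$V2 <- ave(my_data$V1, my_data$ID, FUN = mode_or_max)
my_data
```

Because `ave()` returns a vector aligned with the original rows, no join is needed and the row order of `my_data` is left unchanged.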

Answer 2

Score: 1

Update (added max() and removed 1):

We could do it this way, using add_count():

    library(dplyr)

    # my_data is the data frame from the question (the original answer calls it df)
    my_data %>%
      group_by(ID) %>%
      add_count(V1) %>%                      # n = frequency of each V1 value within its ID
      mutate(V2 = max(V1[n == max(n)])) %>%  # highest V1 among the most frequent values
      ungroup() %>%
      select(-n)

Output

      ID    V1    V2
      <chr> <chr> <chr>
    1 2     2     2
    2 2     1     2
    3 2     2     2
    4 2     1     2
    5 4     3     3
    6 4     1     3
    7 4     4     3
    8 4     3     3
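
One caveat worth checking before relying on max() here: V1 is a character column, so "highest" means highest in string order. That matches numeric order for the single-digit values in the example, but not once values have two digits. The sketch below uses made-up values (not from the question) to show the difference, and converting with as.numeric() is one possible workaround, assuming V1 only ever holds numbers.

```r
# A tie between "9" and "10": each occurs twice (made-up values for illustration).
v1 <- c("9", "10", "9", "10")

counts <- table(v1)
tied <- names(counts)[counts == max(counts)]  # values sharing the top frequency

as_text <- max(tied)                # string comparison: "9" beats "10"
as_number <- max(as.numeric(tied))  # numeric comparison: 10 beats 9

as_text    # "9"
as_number  # 10
```

If the codes in V1 are purely labels with no numeric meaning, the string comparison may be acceptable; otherwise converting before taking the maximum avoids the surprise.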

Posted by huangapple on March 8, 2023, 18:43:43
Please keep the link to this article when reposting: https://go.coder-hub.com/75671997.html