在R中基于另一张表格和两个条件创建新列。

huangapple go评论102阅读模式
英文:

Mutate new clumn based on another table with two conditions in R

问题

  1. # 我有两个数据框。我想根据条件将新列标记为“是”,如果 `value` 大于 df2_`point`。要标记新列,需要考虑 `team` 和 `gender` 的两个条件。以下是我的数据框:
  2. df1 <- data.frame(ID = LETTERS[1:8],
  3. gender = c("F", "M", "F", "F", "M", "M", "F", "M"),
  4. value = c(25.2,31.1,21.5,18.6,27.77,18.52,18.52,26.3),
  5. team = c("A","C","B","A","A","C","C","B"))
  6. df2 <- data.frame(team = c("A", "B", "C"),
  7. M_point = c(17.6, 20.3, 24.5),
  8. F_point = c(17.0, 18.2, 23.7))
  9. # 我想要得到如下结果:
  10. # ID gender value team grade
  11. # 1 A F 25.20 A 是 (性别是 F,团队是 A,需要根据 df2 中的团队 A 和 F_point 17.0)
  12. # 2 B M 31.10 C 是 (性别是 M,团队是 C,需要根据 df2 中的团队 C 和 M_point 24.5)
  13. # 3 C F 21.50 B 是 (性别是 F,团队是 B,需要根据 df2 中的团队 B 和 F_point 18.2)
  14. # 4 D F 18.60 A 是
  15. # 5 E M 27.77 A 是
  16. # 6 F M 18.52 C 否
  17. # 7 G F 18.52 C 否
  18. # 8 H M 26.30 B 是

要在R中实现这个,请使用以下代码。谢谢!

英文:

I have two dataframe. I want to mutate new column to label "Yes" if value > df2_point. To label new column, it needs to consider two conditions of team and gender. Here is my dateframe:

  1. df1 &lt;- data.frame(ID = LETTERS[1:8],
  2. gender = c(&quot;F&quot;, &quot;M&quot;, &quot;F&quot;, &quot;F&quot;, &quot;M&quot;, &quot;M&quot;, &quot;F&quot;, &quot;M&quot;),
  3. value = c(25.2,31.1,21.5,18.6,27.77,18.52,18.52,26.3),
  4. team = c(&quot;A&quot;,&quot;C&quot;,&quot;B&quot;,&quot;A&quot;,&quot;A&quot;,&quot;C&quot;,&quot;C&quot;,&quot;B&quot;))
  5. df2 &lt;- data.frame(team = c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;),
  6. M_point = c(17.6, 20.3, 24.5),
  7. F_point = c(17.0, 18.2, 23.7))

And I want to get like this :

  1. ID gender value team grade
  2. 1 A F 25.20 A Yes (geder is F and team is A, need to according to df2 team A and F_point 17.0)
  3. 2 B M 31.10 C Yes (geder is M and team is C, need to according to df2 team C and M_point 24.5)
  4. 3 C F 21.50 B Yes (geder is F and team is B, need to according to df2 team B and F_point 18.2)
  5. 4 D F 18.60 A Yes
  6. 5 E M 27.77 A Yes
  7. 6 F M 18.52 C No
  8. 7 G F 18.52 C No
  9. 8 H M 26.30 B Yes

How can I achieve this in R? Thanks.

答案1

得分: 1

你可以将df2堆叠成长格式,然后将其合并到df1中。

  1. library(dplyr)
  2. library(tidyr)
  3. df1 %>%
  4. left_join(
  5. pivot_longer(df2, -team, names_to = "gender", names_pattern = "(.)", values_to = "thres"),
  6. by = c("team", "gender")
  7. ) %>%
  8. mutate(grade = if_else(value > thres, "Yes", "No")) %>%
  9. select(-thres)
  10. # ID gender value team grade
  11. # 1 A F 25.20 A Yes
  12. # 2 B M 31.10 C Yes
  13. # 3 C F 21.50 B Yes
  14. # 4 D F 18.60 A Yes
  15. # 5 E M 27.77 A Yes
  16. # 6 F M 18.52 C No
  17. # 7 G F 18.52 C No
  18. # 8 H M 26.30 B Yes
英文:

You can stack df2 to the long format, and merge it into df1.

  1. library(dplyr)
  2. library(tidyr)
  3. df1 %&gt;%
  4. left_join(
  5. pivot_longer(df2, -team, names_to = &quot;gender&quot;, names_pattern = &quot;(.)&quot;, values_to = &quot;thres&quot;),
  6. by = join_by(team, gender)
  7. ) %&gt;%
  8. mutate(grade = if_else(value &gt; thres, &quot;Yes&quot;, &quot;No&quot;)) %&gt;%
  9. select(-thres)
  10. # ID gender value team grade
  11. # 1 A F 25.20 A Yes
  12. # 2 B M 31.10 C Yes
  13. # 3 C F 21.50 B Yes
  14. # 4 D F 18.60 A Yes
  15. # 5 E M 27.77 A Yes
  16. # 6 F M 18.52 C No
  17. # 7 G F 18.52 C No
  18. # 8 H M 26.30 B Yes

答案2

得分: 1

这是一个使用数据表格进行非等值连接的示例,使用了df2的融合(即长格式)版本。

  1. library(data.table)
  2. # 将 df 转换为 data.table 格式
  3. setDT(df1); setDT(df2)
  4. # 初始化 grade 列为 "no" 值
  5. df1[, grade := "no"]
  6. # 当必要时将值更新为 "yes"
  7. df1[melt(df2, id.vars = "team")[, gender := gsub("(.).*", "\", variable)],
  8. grade := "yes",
  9. on = .(team, gender, value > value)][]
  10. # ID gender value team grade
  11. # 1: A F 25.20 A yes
  12. # 2: B M 31.10 C yes
  13. # 3: C F 21.50 B yes
  14. # 4: D F 18.60 A yes
  15. # 5: E M 27.77 A yes
  16. # 6: F M 18.52 C no
  17. # 7: G F 18.52 C no
  18. # 8: H M 26.30 B yes

注意:这段代码是用R语言编写的,用于执行非等值连接操作,将df1的"grade"列更新为"yes"或"no",具体条件是根据"team"、"gender"和"value"的值进行判断。

英文:

Here is a data.table with a non-equi join by reference, using a molten (i.e. long) version of df2.

  1. library(data.table)
  2. # set df&#39;s to data.table format
  3. setDT(df1); setDT(df2)
  4. # initialise grade column with &quot;no&quot; value
  5. df1[, grade := &quot;no&quot;]
  6. # update value to yes when necessairy
  7. df1[melt(df2, id.vars = &quot;team&quot;)[, gender := gsub(&quot;(.).*&quot;, &quot;\&quot;, variable)],
  8. grade := &quot;yes&quot;,
  9. on = .(team, gender, value &gt; value)][]
  10. # ID gender value team grade
  11. # 1: A F 25.20 A yes
  12. # 2: B M 31.10 C yes
  13. # 3: C F 21.50 B yes
  14. # 4: D F 18.60 A yes
  15. # 5: E M 27.77 A yes
  16. # 6: F M 18.52 C no
  17. # 7: G F 18.52 C no
  18. # 8: H M 26.30 B yes

答案3

得分: 0

I prefer Darren Tsai's solution for its shortness. Here is another - more explicit way to do it:

  1. library(tidyverse)
  2. df3 <- df1 %>%
  3. left_join(df2, by = "team") %>%
  4. mutate(grade = ifelse(
  5. gender == "F" & team == "A" & value > F_point, "Yes",
  6. ifelse(
  7. gender == "M" & team == "C" & value > M_point, "Yes",
  8. ifelse(
  9. gender == "F" & team == "B" & value > F_point, "Yes",
  10. ifelse(
  11. value > ifelse(gender == "M", M_point, F_point), "Yes", "No"
  12. )
  13. )
  14. )
  15. ))
  16. print(df_3)
  17. ID gender value team grade
  18. A F 25.20 A Yes
  19. B M 31.10 C Yes
  20. C F 21.50 B Yes
  21. D F 18.60 A Yes
  22. E M 27.77 A Yes
  23. F M 18.52 C No
  24. G F 18.52 C No
  25. H M 26.30 B Yes
英文:

I prefer Darren Tsai's solution for it's shortness. Here is another - more explicit way to do it :

  1. `library(tidyverse)
  2. df3 &lt;- df1 %&gt;%
  3. left_join(df2, by = &quot;team&quot;) %&gt;%
  4. mutate(grade = ifelse(
  5. gender == &quot;F&quot; &amp; team == &quot;A&quot; &amp; value &gt; F_point, &quot;Yes&quot;,
  6. ifelse(
  7. gender == &quot;M&quot; &amp; team == &quot;C&quot; &amp; value &gt; M_point, &quot;Yes&quot;,
  8. ifelse(
  9. gender == &quot;F&quot; &amp; team == &quot;B&quot; &amp; value &gt; F_point, &quot;Yes&quot;,
  10. ifelse(
  11. value &gt; ifelse(gender == &quot;M&quot;, M_point, F_point), &quot;Yes&quot;, &quot;No&quot;
  12. )
  13. )
  14. )
  15. ))
  16. print(df_3)`
  17. ID gender value team grade
  18. A F 25.20 A Yes
  19. B M 31.10 C Yes
  20. C F 21.50 B Yes
  21. D F 18.60 A Yes
  22. E M 27.77 A Yes
  23. F M 18.52 C No
  24. G F 18.52 C No
  25. H M 26.30 B Yes

huangapple
  • 本文由 发表于 2023年7月17日 15:11:24
  • 转载请务必保留本文链接:https://go.coder-hub.com/76702188.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定