在R中基于另一张表格和两个条件创建新列。

huangapple go评论69阅读模式
英文:

Mutate new clumn based on another table with two conditions in R

问题

# 我有两个数据框。我想根据条件将新列标记为“是”,如果 `value` 大于 df2_`point`。要标记新列,需要考虑 `team` 和 `gender` 的两个条件。以下是我的数据框:

df1 <- data.frame(ID = LETTERS[1:8],
                 gender = c("F", "M", "F", "F", "M", "M", "F", "M"),
                 value = c(25.2,31.1,21.5,18.6,27.77,18.52,18.52,26.3),
                 team = c("A","C","B","A","A","C","C","B"))

df2 <- data.frame(team = c("A", "B", "C"), 
                  M_point = c(17.6, 20.3, 24.5), 
                  F_point = c(17.0, 18.2, 23.7))

# 我想要得到如下结果:

#   ID gender value team   grade
# 1  A      F 25.20    A  是 (性别是 F,团队是 A,需要根据 df2 中的团队 A 和 F_point 17.0)
# 2  B      M 31.10    C  是 (性别是 M,团队是 C,需要根据 df2 中的团队 C 和 M_point 24.5)
# 3  C      F 21.50    B  是 (性别是 F,团队是 B,需要根据 df2 中的团队 B 和 F_point 18.2)
# 4  D      F 18.60    A  是
# 5  E      M 27.77    A  是
# 6  F      M 18.52    C  否
# 7  G      F 18.52    C  否
# 8  H      M 26.30    B  是

要在R中实现这个,请使用以下代码。谢谢!

英文:

I have two dataframe. I want to mutate new column to label "Yes" if value > df2_point. To label new column, it needs to consider two conditions of team and gender. Here is my dateframe:

df1 &lt;- data.frame(ID = LETTERS[1:8],
                 gender = c(&quot;F&quot;, &quot;M&quot;, &quot;F&quot;, &quot;F&quot;, &quot;M&quot;, &quot;M&quot;, &quot;F&quot;, &quot;M&quot;),
                 value = c(25.2,31.1,21.5,18.6,27.77,18.52,18.52,26.3),
                 team = c(&quot;A&quot;,&quot;C&quot;,&quot;B&quot;,&quot;A&quot;,&quot;A&quot;,&quot;C&quot;,&quot;C&quot;,&quot;B&quot;))

df2 &lt;- data.frame(team = c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;), 
                  M_point = c(17.6, 20.3, 24.5), 
                  F_point = c(17.0, 18.2, 23.7))

And I want to get like this :

 ID gender value team   grade
1  A      F 25.20    A  Yes (geder is F and team is A, need to according to df2 team A and F_point 17.0)
2  B      M 31.10    C  Yes (geder is M and team is C, need to according to df2 team C and M_point 24.5)
3  C      F 21.50    B  Yes (geder is F and team is B, need to according to df2 team B and F_point 18.2)
4  D      F 18.60    A  Yes
5  E      M 27.77    A  Yes
6  F      M 18.52    C  No
7  G      F 18.52    C  No
8  H      M 26.30    B  Yes

How can I achieve this in R? Thanks.

答案1

得分: 1

你可以将df2堆叠成长格式,然后将其合并到df1中。

library(dplyr)
library(tidyr)

df1 %>%
  left_join(
    pivot_longer(df2, -team, names_to = "gender", names_pattern = "(.)", values_to = "thres"),
    by = c("team", "gender")
  ) %>%
  mutate(grade = if_else(value > thres, "Yes", "No")) %>%
  select(-thres)

#   ID gender value team grade
# 1  A      F 25.20    A   Yes
# 2  B      M 31.10    C   Yes
# 3  C      F 21.50    B   Yes
# 4  D      F 18.60    A   Yes
# 5  E      M 27.77    A   Yes
# 6  F      M 18.52    C    No
# 7  G      F 18.52    C    No
# 8  H      M 26.30    B   Yes
英文:

You can stack df2 to the long format, and merge it into df1.

library(dplyr)
library(tidyr)

df1 %&gt;%
  left_join(
    pivot_longer(df2, -team, names_to = &quot;gender&quot;, names_pattern = &quot;(.)&quot;, values_to = &quot;thres&quot;),
    by = join_by(team, gender)
  ) %&gt;%
  mutate(grade = if_else(value &gt; thres, &quot;Yes&quot;, &quot;No&quot;)) %&gt;%
  select(-thres)

#   ID gender value team grade
# 1  A      F 25.20    A   Yes
# 2  B      M 31.10    C   Yes
# 3  C      F 21.50    B   Yes
# 4  D      F 18.60    A   Yes
# 5  E      M 27.77    A   Yes
# 6  F      M 18.52    C    No
# 7  G      F 18.52    C    No
# 8  H      M 26.30    B   Yes

答案2

得分: 1

这是一个使用数据表格进行非等值连接的示例,使用了df2的融合(即长格式)版本。

library(data.table)
# 将 df 转换为 data.table 格式
setDT(df1); setDT(df2)
# 初始化 grade 列为 "no" 值
df1[, grade := "no"]
# 当必要时将值更新为 "yes"
df1[melt(df2, id.vars = "team")[, gender := gsub("(.).*", "\", variable)],
    grade := "yes",
    on = .(team, gender, value > value)][]
#    ID gender value team grade
# 1:  A      F 25.20    A   yes
# 2:  B      M 31.10    C   yes
# 3:  C      F 21.50    B   yes
# 4:  D      F 18.60    A   yes
# 5:  E      M 27.77    A   yes
# 6:  F      M 18.52    C    no
# 7:  G      F 18.52    C    no
# 8:  H      M 26.30    B   yes

注意:这段代码是用R语言编写的,用于执行非等值连接操作,将df1的"grade"列更新为"yes"或"no",具体条件是根据"team"、"gender"和"value"的值进行判断。

英文:

Here is a data.table with a non-equi join by reference, using a molten (i.e. long) version of df2.

library(data.table)
# set df&#39;s to data.table format
setDT(df1); setDT(df2)
# initialise grade column with &quot;no&quot; value
df1[, grade := &quot;no&quot;]
# update value to yes when necessairy
df1[melt(df2, id.vars = &quot;team&quot;)[, gender := gsub(&quot;(.).*&quot;, &quot;\&quot;, variable)],
    grade := &quot;yes&quot;,
    on = .(team, gender, value &gt; value)][]
#    ID gender value team grade
# 1:  A      F 25.20    A   yes
# 2:  B      M 31.10    C   yes
# 3:  C      F 21.50    B   yes
# 4:  D      F 18.60    A   yes
# 5:  E      M 27.77    A   yes
# 6:  F      M 18.52    C    no
# 7:  G      F 18.52    C    no
# 8:  H      M 26.30    B   yes

答案3

得分: 0

I prefer Darren Tsai's solution for its shortness. Here is another - more explicit way to do it:

library(tidyverse)
df3 <- df1 %>%
    left_join(df2, by = "team") %>%
    mutate(grade = ifelse(
      gender == "F" & team == "A" & value > F_point, "Yes",
      ifelse(
        gender == "M" & team == "C" & value > M_point, "Yes",
        ifelse(
          gender == "F" & team == "B" & value > F_point, "Yes",
          ifelse(
            value > ifelse(gender == "M", M_point, F_point), "Yes", "No"
          )
        )
      )
    )) 

print(df_3)
ID gender value team grade
A      F 25.20    A   Yes
B      M 31.10    C   Yes
C      F 21.50    B   Yes
D      F 18.60    A   Yes
E      M 27.77    A   Yes
F      M 18.52    C    No
G      F 18.52    C    No
H      M 26.30    B   Yes
英文:

I prefer Darren Tsai's solution for it's shortness. Here is another - more explicit way to do it :

`library(tidyverse)
df3 &lt;- df1 %&gt;%
    left_join(df2, by = &quot;team&quot;) %&gt;%
    mutate(grade = ifelse(
      gender == &quot;F&quot; &amp; team == &quot;A&quot; &amp; value &gt; F_point, &quot;Yes&quot;,
      ifelse(
        gender == &quot;M&quot; &amp; team == &quot;C&quot; &amp; value &gt; M_point, &quot;Yes&quot;,
        ifelse(
          gender == &quot;F&quot; &amp; team == &quot;B&quot; &amp; value &gt; F_point, &quot;Yes&quot;,
          ifelse(
            value &gt; ifelse(gender == &quot;M&quot;, M_point, F_point), &quot;Yes&quot;, &quot;No&quot;
          )
        )
      )
    )) 
    
print(df_3)` 
ID gender value team grade
A      F 25.20    A   Yes
B      M 31.10    C   Yes
C      F 21.50    B   Yes
D      F 18.60    A   Yes
E      M 27.77    A   Yes
F      M 18.52    C    No
G      F 18.52    C    No
H      M 26.30    B   Yes   

huangapple
  • 本文由 发表于 2023年7月17日 15:11:24
  • 转载请务必保留本文链接:https://go.coder-hub.com/76702188.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定