2023年7月17日 15:11:24go评论102阅读模式

英文:

Mutate new clumn based on another table with two conditions in R

问题

# 我有两个数据框。我想根据条件将新列标记为“是”，如果 `value` 大于 df2_`point`。要标记新列，需要考虑 `team` 和 `gender` 的两个条件。以下是我的数据框：
df1 <- data.frame(ID = LETTERS[1:8],
                 gender = c("F", "M", "F", "F", "M", "M", "F", "M"),
                 value = c(25.2,31.1,21.5,18.6,27.77,18.52,18.52,26.3),
                 team = c("A","C","B","A","A","C","C","B"))
df2 <- data.frame(team = c("A", "B", "C"), 
                  M_point = c(17.6, 20.3, 24.5), 
                  F_point = c(17.0, 18.2, 23.7))
# 我想要得到如下结果：
#   ID gender value team   grade
# 1  A      F 25.20    A  是 (性别是 F，团队是 A，需要根据 df2 中的团队 A 和 F_point 17.0)
# 2  B      M 31.10    C  是 (性别是 M，团队是 C，需要根据 df2 中的团队 C 和 M_point 24.5)
# 3  C      F 21.50    B  是 (性别是 F，团队是 B，需要根据 df2 中的团队 B 和 F_point 18.2)
# 4  D      F 18.60    A  是
# 5  E      M 27.77    A  是
# 6  F      M 18.52    C  否
# 7  G      F 18.52    C  否
# 8  H      M 26.30    B  是

要在R中实现这个，请使用以下代码。谢谢！

英文:

I have two dataframe. I want to mutate new column to label "Yes" if value > df2_point. To label new column, it needs to consider two conditions of team and gender. Here is my dateframe:

df1 &lt;- data.frame(ID = LETTERS[1:8],
                 gender = c(&quot;F&quot;, &quot;M&quot;, &quot;F&quot;, &quot;F&quot;, &quot;M&quot;, &quot;M&quot;, &quot;F&quot;, &quot;M&quot;),
                 value = c(25.2,31.1,21.5,18.6,27.77,18.52,18.52,26.3),
                 team = c(&quot;A&quot;,&quot;C&quot;,&quot;B&quot;,&quot;A&quot;,&quot;A&quot;,&quot;C&quot;,&quot;C&quot;,&quot;B&quot;))
df2 &lt;- data.frame(team = c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;), 
                  M_point = c(17.6, 20.3, 24.5), 
                  F_point = c(17.0, 18.2, 23.7))

And I want to get like this :

 ID gender value team   grade
1  A      F 25.20    A  Yes (geder is F and team is A, need to according to df2 team A and F_point 17.0)
2  B      M 31.10    C  Yes (geder is M and team is C, need to according to df2 team C and M_point 24.5)
3  C      F 21.50    B  Yes (geder is F and team is B, need to according to df2 team B and F_point 18.2)
4  D      F 18.60    A  Yes
5  E      M 27.77    A  Yes
6  F      M 18.52    C  No
7  G      F 18.52    C  No
8  H      M 26.30    B  Yes

How can I achieve this in R? Thanks.

答案1

得分: 1

你可以将df2堆叠成长格式，然后将其合并到df1中。

library(dplyr)
library(tidyr)
df1 %>%
  left_join(
    pivot_longer(df2, -team, names_to = "gender", names_pattern = "(.)", values_to = "thres"),
    by = c("team", "gender")
  ) %>%
  mutate(grade = if_else(value > thres, "Yes", "No")) %>%
  select(-thres)
#   ID gender value team grade
# 1  A      F 25.20    A   Yes
# 2  B      M 31.10    C   Yes
# 3  C      F 21.50    B   Yes
# 4  D      F 18.60    A   Yes
# 5  E      M 27.77    A   Yes
# 6  F      M 18.52    C    No
# 7  G      F 18.52    C    No
# 8  H      M 26.30    B   Yes

英文:

You can stack df2 to the long format, and merge it into df1.

library(dplyr)
library(tidyr)
df1 %&gt;%
  left_join(
    pivot_longer(df2, -team, names_to = &quot;gender&quot;, names_pattern = &quot;(.)&quot;, values_to = &quot;thres&quot;),
    by = join_by(team, gender)
  ) %&gt;%
  mutate(grade = if_else(value &gt; thres, &quot;Yes&quot;, &quot;No&quot;)) %&gt;%
  select(-thres)
#   ID gender value team grade
# 1  A      F 25.20    A   Yes
# 2  B      M 31.10    C   Yes
# 3  C      F 21.50    B   Yes
# 4  D      F 18.60    A   Yes
# 5  E      M 27.77    A   Yes
# 6  F      M 18.52    C    No
# 7  G      F 18.52    C    No
# 8  H      M 26.30    B   Yes

答案2

得分: 1

这是一个使用数据表格进行非等值连接的示例，使用了df2的融合（即长格式）版本。

library(data.table)
# 将 df 转换为 data.table 格式
setDT(df1); setDT(df2)
# 初始化 grade 列为 "no" 值
df1[, grade := "no"]
# 当必要时将值更新为 "yes"
df1[melt(df2, id.vars = "team")[, gender := gsub("(.).*", "\", variable)],
    grade := "yes",
    on = .(team, gender, value > value)][]
#    ID gender value team grade
# 1:  A      F 25.20    A   yes
# 2:  B      M 31.10    C   yes
# 3:  C      F 21.50    B   yes
# 4:  D      F 18.60    A   yes
# 5:  E      M 27.77    A   yes
# 6:  F      M 18.52    C    no
# 7:  G      F 18.52    C    no
# 8:  H      M 26.30    B   yes

注意：这段代码是用R语言编写的，用于执行非等值连接操作，将df1的"grade"列更新为"yes"或"no"，具体条件是根据"team"、"gender"和"value"的值进行判断。

英文:

Here is a data.table with a non-equi join by reference, using a molten (i.e. long) version of df2.

library(data.table)
# set df&#39;s to data.table format
setDT(df1); setDT(df2)
# initialise grade column with &quot;no&quot; value
df1[, grade := &quot;no&quot;]
# update value to yes when necessairy
df1[melt(df2, id.vars = &quot;team&quot;)[, gender := gsub(&quot;(.).*&quot;, &quot;\&quot;, variable)],
    grade := &quot;yes&quot;,
    on = .(team, gender, value &gt; value)][]
#    ID gender value team grade
# 1:  A      F 25.20    A   yes
# 2:  B      M 31.10    C   yes
# 3:  C      F 21.50    B   yes
# 4:  D      F 18.60    A   yes
# 5:  E      M 27.77    A   yes
# 6:  F      M 18.52    C    no
# 7:  G      F 18.52    C    no
# 8:  H      M 26.30    B   yes

答案3

得分: 0

I prefer Darren Tsai's solution for its shortness. Here is another - more explicit way to do it:

library(tidyverse)
df3 <- df1 %>%
    left_join(df2, by = "team") %>%
    mutate(grade = ifelse(
      gender == "F" & team == "A" & value > F_point, "Yes",
      ifelse(
        gender == "M" & team == "C" & value > M_point, "Yes",
        ifelse(
          gender == "F" & team == "B" & value > F_point, "Yes",
          ifelse(
            value > ifelse(gender == "M", M_point, F_point), "Yes", "No"
          )
        )
      )
    )) 
print(df_3)
ID gender value team grade
A      F 25.20    A   Yes
B      M 31.10    C   Yes
C      F 21.50    B   Yes
D      F 18.60    A   Yes
E      M 27.77    A   Yes
F      M 18.52    C    No
G      F 18.52    C    No
H      M 26.30    B   Yes

英文:

I prefer Darren Tsai's solution for it's shortness. Here is another - more explicit way to do it :

`library(tidyverse)
df3 &lt;- df1 %&gt;%
    left_join(df2, by = &quot;team&quot;) %&gt;%
    mutate(grade = ifelse(
      gender == &quot;F&quot; &amp; team == &quot;A&quot; &amp; value &gt; F_point, &quot;Yes&quot;,
      ifelse(
        gender == &quot;M&quot; &amp; team == &quot;C&quot; &amp; value &gt; M_point, &quot;Yes&quot;,
        ifelse(
          gender == &quot;F&quot; &amp; team == &quot;B&quot; &amp; value &gt; F_point, &quot;Yes&quot;,
          ifelse(
            value &gt; ifelse(gender == &quot;M&quot;, M_point, F_point), &quot;Yes&quot;, &quot;No&quot;
          )
        )
      )
    )) 
    
print(df_3)` 
ID gender value team grade
A      F 25.20    A   Yes
B      M 31.10    C   Yes
C      F 21.50    B   Yes
D      F 18.60    A   Yes
E      M 27.77    A   Yes
F      M 18.52    C    No
G      F 18.52    C    No
H      M 26.30    B   Yes

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在R中基于另一张表格和两个条件创建新列。

问题

答案1

答案2

答案3

计算连续行中仅有两列的值为0的数量。

在函数中返回多个gt表格。

循环用于创建多个列表

获取字典顺序中的前导元素。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。