英文:
Mutate new clumn based on another table with two conditions in R
问题
# 我有两个数据框。我想根据条件将新列标记为“是”,如果 `value` 大于 df2_`point`。要标记新列,需要考虑 `team` 和 `gender` 的两个条件。以下是我的数据框:
df1 <- data.frame(ID = LETTERS[1:8],
gender = c("F", "M", "F", "F", "M", "M", "F", "M"),
value = c(25.2,31.1,21.5,18.6,27.77,18.52,18.52,26.3),
team = c("A","C","B","A","A","C","C","B"))
df2 <- data.frame(team = c("A", "B", "C"),
M_point = c(17.6, 20.3, 24.5),
F_point = c(17.0, 18.2, 23.7))
# 我想要得到如下结果:
# ID gender value team grade
# 1 A F 25.20 A 是 (性别是 F,团队是 A,需要根据 df2 中的团队 A 和 F_point 17.0)
# 2 B M 31.10 C 是 (性别是 M,团队是 C,需要根据 df2 中的团队 C 和 M_point 24.5)
# 3 C F 21.50 B 是 (性别是 F,团队是 B,需要根据 df2 中的团队 B 和 F_point 18.2)
# 4 D F 18.60 A 是
# 5 E M 27.77 A 是
# 6 F M 18.52 C 否
# 7 G F 18.52 C 否
# 8 H M 26.30 B 是
要在R中实现这个,请使用以下代码。谢谢!
英文:
I have two dataframe. I want to mutate new column to label "Yes" if value
> df2_point
. To label new column, it needs to consider two conditions of team
and gender
. Here is my dateframe:
df1 <- data.frame(ID = LETTERS[1:8],
gender = c("F", "M", "F", "F", "M", "M", "F", "M"),
value = c(25.2,31.1,21.5,18.6,27.77,18.52,18.52,26.3),
team = c("A","C","B","A","A","C","C","B"))
df2 <- data.frame(team = c("A", "B", "C"),
M_point = c(17.6, 20.3, 24.5),
F_point = c(17.0, 18.2, 23.7))
And I want to get like this :
ID gender value team grade
1 A F 25.20 A Yes (geder is F and team is A, need to according to df2 team A and F_point 17.0)
2 B M 31.10 C Yes (geder is M and team is C, need to according to df2 team C and M_point 24.5)
3 C F 21.50 B Yes (geder is F and team is B, need to according to df2 team B and F_point 18.2)
4 D F 18.60 A Yes
5 E M 27.77 A Yes
6 F M 18.52 C No
7 G F 18.52 C No
8 H M 26.30 B Yes
How can I achieve this in R? Thanks.
答案1
得分: 1
你可以将df2
堆叠成长格式,然后将其合并到df1
中。
library(dplyr)
library(tidyr)
df1 %>%
left_join(
pivot_longer(df2, -team, names_to = "gender", names_pattern = "(.)", values_to = "thres"),
by = c("team", "gender")
) %>%
mutate(grade = if_else(value > thres, "Yes", "No")) %>%
select(-thres)
# ID gender value team grade
# 1 A F 25.20 A Yes
# 2 B M 31.10 C Yes
# 3 C F 21.50 B Yes
# 4 D F 18.60 A Yes
# 5 E M 27.77 A Yes
# 6 F M 18.52 C No
# 7 G F 18.52 C No
# 8 H M 26.30 B Yes
英文:
You can stack df2
to the long format, and merge it into df1
.
library(dplyr)
library(tidyr)
df1 %>%
left_join(
pivot_longer(df2, -team, names_to = "gender", names_pattern = "(.)", values_to = "thres"),
by = join_by(team, gender)
) %>%
mutate(grade = if_else(value > thres, "Yes", "No")) %>%
select(-thres)
# ID gender value team grade
# 1 A F 25.20 A Yes
# 2 B M 31.10 C Yes
# 3 C F 21.50 B Yes
# 4 D F 18.60 A Yes
# 5 E M 27.77 A Yes
# 6 F M 18.52 C No
# 7 G F 18.52 C No
# 8 H M 26.30 B Yes
答案2
得分: 1
这是一个使用数据表格进行非等值连接的示例,使用了df2
的融合(即长格式)版本。
library(data.table)
# 将 df 转换为 data.table 格式
setDT(df1); setDT(df2)
# 初始化 grade 列为 "no" 值
df1[, grade := "no"]
# 当必要时将值更新为 "yes"
df1[melt(df2, id.vars = "team")[, gender := gsub("(.).*", "\", variable)],
grade := "yes",
on = .(team, gender, value > value)][]
# ID gender value team grade
# 1: A F 25.20 A yes
# 2: B M 31.10 C yes
# 3: C F 21.50 B yes
# 4: D F 18.60 A yes
# 5: E M 27.77 A yes
# 6: F M 18.52 C no
# 7: G F 18.52 C no
# 8: H M 26.30 B yes
注意:这段代码是用R语言编写的,用于执行非等值连接操作,将df1
的"grade"列更新为"yes"或"no",具体条件是根据"team"、"gender"和"value"的值进行判断。
英文:
Here is a data.table with a non-equi join by reference, using a molten (i.e. long) version of df2
.
library(data.table)
# set df's to data.table format
setDT(df1); setDT(df2)
# initialise grade column with "no" value
df1[, grade := "no"]
# update value to yes when necessairy
df1[melt(df2, id.vars = "team")[, gender := gsub("(.).*", "\", variable)],
grade := "yes",
on = .(team, gender, value > value)][]
# ID gender value team grade
# 1: A F 25.20 A yes
# 2: B M 31.10 C yes
# 3: C F 21.50 B yes
# 4: D F 18.60 A yes
# 5: E M 27.77 A yes
# 6: F M 18.52 C no
# 7: G F 18.52 C no
# 8: H M 26.30 B yes
答案3
得分: 0
I prefer Darren Tsai's solution for its shortness. Here is another - more explicit way to do it:
library(tidyverse)
df3 <- df1 %>%
left_join(df2, by = "team") %>%
mutate(grade = ifelse(
gender == "F" & team == "A" & value > F_point, "Yes",
ifelse(
gender == "M" & team == "C" & value > M_point, "Yes",
ifelse(
gender == "F" & team == "B" & value > F_point, "Yes",
ifelse(
value > ifelse(gender == "M", M_point, F_point), "Yes", "No"
)
)
)
))
print(df_3)
ID gender value team grade
A F 25.20 A Yes
B M 31.10 C Yes
C F 21.50 B Yes
D F 18.60 A Yes
E M 27.77 A Yes
F M 18.52 C No
G F 18.52 C No
H M 26.30 B Yes
英文:
I prefer Darren Tsai's solution for it's shortness. Here is another - more explicit way to do it :
`library(tidyverse)
df3 <- df1 %>%
left_join(df2, by = "team") %>%
mutate(grade = ifelse(
gender == "F" & team == "A" & value > F_point, "Yes",
ifelse(
gender == "M" & team == "C" & value > M_point, "Yes",
ifelse(
gender == "F" & team == "B" & value > F_point, "Yes",
ifelse(
value > ifelse(gender == "M", M_point, F_point), "Yes", "No"
)
)
)
))
print(df_3)`
ID gender value team grade
A F 25.20 A Yes
B M 31.10 C Yes
C F 21.50 B Yes
D F 18.60 A Yes
E M 27.77 A Yes
F M 18.52 C No
G F 18.52 C No
H M 26.30 B Yes
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论