Adding a factor into the cut() function
Question
I have a data frame `Base`, such as:
  ID Gender Strength
1  1      0      230
2  2      1       20
3  3      1       30
4  4      0       40
5  5      0       40
I want to create a new variable with the `cut` function that categorises people as stronger or weaker, using a different cut-off point for each gender. The cut-off for more strength is 28 for gender 1 and 10 for gender 0.
I can create a new variable, but I don't know where to put the second vector so that the new variable depends on both columns. I'm using this code, but I don't know how to proceed:
vec1 <- Base$Strength
vec2 <- Base$Gender
Base$newvariable <- cut(vec1, breaks=c(0.00, 29.00, 60.00), labels=c("Stronger", "Weaker"))
Answer 1
Score: 2
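You can group by `Gender` and then use the current group's value via `cur_group()`:
library(dplyr)
Base %>%
  group_by(Gender) %>%
  # levels are set explicitly so a group that is all TRUE (or all FALSE) still gets both labels
  mutate(newVariable = factor(Strength > (if (cur_group()$Gender == 1) 28 else 10),
                              levels = c(FALSE, TRUE), labels = c("Weaker", "Stronger")))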
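The same per-gender comparison can also be sketched in base R without grouping, by looking up each row's cut-off first. This is only an illustration: the `cutoffs` and `row_cutoff` names are hypothetical, and the values 10 and 28 are taken from the question.

```r
# Data from the question
Base <- data.frame(ID = 1:5,
                   Gender = c(0, 1, 1, 0, 0),
                   Strength = c(230, 20, 30, 40, 40))

# Hypothetical lookup table: cut-off per Gender value (10 for 0, 28 for 1)
cutoffs <- c("0" = 10, "1" = 28)

# Pick the cut-off that applies to each row by indexing with the Gender value
row_cutoff <- unname(cutoffs[as.character(Base$Gender)])

# Compare row by row; explicit levels keep both labels even if one never occurs
Base$newvariable <- factor(Base$Strength > row_cutoff,
                           levels = c(FALSE, TRUE),
                           labels = c("Weaker", "Stronger"))
print(Base)
```

With the question's data, only row 2 (Gender 1, Strength 20) ends up as "Weaker".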
Answer 2
Score: 0
I'm not sure what the `60` is for, but you need to add `max(Strength)` or `Inf` as the upper break; otherwise values above the last break become `NA`.
transform(Base, str_cat=cut(Strength, c(0, 29, max(Strength)), labels=c('weaker', 'strong')))
#   ID Gender Strength str_cat
# 1  1      0      230  strong
# 2  2      1       20  weaker
# 3  3      1       30  strong
# 4  4      0       40  strong
# 5  5      0       40  strong
If the `60` meant you want three cut-offs, do:
transform(Base, str_cat=cut(Strength, c(0, 29, 60, Inf), labels=c('weaker', 'normal', 'strong')))
#   ID Gender Strength str_cat
# 1  1      0      230  strong
# 2  2      1       20  weaker
# 3  3      1       30  normal
# 4  4      0       40  normal
# 5  5      0       40  normal
Data:
Base <- structure(list(ID = 1:5, Gender = c(0, 1, 1, 0, 0),
                       Strength = c(230, 20, 30, 40, 40)),
                  class = "data.frame", row.names = c(NA, -5L))
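To combine the open-ended upper break above with the per-gender cut-offs from the question, one option is to call `cut()` once per gender. The following is a rough sketch under stated assumptions: the `cutoffs` lookup and the loop are illustrative names and structure, the values 10 and 28 come from the question, and `-Inf` is used as the lower break so that no value falls outside the range.

```r
Base <- data.frame(ID = 1:5,
                   Gender = c(0, 1, 1, 0, 0),
                   Strength = c(230, 20, 30, 40, 40))

# Without an open-ended upper break, values beyond the last break are lost
is.na(cut(230, breaks = c(0, 29, 60)))  # TRUE: 230 lies outside (0, 60]

# Hypothetical lookup table: cut-off per Gender value (from the question)
cutoffs <- c("0" = 10, "1" = 28)

Base$newvariable <- NA_character_
for (g in names(cutoffs)) {
  rows <- Base$Gender == as.numeric(g)
  # cut() uses right-closed intervals, so a value equal to the cut-off is "Weaker"
  Base$newvariable[rows] <- as.character(
    cut(Base$Strength[rows],
        breaks = c(-Inf, cutoffs[[g]], Inf),
        labels = c("Weaker", "Stronger")))
}
Base$newvariable <- factor(Base$newvariable, levels = c("Weaker", "Stronger"))
print(Base)
```

The intermediate character column avoids assigning factors with mismatched levels; the final `factor()` call restores a fixed level order.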
Answer 3
Score: 0
Here is a different `cut()` for each `gender` group. Note that `factor` is a specific R term for a categorical variable, whereas your `gender` dummy is simply 0 and 1. So we can test each value and assign a specific set of breaks:
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 gender = c(0, 1, 1, 0, 0),
                 strength = c(30, 20, 30, 40, 40))
library(tidyverse)
df %>%
  mutate(cut_group =
           ifelse(gender == 1,
                  # as.character() keeps the labels; ifelse() would otherwise return integer codes
                  cut(strength, breaks = c(0, 20, 60), labels = c("Weaker", "Stronger")) %>% as.character(),
                  cut(strength, breaks = c(0, 39, 60), labels = c("Weaker", "Stronger")) %>% as.character())
  )
  id gender strength cut_group
1  1      0       30    Weaker
2  2      1       20    Weaker
3  3      1       30  Stronger
4  4      0       40  Stronger
5  5      0       40  Stronger
For `gender == 0`, a strength of 30 is classed as weaker, whereas for `gender == 1` a strength of 30 is classed as stronger.
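The `as.character` step in the code above matters because `ifelse()` does not preserve factor attributes. A small base R sketch of that behaviour follows; the vector `f` and the variable names are made up for illustration.

```r
# A small factor produced by cut(), as in the answer above
f <- cut(c(5, 50), breaks = c(0, 20, 60), labels = c("Weaker", "Stronger"))

# ifelse() drops the factor attributes and returns the underlying integer codes
codes <- ifelse(c(TRUE, FALSE), f, f)
print(codes)

# Converting to character first keeps the readable labels
labels_kept <- ifelse(c(TRUE, FALSE), as.character(f), as.character(f))
print(labels_kept)
```

If a factor is needed afterwards, the character result can be wrapped in `factor()` with explicit `levels`.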
Answer 4
Score: 0
I usually prefer `cut`, but since you only have two gender values and two outcomes, you can consider this as well. Each condition returns TRUE or FALSE, and you then convert the result to a factor with the labels you want.
df %>%
  mutate(grp = factor((gender == 0 & strength > 10) | (gender == 1 & strength > 28),
                      levels = c(TRUE, FALSE), labels = c("Strong", "Weak")))
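For a single cut-off, the logical condition and `cut()` should agree, because `cut()` uses right-closed intervals by default, so a value equal to the cut-off lands in the lower bin just as `strength > 28` is FALSE at 28. A quick check, using made-up strength values around the cut-off of 28 from the question:

```r
# Illustrative values on both sides of the cut-off (not from the question's data)
strength <- c(10, 27, 28, 29)

# cut() with default right = TRUE: intervals are (-Inf, 28] and (28, Inf]
by_cut <- cut(strength, breaks = c(-Inf, 28, Inf), labels = c("Weak", "Strong"))

# The equivalent logical condition converted to a factor
by_cond <- factor(strength > 28, levels = c(FALSE, TRUE), labels = c("Weak", "Strong"))

# Side-by-side comparison of the two classifications
print(data.frame(strength, by_cut, by_cond))
```

Both columns read Weak, Weak, Weak, Strong for these values. Passing `right = FALSE` to `cut()` would instead put 28 in the upper bin, which corresponds to `strength >= 28`.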