Adding a factor into the cut() function

Question

I have a data frame Base such as

  ID Gender Strength
1  1      0      230
2  2      1       20
3  3      1       30
4  4      0       40
5  5      0       40

I want to use the cut function to create a new variable that classifies people as stronger or weaker, with a different cut-off point for each gender: the cut-off for Gender 1 is 28 and for Gender 0 it is 10.

I can create a new variable, but I don't know where to put the second vector so that the new variable depends on both. I'm using this code, but I don't know how to go further:

vec1 <- Base$Strength
vec2 <- Base$Gender
Base$newvariable <- cut(vec1, breaks=c(0.00, 29.00, 60.00), labels=c("Stronger", "Weaker"))

Answer 1

Score: 2

You can group by Gender and then use the group value via cur_group():

library(dplyr)
Base %>%
  group_by(Gender) %>%
  # explicit levels: a group may contain only TRUE or only FALSE
  mutate(newVariable = factor(Strength > (if (cur_group()$Gender == 1) 29 else 10),
                              levels = c(FALSE, TRUE), labels = c("Weaker", "Stronger")))
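
The same per-group threshold can also be sketched in base R, without grouping, by looking up each row's cut-off in a named vector. This is only an illustration and not part of the answer: `cutoffs`, `is_stronger` and `Strength_cat` are hypothetical names, and the values 28 and 10 are the cut-offs stated in the question text (the code above uses 29).

```r
# Sample data from the question
Base <- data.frame(ID = 1:5,
                   Gender = c(0, 1, 1, 0, 0),
                   Strength = c(230, 20, 30, 40, 40))

# Hypothetical lookup table: names are Gender values, entries are cut-offs
cutoffs <- c("0" = 10, "1" = 28)

# Pick each row's cut-off by its Gender, then compare row by row
row_cutoff <- unname(cutoffs[as.character(Base$Gender)])
is_stronger <- Base$Strength > row_cutoff

# Explicit levels keep the label order fixed regardless of the data
Base$Strength_cat <- factor(is_stronger,
                            levels = c(FALSE, TRUE),
                            labels = c("Weaker", "Stronger"))
Base
```

Adding a third gender code would only need another entry in `cutoffs`; the comparison itself stays unchanged.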

Answer 2

Score: 0

I'm not sure what the 60 is for, but the last break needs to be the maximum or Inf; otherwise values above it become NA.

transform(Base, str_cat=cut(Strength, c(0, 29, max(Strength)), labels=c('weaker', 'strong')))
#   ID Gender Strength str_cat
# 1  1      0      230  strong
# 2  2      1       20  weaker
# 3  3      1       30  strong
# 4  4      0       40  strong
# 5  5      0       40  strong

If the 60 meant you want three categories, do

transform(Base, str_cat=cut(Strength, c(0, 29, 60, Inf), labels=c('weaker', 'normal', 'strong')))
#   ID Gender Strength str_cat
# 1  1      0      230  strong
# 2  2      1       20  weaker
# 3  3      1       30  normal
# 4  4      0       40  normal
# 5  5      0       40  normal

Data:

Base <- structure(list(ID = 1:5, Gender = c(0, 1, 1, 0, 0), Strength = c(230, 
20, 30, 40, 40)), class = "data.frame", row.names = c(NA, -5L
))
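
Neither call above takes Gender into account. One way to keep cut() and still vary the breaks by gender is to apply it to each gender's rows separately. The following is a hedged sketch, not part of the original answer: `breaks_by_gender` and `labs` are illustrative names, and the cut-offs 10 and 28 are taken from the question text.

```r
# Sample data from the question
Base <- data.frame(ID = 1:5,
                   Gender = c(0, 1, 1, 0, 0),
                   Strength = c(230, 20, 30, 40, 40))

# Hypothetical per-gender breaks; Inf keeps large values from becoming NA
breaks_by_gender <- list("0" = c(0, 10, Inf), "1" = c(0, 28, Inf))
labs <- c("weaker", "strong")

# Fill a character column group by group, then convert it to a factor
Base$str_cat <- NA_character_
for (g in names(breaks_by_gender)) {
  rows <- Base$Gender == as.numeric(g)
  Base$str_cat[rows] <- as.character(
    cut(Base$Strength[rows], breaks = breaks_by_gender[[g]], labels = labs)
  )
}
Base$str_cat <- factor(Base$str_cat, levels = labs)
Base
```

cut() intervals are right-closed by default, so a value exactly at the cut-off (for example 28 for Gender 1) lands in the lower category, which matches "more than 28 counts as stronger".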

Answer 3

Score: 0

Here is a different cut() for each gender group. Note that factor is a specific R term for a categorical variable, whereas your gender dummy is simply 0 and 1. So we can filter for each value and assign a specific cut break:

df <- data.frame(id = c(1,2,3,4,5),
                 gender = c(0,1,1,0,0),
                 strength = c(30,20,30,40,40))

library(tidyverse)

df %>%
  mutate(cut_group =
           ifelse(gender == 1,
                  cut(strength, breaks=c(0.00, 20.00, 60.00), labels = c("Weaker", "Stronger")) %>% as.character,
                  cut(strength, breaks=c(0.00, 39.00, 60.00), labels = c("Weaker", "Stronger")) %>% as.character)
  )

  id gender strength cut_group
1  1      0       30    Weaker
2  2      1       20    Weaker
3  3      1       30  Stronger
4  4      0       40  Stronger
5  5      0       40  Stronger

For gender == 0, a strength of 30 counts as weak, whereas for gender == 1 a strength of 30 counts as strong.
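
Note that the sample data here uses 30 where the question has 230. As a small illustration (an addition, not from the answer), cut() returns NA for values outside the outermost breaks, so a finite upper break of 60 would leave a strength of 230 unclassified, while Inf as the last break avoids that. The names `capped` and `open_ended` are hypothetical.

```r
strength <- c(30, 40, 230)
labs <- c("Weaker", "Stronger")

# Finite upper break: 230 lies outside (0, 60] and becomes NA
capped <- cut(strength, breaks = c(0, 39, 60), labels = labs)

# Open-ended upper break: everything above 39 is "Stronger"
open_ended <- cut(strength, breaks = c(0, 39, Inf), labels = labs)

data.frame(strength, capped, open_ended)
```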

Answer 4

Score: 0

I usually prefer cut but since you simply have two factor levels and combinations you can consider this as well. Your conditions either return TRUE or FALSE, then convert that to a factor with the labels you want.

df %>%
  mutate(grp = factor((gender == 0 & strength > 10) | (gender == 1 & strength > 28), levels = c(TRUE, FALSE), labels = c("Strong", "Weak")))

huangapple
  • Published on March 3, 2023, 21:04:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75627440.html