Adding a factor into the cut() function

Question

I have a data frame named Base:

      ID Gender Strength
    1  1      0      230
    2  2      1       20
    3  3      1       30
    4  4      0       40
    5  5      0       40

I want to use the cut function to create a new variable that classifies people as stronger or weaker, with a different cut-off point for each gender. For Gender 1, a Strength above 28 counts as stronger; for Gender 0, a Strength above 10 counts as stronger.

I can create a new variable, but I don't know where to put the second vector so that the new variable depends on both variables. I'm using the following lines of code, but I don't know how to proceed:

    vec1 <- Base$Strength
    vec2 <- Base$Gender
    Base$newvariable <- cut(vec1, breaks=c(0.00, 29.00, 60.00), labels=c("Stronger", "Weaker"))

Answer 1

Score: 2

You can group by Gender and then use the group's value via cur_group():

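    library(dplyr)

    Base %>%
      group_by(Gender) %>%
      # explicit levels keep both labels valid even when a group is all TRUE or all FALSE
      mutate(newVariable = factor(Strength > (if (cur_group()$Gender == 1) 29 else 10),
                                  levels = c(FALSE, TRUE), labels = c("Weaker", "Stronger")))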
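
For comparison, here is a base R sketch of the same idea. Instead of reading the group value with cur_group(), it looks up a cut-off for every row from a named vector. The names cutoffs and row_cutoff are made up for illustration, and the values 10 and 28 are taken from the question's description (not the 29 used in the code above), so treat them as assumptions.

```r
# Data as given in the question
Base <- data.frame(ID = 1:5,
                   Gender = c(0, 1, 1, 0, 0),
                   Strength = c(230, 20, 30, 40, 40))

# Hypothetical lookup table: one cut-off per gender code
cutoffs <- c("0" = 10, "1" = 28)

# Pick the cut-off that applies to each row by matching on the gender code
row_cutoff <- unname(cutoffs[as.character(Base$Gender)])

# Compare row by row; explicit levels keep both labels even if one never occurs
Base$newvariable <- factor(Base$Strength > row_cutoff,
                           levels = c(FALSE, TRUE),
                           labels = c("Weaker", "Stronger"))

print(Base)
```

With the question's data, only row 2 (Gender 1, Strength 20) ends up as "Weaker".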

Answer 2

Score: 0

I'm not sure what the 60 is for, but the last break needs to be max(Strength) or Inf; otherwise values above the last break become NA.

    transform(Base, str_cat=cut(Strength, c(0, 29, max(Strength)), labels=c('weaker', 'strong')))
    #   ID Gender Strength str_cat
    # 1  1      0      230  strong
    # 2  2      1       20  weaker
    # 3  3      1       30  strong
    # 4  4      0       40  strong
    # 5  5      0       40  strong

If the 60 was meant as an additional cut-off, giving three categories, use:

    transform(Base, str_cat=cut(Strength, c(0, 29, 60, Inf), labels=c('weaker', 'normal', 'strong')))
    #   ID Gender Strength str_cat
    # 1  1      0      230  strong
    # 2  2      1       20  weaker
    # 3  3      1       30  normal
    # 4  4      0       40  normal
    # 5  5      0       40  normal

Data:

    Base <- structure(list(ID = 1:5, Gender = c(0, 1, 1, 0, 0), Strength = c(230,
    20, 30, 40, 40)), class = "data.frame", row.names = c(NA, -5L
    ))
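
A small sketch of why the last break matters, using the Strength values from the data above: cut() returns NA for any value that falls outside the supplied breaks. The names capped and open_ended are made up for this illustration.

```r
# Strength values from the question's data
strength <- c(230, 20, 30, 40, 40)

# Upper break of 60: 230 lies outside (29, 60], so cut() returns NA for it
capped <- cut(strength, breaks = c(0, 29, 60), labels = c("weaker", "strong"))

# Upper break of Inf: every positive value falls into one of the intervals
open_ended <- cut(strength, breaks = c(0, 29, Inf), labels = c("weaker", "strong"))

print(data.frame(strength, capped, open_ended))
```

Using max(Strength) as the top break also works, because cut() closes intervals on the right by default, so the maximum itself is included in the last interval.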

Answer 3

Score: 0

Here is a different cut() for each gender group. Note that factor is a specific R term for a categorical variable, whereas your gender dummy is simply coded as 0 and 1. So we can test each value and assign its own cut breaks:

    library(tidyverse)

    df <- data.frame(id = c(1,2,3,4,5),
                     gender = c(0,1,1,0,0),
                     strength = c(30,20,30,40,40))

    df %>%
      mutate(cut_group =
               ifelse(gender == 1,
                      cut(strength, breaks=c(0.00, 20.00, 60.00), labels = c("Weaker", "Stronger")) %>% as.character,
                      cut(strength, breaks=c(0.00, 39.00, 60.00), labels = c("Weaker", "Stronger")) %>% as.character)
      )

Output:

      id gender strength cut_group
    1  1      0       30    Weaker
    2  2      1       20    Weaker
    3  3      1       30  Stronger
    4  4      0       40  Stronger
    5  5      0       40  Stronger

With these breaks, a strength of 30 counts as weaker for gender == 0 but as stronger for gender == 1.
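
The per-gender breaks can also be kept in one place and applied group by group. This is only a sketch, not the method used above: it sticks to base R, uses the question's Base data and cut-offs (10 and 28), and assumes Inf as the top break. The names breaks_by_gender and strength_cat are hypothetical.

```r
# Data as given in the question
Base <- data.frame(ID = 1:5,
                   Gender = c(0, 1, 1, 0, 0),
                   Strength = c(230, 20, 30, 40, 40))

# One set of breaks per gender code (assumed cut-offs: 10 for gender 0, 28 for gender 1)
breaks_by_gender <- list("0" = c(0, 10, Inf), "1" = c(0, 28, Inf))

# Apply cut() separately to the rows of each gender
Base$strength_cat <- NA_character_
for (g in names(breaks_by_gender)) {
  rows <- as.character(Base$Gender) == g
  Base$strength_cat[rows] <- as.character(
    cut(Base$Strength[rows], breaks = breaks_by_gender[[g]],
        labels = c("Weaker", "Stronger"))
  )
}

# Convert back to a factor with a fixed level order
Base$strength_cat <- factor(Base$strength_cat, levels = c("Weaker", "Stronger"))

print(Base)
```

Storing the breaks in a list means a third gender code or a different cut-off only requires changing breaks_by_gender, not the loop.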

Answer 4

Score: 0

I usually prefer cut, but since you only have two factor levels and two conditions, you could also consider this. Each condition evaluates to TRUE or FALSE, and the combined result is then converted to a factor with the labels you want.

    library(dplyr)

    df %>%
      mutate(grp = factor((gender == 0 & strength > 10) | (gender == 1 & strength > 28),
                          levels = c(TRUE, FALSE), labels = c("Strong", "Weak")))
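
A note on the levels argument in the code above, as a minimal sketch with a made-up vector all_strong: when every row meets the condition, factor() sees only one distinct value, so passing two labels without explicit levels fails. Spelling out both levels avoids that and fixes which label goes with TRUE.

```r
# Hypothetical case: every row satisfies the condition
all_strong <- c(TRUE, TRUE, TRUE)

# Without explicit levels only TRUE becomes a level, so two labels do not fit
res <- try(factor(all_strong, labels = c("Weak", "Strong")), silent = TRUE)
print(inherits(res, "try-error"))  # TRUE: factor() raised an error

# With explicit levels both labels are valid, and TRUE maps to "Strong"
grp <- factor(all_strong, levels = c(TRUE, FALSE), labels = c("Strong", "Weak"))
print(levels(grp))  # "Strong" "Weak"
print(grp)
```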

huangapple
  • Posted on March 3, 2023, 21:04:55
  • Please keep this link when reposting: https://go.coder-hub.com/75627440.html