How to split a column in R based upon how many characters there are in each element?

Question

I have a column in my data named Sex/neuter status. The values that may appear are male, male entire, and male neutered. I would like to split this into two columns, column 1: sex and column 2: neuter status. For each case I would like the sex to appear in column 1 and its neuter status in column 2. The neuter status will be missing for some cases, but every case has the sex recorded.

Example data:

Data = data.frame(CaseID = c('1','2','3','4','5'), sexneuterstatus =c('male','maleneutered','maleentire','male','maleneutered'))                  

I have tried to use the split function but have had no luck.

I assume I need to split the column based on the number of characters, i.e., if there are 2 this means the neuter status is present, so the second value (neuter status) should go to a new column. However, I don't know how I would do this.

Please help!

Answer 1

Score: 2

You can use tidyr::separate with a regex lookbehind. This assumes that male always precedes the neuter status in the original data frame and that male is the only gender, i.e. there are no cases of female.

library(tidyr)

Data = data.frame(CaseID = c('1','2','3','4','5'), sexneuterstatus =c('male','maleneutered','maleentire','male','maleneutered'))

df_new <- 
  Data |> 
  separate(col = sexneuterstatus, into = c("sex", "status"), sep = "(?<=male)")

df_new

#>   CaseID  sex   status
#> 1      1 male         
#> 2      2 male neutered
#> 3      3 male   entire
#> 4      4 male         
#> 5      5 male neutered

Created on 2023-07-03 with reprex v2.0.2
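
If the data might also contain other sex values, one possible variant is to name the allowed values explicitly and capture whatever follows as the status. The sketch below uses base R only. The "female" rows are hypothetical and were added for illustration (they are not in the original question), and turning an empty status into NA is an assumption about how missing values should be represented.

```r
# Hypothetical data: same layout as the question, plus made-up "female" rows.
Data <- data.frame(
  CaseID = c("1", "2", "3", "4", "5"),
  sexneuterstatus = c("male", "maleneutered", "femaleentire", "female", "maleneutered")
)

# Group 1 captures the sex, group 2 captures everything after it (possibly empty).
pattern <- "^(female|male)(.*)$"

Data$sex    <- sub(pattern, "\\1", Data$sexneuterstatus)
Data$status <- sub(pattern, "\\2", Data$sexneuterstatus)

# Assumption: an empty status means the neuter status was not recorded.
Data$status[Data$status == ""] <- NA

Data
```

Values that match neither "male" nor "female" are left unchanged by sub(), so they would appear as-is in both new columns; it may be worth checking for such rows before relying on the result.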

huangapple
  • Posted on 2023-07-03 18:26:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76603869.html