How to split a column in R based upon how many characters there are in each element?
Question
I have a column in my data named "Sex/neuter status". The possible values are "male", "male entire" and "male neutered". I would like to split it into two columns: column 1, sex, and column 2, neuter status. For each case, the sex should appear in column 1 and its neuter status in column 2. Some cases will be missing a neuter status, but every case will have its sex recorded.
Example data:
Data = data.frame(CaseID = c('1','2','3','4','5'), sexneuterstatus =c('male','maleneutered','maleentire','male','maleneutered'))
I have tried to use the split function but have had no luck.
I assume I need to split the column based on the number of characters, i.e., if a value has two parts, a neuter status is present, so the second part (the neuter status) should go to a new column. However, I don't know how to do this.
Please help!
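The character-count idea in the question can work directly on this data, because every value starts with the same 4-letter prefix: anything longer than "male" must carry a neuter status. The following is a minimal base-R sketch under that assumption (no "female" values, prefix always exactly "male"); the names prefix, n and has_status are illustrative only.

```r
# Assumption: every value begins with the 4-character prefix "male".
x <- c('male','maleneutered','maleentire','male','maleneutered')

prefix <- "male"
n <- nchar(prefix)

sex <- substr(x, 1, n)            # the first 4 characters are the sex
has_status <- nchar(x) > n        # longer than "male" means a status follows
# Take everything after the prefix; use NA when nothing follows it.
status <- ifelse(has_status, substr(x, n + 1, nchar(x)), NA_character_)

print(data.frame(sex = sex, status = status, stringsAsFactors = FALSE))
```

Because it relies on a fixed prefix length, this breaks as soon as a value such as "female" appears; a pattern-based split is more robust.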
Answer 1
Score: 2
You can use tidyr::separate() with a regex lookbehind. This assumes that "male" always precedes the neuter status in the original data frame and that male is the only sex, i.e. there are no female cases.
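The lookbehind (?<=male) does not consume any characters; it matches the empty position immediately after the text "male", which is why it can act as a separator even though the data contain no delimiter. A small base-R illustration (swapped in here for demonstration, not part of the answer) makes that position visible by substituting a "|" at the first match.

```r
x <- c('male','maleneutered','maleentire','male','maleneutered')

# (?<=male) is a zero-width lookbehind: it matches the empty position right
# after "male", so sub() inserts "|" there without removing any characters.
# perl = TRUE is required because lookbehind is a PCRE feature.
marked <- sub("(?<=male)", "|", x, perl = TRUE)
print(marked)
```

The text after "|" is what ends up in the status column; for plain "male" it is empty.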
library(tidyr)
Data = data.frame(CaseID = c('1','2','3','4','5'), sexneuterstatus = c('male','maleneutered','maleentire','male','maleneutered'))
# The native pipe |> requires R >= 4.1.
# sep is a zero-width lookbehind: split immediately after the text "male".
df_new <-
  Data |>
  separate(col = sexneuterstatus, into = c("sex", "status"), sep = "(?<=male)")
df_new
#> CaseID sex status
#> 1 1 male
#> 2 2 male neutered
#> 3 3 male entire
#> 4 4 male
#> 5 5 male neutered
Created on 2023-07-03 with reprex v2.0.2
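Since tidyr 1.3.0, separate() is marked as superseded in favour of separate_wider_delim(), separate_wider_position() and separate_wider_regex(). If you would rather avoid the package dependency, the following base-R sketch captures sex and status with a single anchored regex. The helper split_sex_status() is hypothetical, and it assumes the only sexes are "female"/"male" and the only statuses are "neutered"/"entire"; adjust both vocabularies to your data. Unlike the lookbehind version, a missing status becomes NA rather than an empty string.

```r
# Hypothetical helper: split a combined sex/neuter code into two columns.
# Assumed vocabularies: sex in {female, male}, status in {neutered, entire}.
split_sex_status <- function(x) {
  pattern <- "^(female|male)(neutered|entire)?$"
  ok <- grepl(pattern, x, perl = TRUE)          # values that fit the pattern
  sex <- ifelse(ok, sub(pattern, "\\1", x, perl = TRUE), NA_character_)
  # An unmatched optional group back-references as "", turned into NA below.
  status <- ifelse(ok, sub(pattern, "\\2", x, perl = TRUE), NA_character_)
  status[which(status == "")] <- NA_character_  # no status recorded
  data.frame(sex = sex, status = status, stringsAsFactors = FALSE)
}

Data <- data.frame(
  CaseID = c('1','2','3','4','5'),
  sexneuterstatus = c('male','maleneutered','maleentire','male','maleneutered')
)
df_new <- cbind(Data["CaseID"], split_sex_status(Data$sexneuterstatus))
print(df_new)
```

Values that do not match the pattern get NA in both columns, which makes unexpected codes easy to spot.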