Merging two columns with condition?
Question
I have a dataframe that looks like this:
> dput(df)
structure(list(Ethnicity = c("Non-Hispanic/Non-Latino",
"Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino", NA, "Non-Hispanic/Non-Latino",
"Non-Hispanic/Non-Latino", "Hispanic/Latino", "Non-Hispanic/Non-Latino",
"Non-Hispanic/Non-Latino", NA), Race = structure(c(1L,
1L, 1L, NA, 5L, 1L, 7L, 1L, 7L, NA), levels = c("White", "2+ Races",
"American Indian or Alaska Native", "Asian", "Black or African American",
"Native Hawaiian or Other Pacific Islander", "Other", "Refused/Unknown"
), class = "factor")), row.names = c(NA, -10L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x7fe0098120e0>, index = integer(0))
I want to combine the info in the Ethnicity and Race columns, so that if an individual's ethnicity is Hispanic/Latino, that is recorded in the Race column. If the individual is Non-Hispanic/Non-Latino, that information does not need to be copied into the Race column.
The dataframe should look like this:
> dput(r)
structure(list(Ethnicity = c("Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino",
"Non-Hispanic/Non-Latino", NA, "Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino",
"Hispanic/Latino", "Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino",
NA), Race = c("White ", "White", "White", NA, "Black or African American",
"White", "Other (Hispanic/Latino)", "White", "Other", NA)), class = "data.frame", row.names = c(NA,
-10L))
As you can see, the Race column in row 7 now records that the individual is Hispanic/Latino.
Answer 1
Score: 1
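Since this is a data.table, we can use data.table methods: specify i with a logical expression that selects the Hispanic/Latino rows, then assign (:=) a new Race value built with sprintf. The assignment happens by reference, so df itself is updated.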
library(data.table)
df[Ethnicity == "Hispanic/Latino", Race := sprintf("%s (%s)", Race, Ethnicity)]
Output:
> df
Ethnicity Race
1: Non-Hispanic/Non-Latino White
2: Non-Hispanic/Non-Latino White
3: Non-Hispanic/Non-Latino White
4: <NA> <NA>
5: Non-Hispanic/Non-Latino Black or African American
6: Non-Hispanic/Non-Latino White
7: Hispanic/Latino Other (Hispanic/Latino)
8: Non-Hispanic/Non-Latino White
9: Non-Hispanic/Non-Latino Other
10: <NA> <NA>
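For anyone who wants to try this from a clean session, here is a minimal, self-contained sketch. It rebuilds the question's data by hand, because the .internal.selfref pointer that dput() prints cannot be parsed back into R. The as.character() step and the race_levels helper name are additions for illustration, not part of the answer above, and paste0() is used as an equivalent to sprintf().

```r
library(data.table)

# Factor levels as listed in the question's dput() output
# (race_levels is a helper name introduced for this sketch).
race_levels <- c("White", "2+ Races", "American Indian or Alaska Native",
                 "Asian", "Black or African American",
                 "Native Hawaiian or Other Pacific Islander",
                 "Other", "Refused/Unknown")

# Rebuild the example data without the .internal.selfref pointer.
df <- data.table(
  Ethnicity = c(rep("Non-Hispanic/Non-Latino", 3), NA,
                rep("Non-Hispanic/Non-Latino", 2), "Hispanic/Latino",
                rep("Non-Hispanic/Non-Latino", 2), NA),
  Race = factor(race_levels[c(1, 1, 1, NA, 5, 1, 7, 1, 7, NA)],
                levels = race_levels)
)

# Assumption for this sketch: store Race as plain character so the combined
# label does not depend on how new factor levels are handled.
df[, Race := as.character(Race)]

# Rows where Ethnicity is NA are left untouched, because data.table treats
# NA in the i expression as FALSE.
df[Ethnicity == "Hispanic/Latino", Race := paste0(Race, " (", Ethnicity, ")")]

df$Race[7]  # "Other (Hispanic/Latino)"
```

Converting Race to character up front matches the desired result in the question, where Race is a character column rather than a factor.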