Merging two columns with condition?

Question

I have a dataframe that looks like this:

    > dput(df)
    structure(list(Ethnicity = c("Non-Hispanic/Non-Latino",
    "Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino", NA, "Non-Hispanic/Non-Latino",
    "Non-Hispanic/Non-Latino", "Hispanic/Latino", "Non-Hispanic/Non-Latino",
    "Non-Hispanic/Non-Latino", NA), Race = structure(c(1L,
    1L, 1L, NA, 5L, 1L, 7L, 1L, 7L, NA), levels = c("White", "2+ Races",
    "American Indian or Alaska Native", "Asian", "Black or African American",
    "Native Hawaiian or Other Pacific Islander", "Other", "Refused/Unknown"
    ), class = "factor")), row.names = c(NA, -10L), class = c("data.table",
    "data.frame"), .internal.selfref = <pointer: 0x7fe0098120e0>, index = integer(0))

I want to combine the information in the Ethnicity and Race columns so that if an individual's ethnicity is Hispanic/Latino, that is recorded in the Race column. If the individual is Non-Hispanic/Non-Latino, the ethnicity does not need to be copied into the Race column.

The dataframe should look like this:

    > dput(r)
    structure(list(Ethnicity = c("Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino",
    "Non-Hispanic/Non-Latino", NA, "Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino",
    "Hispanic/Latino", "Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino",
    NA), Race = c("White ", "White", "White", NA, "Black or African American",
    "White", "Other (Hispanic/Latino)", "White", "Other", NA)), class = "data.frame", row.names = c(NA,
    -10L))

As you can see, the Race column in row 7 now records that the individual is Hispanic/Latino.

Answer 1

Score: 1

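Since df is a data.table, we can use data.table methods: specify i with a logical expression that selects the Hispanic/Latino rows, then assign (:=) a new Race value built with sprintf. Rows where Ethnicity is NA do not match the condition in i, so they are left unchanged.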

    library(data.table)
    df[Ethnicity == "Hispanic/Latino", Race := sprintf("%s (%s)", Race, Ethnicity)]

Output:

    > df
                      Ethnicity                      Race
     1: Non-Hispanic/Non-Latino                     White
     2: Non-Hispanic/Non-Latino                     White
     3: Non-Hispanic/Non-Latino                     White
     4:                    <NA>                      <NA>
     5: Non-Hispanic/Non-Latino Black or African American
     6: Non-Hispanic/Non-Latino                     White
     7:         Hispanic/Latino   Other (Hispanic/Latino)
     8: Non-Hispanic/Non-Latino                     White
     9: Non-Hispanic/Non-Latino                     Other
    10:                    <NA>                      <NA>
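
The expected result in the question is a plain data.frame with a character Race column, so the same idea can also be sketched in base R. This is not part of the original answer, only a minimal illustration under a few assumptions: the example data is re-entered by hand (the dput() output of a data.table contains a <pointer: ...> attribute and cannot be pasted back as-is), the names race_levels, race_chr and hisp are hypothetical helpers, and rows with a missing Race are deliberately skipped so they do not become "NA (Hispanic/Latino)".

```r
# Rebuild the example data as a plain data.frame (values copied from the question).
race_levels <- c("White", "2+ Races", "American Indian or Alaska Native",
                 "Asian", "Black or African American",
                 "Native Hawaiian or Other Pacific Islander", "Other",
                 "Refused/Unknown")
df <- data.frame(
  Ethnicity = c(rep("Non-Hispanic/Non-Latino", 3), NA,
                "Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino",
                "Hispanic/Latino", "Non-Hispanic/Non-Latino",
                "Non-Hispanic/Non-Latino", NA),
  Race = factor(race_levels[c(1, 1, 1, NA, 5, 1, 7, 1, 7, NA)],
                levels = race_levels),
  stringsAsFactors = FALSE
)

# Work on character values so the new label is not restricted to existing factor levels.
race_chr <- as.character(df$Race)

# which() drops NA comparisons, so rows with a missing Ethnicity are never selected.
# The !is.na() guard also skips rows where Race itself is missing.
hisp <- which(df$Ethnicity == "Hispanic/Latino" & !is.na(race_chr))

# Append the ethnicity in parentheses for the selected rows only.
race_chr[hisp] <- sprintf("%s (%s)", race_chr[hisp], df$Ethnicity[hisp])
df$Race <- race_chr
```

After this runs, Race is a character column, which matches the class shown in the question's expected output; wrap it in factor() again if the factor type needs to be kept.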

huangapple
  • Posted on February 14, 2023, 01:39:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75439408.html