How can I replace values in one column with values from another out of several options, when the first column contains the name of the other column?

# Question

Say I have the table below. I want to replace the value in the column "nameselect" with the respective value from either "name_01" or "name_02". Which of these columns should be chosen is indicated by the original value in "nameselect". As a result, I want "nameselect" to contain just names, in this case Ann and Claire. I'd also be OK with a solution that creates a new column with the names. How do I best go about that?
In my actual data there are more name columns, so ideally the solution shouldn't involve copy-pasting every possible column name. It should also work when there are NAs in "nameselect".

| nameselect | name_01 | name_02 |
| ---------- | ------- | ------- |
| name_01    | Ann     | Bernie  |
| name_02    | Beth    | Claire  |

With dplyr this works, but not when there are NAs:

```R
library(dplyr)

df %>%
  rowwise() %>%
  mutate(result = get(nameselect)) %>%
  ungroup()
```

Any way I can adapt this solution to work despite NAs?

# Answer 1

**Score**: 1

You may have multiple columns, in which case a one-liner general solution using only base R would be:

```R
# \(i) is shorthand for function(i); it needs R >= 4.1.0
within(df, nameselect <- sapply(seq(nrow(df)), \(i) df[i, df$nameselect[i]]))
#>   nameselect name_01 name_02
#> 1        Ann     Ann  Bernie
#> 2     Claire    Beth  Claire
```
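The one-liner above does not special-case NA values in "nameselect", which the question asks about. Below is a minimal sketch of a guarded variant. It assumes all name columns are character vectors. The third row (Cara/Dan) is hypothetical example data. vapply() and seq_len() stand in for sapply() and seq() so that the result type is fixed and a zero-row data frame is handled.

```R
# Example data; the third row with an NA selector is hypothetical
df <- data.frame(
  nameselect = c("name_01", "name_02", NA),
  name_01    = c("Ann", "Beth", "Cara"),
  name_02    = c("Bernie", "Claire", "Dan"),
  stringsAsFactors = FALSE
)

# For each row, look up the column named in nameselect; keep NA as NA
df$nameselect <- vapply(seq_len(nrow(df)), function(i) {
  col <- df$nameselect[i]
  if (is.na(col)) NA_character_ else df[[col]][i]
}, character(1))

df$nameselect
# expected: c("Ann", "Claire", NA)
```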

# Answer 2

**Score**: 0

See if this works, given that you only have two name columns:

```R
# With only two candidate columns, one vectorised ifelse() is enough
df[, 'nameselect'] <- ifelse(df[, 'nameselect'] == 'name_01', df[, 'name_01'], df[, 'name_02'])
```
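For the NA requirement in the question, it helps to know that ifelse() propagates a missing test value into the result rather than raising an error. Here is a small sketch using a hypothetical third row with an NA selector:

```R
# Hypothetical example data with an NA selector in row 3
df <- data.frame(
  nameselect = c("name_01", "name_02", NA),
  name_01    = c("Ann", "Beth", "Cara"),
  name_02    = c("Bernie", "Claire", "Dan"),
  stringsAsFactors = FALSE
)

# NA == "name_01" is NA, so ifelse() returns NA for that row
df$nameselect <- ifelse(df$nameselect == "name_01", df$name_01, df$name_02)
df$nameselect
# expected: c("Ann", "Claire", NA)
```

This approach still only distinguishes two columns. With more name columns, the lookup-based approaches in the other answers generalise better.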

# Answer 3

**Score**: 0

As Limey said, it's quite straightforward:

```R
library(dplyr)  # %>%, mutate(), rowwise()
library(purrr)  # map_vec() needs purrr >= 1.0.0

df %>%
  mutate(
    nameselect = ifelse(nameselect == "name_01", name_01, name_02))

# or
df$nameselect <- ifelse(df$nameselect == "name_01", df$name_01, df$name_02)

# or (like the question's code, this fails when nameselect is NA)
df %>%
  rowwise() %>%
  mutate(nameselect = get(nameselect))

# or
df$nameselect <- map_vec(seq_along(df$nameselect), ~ df[[df$nameselect[.x]]][.x])
```
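The rowwise()/get() variant, like the question's own code, fails on rows where "nameselect" is NA because get() cannot look up a missing name. Below is a minimal sketch of one way to guard it. It assumes dplyr is installed. The NA row and the name `out` are illustrative only.

```R
library(dplyr)

# Hypothetical example data with an NA selector in row 3
df <- data.frame(
  nameselect = c("name_01", "name_02", NA),
  name_01    = c("Ann", "Beth", "Cara"),
  name_02    = c("Bernie", "Claire", "Dan"),
  stringsAsFactors = FALSE
)

out <- df %>%
  rowwise() %>%
  # skip get() when the selector is missing and return a typed NA instead
  mutate(result = if (is.na(nameselect)) NA_character_ else get(nameselect)) %>%
  ungroup()

out$result
# expected: c("Ann", "Claire", NA)
```

Returning NA_character_ rather than a bare NA keeps every row's result the same type, so the rows combine into one character column.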

# Answer 4

**Score**: 0

Based on this solution to a similar issue, https://stackoverflow.com/questions/67678405/r-lookup-values-of-a-column-defined-by-another-columns-values-in-mutate, building a case_when() call inside mutate() works even with NAs. Thanks for the suggestion!

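```R
library(dplyr)

switch_cols <- function(var) {
  # one case_when() branch per distinct value of the selector column
  vals <- unique(var)
  # the selector column's name as written in the call, e.g. "nameselect"
  name <- deparse(substitute(var))
  # build formulas such as  nameselect == 'name_01' ~ name_01
  formulae <- lapply(
    sprintf("%s == '%s' ~ %s", name, vals, vals),
    as.formula,
    env = parent.frame()
  )
  # splice the list of formulas into case_when()
  case_when(!!!formulae)
}

df %>%
  mutate(result = switch_cols(nameselect))
```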
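switch_cols() builds one formula of the form `nameselect == 'name_01' ~ name_01` per distinct value and splices the list into case_when() with `!!!`. The sketch below writes the same mechanism out by hand for the two example columns, assuming dplyr is installed. It shows why NAs are harmless: case_when() treats an NA condition like FALSE, so a row with no match falls through to NA. The NA row and the name `conds` are illustrative only. This is not a drop-in replacement for switch_cols(), which derives the branches from unique(var) automatically.

```R
library(dplyr)

# Hypothetical example data with an NA selector in row 3
df <- data.frame(
  nameselect = c("name_01", "name_02", NA),
  name_01    = c("Ann", "Beth", "Cara"),
  name_02    = c("Bernie", "Claire", "Dan"),
  stringsAsFactors = FALSE
)

# One formula per candidate column, similar to what switch_cols() generates
conds <- list(
  df$nameselect == "name_01" ~ df$name_01,
  df$nameselect == "name_02" ~ df$name_02
)

# !!! splices the list into case_when()'s dots; NA conditions count as FALSE
df$result <- case_when(!!!conds)
df$result
# expected: c("Ann", "Claire", NA)
```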

# Answer 5
**Score**: 0

Here is a vectorised base R method that works for any number of columns.

```R
df$result <- df[cbind(seq(nrow(df)), match(df$nameselect, names(df)))]

df
#  nameselect name_01 name_02 result
#1    name_01     Ann  Bernie    Ann
#2    name_02    Beth  Claire Claire
```

We use cbind() to create a matrix of row/column pairs for subsetting the data frame: seq(nrow(df)) gives the row numbers, and match(df$nameselect, names(df)) gives the column number to pick in each row.

Data

It is easier to help if you provide data in a reproducible format:

```R
df <- structure(list(nameselect = c("name_01", "name_02"), name_01 = c("Ann",
"Beth"), name_02 = c("Bernie", "Claire")), row.names = c(NA,
-2L), class = "data.frame")
```
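The matrix-indexing approach also copes with NA selectors, which the question asks about. match() returns NA for a missing name, and R's indexing rules state that a row of a numeric index matrix containing NA produces NA in the result. Below is a minimal check with a hypothetical third row. The name `idx` is illustrative, and the sketch assumes the name columns are character, because df[idx] indexes as.matrix(df).

```R
# Hypothetical example data with an NA selector in row 3
df <- data.frame(
  nameselect = c("name_01", "name_02", NA),
  name_01    = c("Ann", "Beth", "Cara"),
  name_02    = c("Bernie", "Claire", "Dan"),
  stringsAsFactors = FALSE
)

# Row/column index pairs; match() yields NA for the missing selector
idx <- cbind(seq_len(nrow(df)), match(df$nameselect, names(df)))

# df[idx] indexes as.matrix(df), so the NA pair gives an NA result
df$result <- df[idx]
df$result
# expected: c("Ann", "Claire", NA)
```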

huangapple
  • This article was posted on July 13, 2023 at 20:14:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76679264.html