July 13, 2023, 20:14:33

How can I replace values in one column with values from another out of several options, when the first column contains the name of the other column?

Question

Say I have the table below. I want to replace the value in the column "nameselect" with the respective value from either "name_01" or "name_02". Which of these columns should be chosen is indicated by the original value in "nameselect". As a result, I want "nameselect" to contain just names - in this case, Ann and Claire. I'd also be OK with a solution that creates a new column with the names. How do I best go about that?
In my actual data, there are more name columns, so ideally nothing that involves copy-pasting every possible column name. I should also mention that it should work even when there are NAs in "nameselect".

| nameselect | name_01 | name_02 |
| ---------- | ------- | ------- |
| name_01    | Ann     | Bernie  |
| name_02    | Beth    | Claire  |

With dplyr this works, but not when there are NAs:

df %>%
  rowwise() %>%
  mutate(result = get(nameselect)) %>%
  ungroup()

Any way I can adapt this solution to work despite NAs?

Answer 1

Score: 1

You may have multiple columns, in which case a general one-liner using only base R would be:

within(df, nameselect <- sapply(seq(nrow(df)), \(i) df[i, df$nameselect[i]]))
#>   nameselect name_01 name_02
#> 1        Ann     Ann  Bernie
#> 2     Claire    Beth  Claire
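
If "nameselect" can contain NA, indexing with df[i, df$nameselect[i]] will likely fail on those rows. Below is a minimal sketch of an NA-tolerant variant in base R. Note that pick_named is a hypothetical helper name, and the third row of the toy data (Cara/Dana) is invented purely for illustration:

```r
# Toy data; the third row is made up to show the NA case.
df <- data.frame(
  nameselect = c("name_01", "name_02", NA),
  name_01    = c("Ann", "Beth", "Cara"),
  name_02    = c("Bernie", "Claire", "Dana"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, look up the column named in `selector`.
# Rows whose selector is NA (or not a column name) yield NA instead of an error.
pick_named <- function(data, selector = "nameselect") {
  vapply(seq_len(nrow(data)), function(i) {
    col <- data[[selector]][i]
    if (is.na(col) || !(col %in% names(data))) {
      NA_character_
    } else {
      as.character(data[[col]][i])
    }
  }, character(1))
}

df$result <- pick_named(df)
print(df)
```

This keeps the row-by-row logic of the one-liner but makes the NA branch explicit; vapply() is used so that the result is always a character vector of the right length.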

Answer 2

Score: 0

See if this works, considering you only have two columns:

df[, 'nameselect'] <- ifelse(df[, 'nameselect'] == 'name_01', df[, 'name_01'], df[, 'name_02'])

Answer 3

Score: 0

As Limey said, it's quite straightforward:

library(dplyr)
library(purrr)  # map_vec() requires purrr >= 1.0.0

df %>%
  mutate(
    nameselect = ifelse(nameselect == "name_01", name_01, name_02))
# or
df$nameselect <- ifelse(df$nameselect == "name_01", df$name_01, df$name_02)
# or
df %>%
  rowwise() %>%
  mutate(nameselect = get(nameselect))
# or
df$nameselect <- map_vec(seq_along(df$nameselect), ~ df[[df$nameselect[.x]]][.x])

Answer 4

Score: 0

Based on this solution to a similar issue, https://stackoverflow.com/questions/67678405/r-lookup-values-of-a-column-defined-by-another-columns-values-in-mutate, the following works with case_when inside mutate, even with NAs. Thanks for the suggestion!

library(dplyr)

switch_cols <- function(var) {
  # Distinct selector values, i.e. the candidate column names
  vals <- unique(var)
  # Name of the selector column as it was passed in
  name <- deparse(substitute(var))
  # One formula per value, e.g. nameselect == 'name_01' ~ name_01
  formulae <- lapply(
    sprintf("%s == '%s' ~ %s", name, vals, vals),
    as.formula,
    env = parent.frame()
  )
  case_when(!!!formulae)
}

df %>%
  mutate(result = switch_cols(nameselect))
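
To see why NAs pass through here, it may help to look at the formula strings the helper builds before they are converted with as.formula. This is a small base R sketch, assuming a selector column called nameselect that contains one missing value:

```r
# Selector values as they might appear in the column, including one NA.
vals <- unique(c("name_01", "name_02", NA))

# Same sprintf() template as in switch_cols(); "nameselect" stands in for
# the deparsed argument name.
formula_strings <- sprintf("%s == '%s' ~ %s", "nameselect", vals, vals)
print(formula_strings)

# A real NA never equals the string 'NA': the comparison itself is NA.
na_comparison <- NA_character_ == "NA"
print(na_comparison)
```

The branch generated for the missing value compares against the literal string 'NA', which a true NA does not match, so case_when() should fall through to its default of NA for those rows.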
Answer 5

Score: 0

Here is a vectorised base R method that works for any number of columns.

df$result <- df[cbind(seq(nrow(df)), match(df$nameselect, names(df)))]
df
#  nameselect name_01 name_02 result
#1    name_01     Ann  Bernie    Ann
#2    name_02    Beth  Claire Claire

We build a matrix of row/column pairs with cbind and use it to subset the data frame: seq(nrow(df)) gives the row numbers and match(df$nameselect, names(df)) gives the column numbers.
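
Regarding the NA requirement in the question: as far as I know, a row of an index matrix that contains NA yields NA in the result rather than an error, so this approach should tolerate missing selectors. Here is a minimal sketch, where the third row (Cara/Dana) is invented for illustration and its "nameselect" is NA:

```r
# Toy data; the third row is made up to show the NA case.
df <- data.frame(
  nameselect = c("name_01", "name_02", NA),
  name_01    = c("Ann", "Beth", "Cara"),
  name_02    = c("Bernie", "Claire", "Dana"),
  stringsAsFactors = FALSE
)

# One (row, column) pair per row; match() returns NA for a missing selector.
idx <- cbind(seq_len(nrow(df)), match(df$nameselect, names(df)))

# Matrix indexing: a pair containing NA gives NA instead of failing.
df$result <- df[idx]
print(df$result)
```

Note that the data frame is converted to a matrix for this kind of indexing, so with mixed column types the values may come back as character.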

Data

It is easier to help if you provide data in a reproducible format:

df <- structure(list(nameselect = c("name_01", "name_02"), name_01 = c("Ann",
"Beth"), name_02 = c("Bernie", "Claire")), row.names = c(NA,
-2L), class = "data.frame")
