How can I replace values in one column with values from another out of several options, when the first column contains the name of the other column?
Question
Say I have the table below. I want to replace the value in the column "nameselect" with the respective value from either "name_01" or "name_02". Which of these columns should be chosen is indicated by the original value in "nameselect". As a result, I want "nameselect" to contain just names - in this case, Ann and Claire. I'd also be ok with a solution creating a new column with the names. How do I best go about that?
In my actual data, there are more name columns, so ideally nothing that involves copy-pasting every possible column name. I should also mention that it should work even when there are NAs in "nameselect".
| nameselect | name_01 | name_02 |
| ---------- | ------- | ------- |
| name_01 | Ann | Bernie |
| name_02 | Beth | Claire |
With dplyr this works, but not when there are NAs:
df %>%
  rowwise() %>%
  mutate(result = get(nameselect)) %>%
  ungroup()
Any way I can adapt this solution to work despite NAs?
Answer 1
Score: 1
You may have multiple columns, in which case a one-liner general solution using only base R would be:
within(df, nameselect <- sapply(seq(nrow(df)), \(i) df[i, df$nameselect[i]]))
#> nameselect name_01 name_02
#> 1 Ann Ann Bernie
#> 2 Claire Beth Claire
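The question asks for NA handling, and as far as I can tell the one-liner above fails when "nameselect" holds an NA, because `df[i, NA]` is not a valid column selection. A minimal sketch of a guarded variant follows; the third row and the helper name `pick_name` are made up for illustration and are not part of the original answer.

```r
# Example data with an extra, hypothetical row whose nameselect is NA
df <- data.frame(
  nameselect = c("name_01", "name_02", NA),
  name_01 = c("Ann", "Beth", "Cara"),
  name_02 = c("Bernie", "Claire", "Dana"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return NA when no column is named,
# otherwise look up the named column in row i
pick_name <- function(i) {
  col <- df$nameselect[i]
  if (is.na(col)) NA_character_ else df[i, col]
}

# vapply enforces that every lookup yields exactly one string
result <- vapply(seq_len(nrow(df)), pick_name, character(1))
result
```

Assigning `result` back with `df$nameselect <- result` overwrites the selector column, as in the original one-liner.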
Answer 2
Score: 0
See if this works, given that you only have two columns:
df[, 'nameselect'] <- ifelse(df[, 'nameselect'] == 'name_01', df[, 'name_01'], df[, 'name_02'])
Answer 3
Score: 0
As Limey said, it's quite straightforward:
df %>%
  mutate(nameselect = ifelse(nameselect == "name_01", name_01, name_02))
# or
df$nameselect <- ifelse(df$nameselect == "name_01", df$name_01, df$name_02)
# or
df %>%
  rowwise() %>%
  mutate(nameselect = get(nameselect))
# or (map_vec comes from purrr)
df$nameselect <- map_vec(seq_along(df$nameselect), ~ df[[df$nameselect[.x]]][.x])
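Since the question is specifically about adapting the rowwise()/get() approach to NAs, here is one possible sketch: guard the lookup with is.na() so get() is never called on a missing name. It assumes dplyr is installed; the third row of the example data is made up for illustration.

```r
library(dplyr)

# Example data with an extra, hypothetical row whose nameselect is NA
df <- data.frame(
  nameselect = c("name_01", "name_02", NA),
  name_01 = c("Ann", "Beth", "Cara"),
  name_02 = c("Bernie", "Claire", "Dana"),
  stringsAsFactors = FALSE
)

out <- df %>%
  rowwise() %>%
  # Each row is its own group here, so nameselect has length 1
  # and a plain if/else is safe
  mutate(result = if (is.na(nameselect)) NA_character_ else get(nameselect)) %>%
  ungroup()

out$result
```

The guard only covers missing values; a "nameselect" entry that names a non-existent column would still make get() fail.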
Answer 4
Score: 0
Based on this solution to a similar issue, https://stackoverflow.com/questions/67678405/r-lookup-values-of-a-column-defined-by-another-columns-values-in-mutate, a case_when inside mutate works even with NAs. Thanks for the suggestion!
switch_cols <- function(var) {
  # distinct column names found in the selector column
  vals <- unique(var)
  # name of the selector column as written in the call
  name <- deparse(substitute(var))
  # one formula per column name, e.g. nameselect == 'name_01' ~ name_01
  formulae <- lapply(
    sprintf("%s == '%s' ~ %s", name, vals, vals),
    as.formula,
    env = parent.frame()
  )
  case_when(!!!formulae)
}
df %>%
  mutate(result = switch_cols(nameselect))
Answer 5
Score: 0
Here is a vectorised base R method that works for any number of columns.
df$result <- df[cbind(seq(nrow(df)), match(df$nameselect, names(df)))]
df
# nameselect name_01 name_02 result
#1 name_01 Ann Bernie Ann
#2 name_02 Beth Claire Claire
We create a matrix of row/column pairs with cbind and use it to subset the data frame, where seq(nrow(df)) gives the row numbers and match(df$nameselect, names(df)) gives the column numbers to subset.
Data
It is easier to help if you provide data in a reproducible format.
df <- structure(list(nameselect = c("name_01", "name_02"), name_01 = c("Ann",
"Beth"), name_02 = c("Bernie", "Claire")), row.names = c(NA,
-2L), class = "data.frame")
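On the NA requirement from the question: as far as I understand R's matrix indexing, a row of the index matrix that contains NA simply produces NA in the result, so this method should cope without changes. A small sketch to check it, using a made-up third row with a missing "nameselect":

```r
# Example data with an extra, hypothetical row whose nameselect is NA
df <- data.frame(
  nameselect = c("name_01", "name_02", NA),
  name_01 = c("Ann", "Beth", "Cara"),
  name_02 = c("Bernie", "Claire", "Dana"),
  stringsAsFactors = FALSE
)

# Row numbers paired with the position of the named column;
# match() returns NA where nameselect is NA
idx <- cbind(seq_len(nrow(df)), match(df$nameselect, names(df)))

# Matrix indexing of a data frame goes through as.matrix(),
# and an index row containing NA yields NA
df$result <- df[idx]
df$result
```

Because the lookup goes through as.matrix(), all columns are coerced to a common type first, which is harmless here since every column is character.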