Using apply() to select specific variables by name

Question

Ok, basically I have a dataset of households that looks like this:



household_data <- data.frame(
    id = 1:4,
    gender_component_1 = c(1,2,2,2),
    gender_component_2 = c(2,1,1,2),
    bread_winner = c(1,1,2,1)
)


I want to construct a variable ('gender_bread_winner') that reports the sex of the breadwinner in each household. Which component (1 or 2) is the breadwinner is recorded as a number in a separate variable.

I've come up with the following loop:

library(dplyr)  # select() comes from dplyr

var_max <- paste("gender_component", household_data$bread_winner, sep = "_")

for (i in 1:nrow(household_data)) {
  household_data$gender_bread_winner[i] <- select(household_data[i,], var_max[i])
}

Unfortunately, the real dataset is huge and this solution is far from optimal. Is it possible to do the same thing using apply or something similar? I have not managed to so far.

Thanks in advance

EDIT: Thank you all for your answers! In the end I found it easier to use a series of ifelse calls like this:


dataset$sesso_max <- NA
dataset$sesso_max <- ifelse(dataset$max_percettore == 1, dataset$sesso_1, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 2, dataset$sesso_2, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 3, dataset$sesso_3, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 4, dataset$sesso_4, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 5, dataset$sesso_5, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 6, dataset$sesso_6, dataset$sesso_max)

Answer 1

Score: 4

If there are only 2 gender_component columns a simple ifelse would do.

household_data <- transform(household_data, gender_bread_winner =
        ifelse(bread_winner == 1, gender_component_1, gender_component_2))

This says: when bread_winner has the value 1, take the value from gender_component_1; otherwise take it from the gender_component_2 column.
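To see the element-wise behaviour on its own, here is a minimal sketch with plain vectors. The values are illustrative only, and the NA in bread_winner is an assumption added to show what happens when the breadwinner is unknown.

```r
# Toy vectors standing in for the three columns (illustrative values only)
bread_winner       <- c(1, 1, 2, NA)
gender_component_1 <- c(1, 2, 2, 2)
gender_component_2 <- c(2, 1, 1, 2)

# ifelse() evaluates the test for every element and picks the value at the
# same position from the "yes" or the "no" vector; an NA test yields NA.
gender_bread_winner <- ifelse(bread_winner == 1,
                              gender_component_1,
                              gender_component_2)

print(gender_bread_winner)
```

Note that any value other than 1 (not only 2) falls through to gender_component_2, which is why this form only suits the two-column case.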


For more than 2 columns, we can index the data frame with a two-column matrix of row and column positions:

gender_cols <- grep('gender_component', names(household_data), value = TRUE)
household_data$gender_bread_winner <- household_data[gender_cols][
             cbind(1:nrow(household_data), household_data$bread_winner)]
household_data

#  id gender_component_1 gender_component_2 bread_winner gender_bread_winner
#1  1                  1                  2            1                   1
#2  2                  2                  1            1                   2
#3  3                  2                  1            2                   1
#4  4                  2                  2            1                   2

Explanation for the answer -

gender_cols holds the names of all columns that contain "gender_component".

gender_cols
#[1] "gender_component_1" "gender_component_2"

We create a matrix of row and column indices to subset the data frame household_data.

cbind(1:nrow(household_data), household_data$bread_winner)
#     [,1] [,2]
#[1,]    1    1
#[2,]    2    1
#[3,]    3    2
#[4,]    4    1

Each row of this matrix is a (row, column) pair: take the 1st column's value from row 1, the 1st column's value from row 2, the 2nd column's value from row 3, and so on. Subsetting the data frame with this matrix returns one value per row.
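The same idea can be sketched for more than two components. The data frame below, including the extra gender_component_3 column and the name households, is hypothetical and not from the question. The column names are built explicitly here instead of with grep, on the assumption that the columns might not be stored in component order.

```r
# Hypothetical data with three components per household
households <- data.frame(
  id                 = 1:4,
  gender_component_1 = c(1, 2, 2, 2),
  gender_component_2 = c(2, 1, 1, 2),
  gender_component_3 = c(1, 1, 2, 1),
  bread_winner       = c(3, 1, 2, 3)
)

# Build the column names explicitly so that column k really is component k,
# whatever order the columns have in the data frame.
n_components <- 3
gender_cols  <- paste0("gender_component_", seq_len(n_components))

# One (row, column) pair per household
idx <- cbind(seq_len(nrow(households)), households$bread_winner)

# Convert to a matrix first; indexing a matrix with a two-column matrix
# returns one element per row of idx.
gender_matrix <- as.matrix(households[gender_cols])
households$gender_bread_winner <- gender_matrix[idx]

print(households)
```

Because the lookup is a single vectorized subsetting step, it avoids both the row-by-row loop and a separate ifelse call per component.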

Answer 2

Score: 0

One way is to use the mutate() function:

library(tidyverse)
household_data %>% mutate(gender_bread_winner = ifelse(bread_winner == 1, gender_component_1, gender_component_2))

  id gender_component_1 gender_component_2 bread_winner gender_bread_winner
1  1                  1                  2            1                   1
2  2                  2                  1            1                   2
3  3                  2                  1            2                   1
4  4                  2                  2            1                   2
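Since the question asks specifically about the apply family, here is a rough base R sketch of what that could look like with vapply(). It reuses var_max from the question. This is only one possible reading of "using apply", and because it still visits the rows one at a time, it is likely to be slower on large data than the vectorized ifelse or matrix indexing shown above.

```r
household_data <- data.frame(
  id                 = 1:4,
  gender_component_1 = c(1, 2, 2, 2),
  gender_component_2 = c(2, 1, 1, 2),
  bread_winner       = c(1, 1, 2, 1)
)

# Name of the column to read for each row, as in the question
var_max <- paste("gender_component", household_data$bread_winner, sep = "_")

# vapply() visits each row index and pulls a single value from the named
# column; numeric(1) declares that every call returns one number.
household_data$gender_bread_winner <- vapply(
  seq_len(nrow(household_data)),
  function(i) household_data[[var_max[i]]][i],
  numeric(1)
)

print(household_data)
```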

huangapple
  • Posted on March 9, 2023 at 20:29:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75684639.html