Using apply() to select specific variables by name
Question
Ok, basically I have a dataset of households that looks like this:
household_data <- data.frame(
  id = 1:4,
  gender_component_1 = c(1, 2, 2, 2),
  gender_component_2 = c(2, 1, 1, 2),
  bread_winner = c(1, 1, 2, 1)
)
I want to construct a variable ('gender_bread_winner') that reports the sex of the family's breadwinner. Whether the breadwinner is component 1 or 2 is recorded as a number in a separate variable (bread_winner).
I've come up with the following loop:
var_max <- paste("gender_component", household_data$bread_winner, sep = "_")
for (i in 1:nrow(household_data)) {
  household_data$gender_bread_winner[i] <- select(household_data[i,], var_max[i])
}
Unfortunately, the real dataset is huge and this loop is far from efficient. Is it possible to do the same thing using apply or something similar? I have not managed to make it work.
Thanks in advance
EDIT: Thank you all for your answers! In the end I found it easier to use a series of ifelse calls, like this:
dataset$sesso_max <- NA
dataset$sesso_max <- ifelse(dataset$max_percettore == 1, dataset$sesso_1, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 2, dataset$sesso_2, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 3, dataset$sesso_3, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 4, dataset$sesso_4, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 5, dataset$sesso_5, dataset$sesso_max)
dataset$sesso_max <- ifelse(dataset$max_percettore == 6, dataset$sesso_6, dataset$sesso_max)
Answer 1
Score: 4
If there are only 2 gender_component columns, a simple ifelse would do.
household_data <- transform(household_data, gender_bread_winner =
  ifelse(bread_winner == 1, gender_component_1, gender_component_2))
This says that when bread_winner has value 1, take the value from the gender_component_1 column, or else take it from the gender_component_2 column.
For more than 2 columns we may index the data frame with a matrix of row and column positions, as follows:
gender_cols <- grep('gender_component', names(household_data), value = TRUE)
household_data$gender_bread_winner <-
  household_data[gender_cols][cbind(1:nrow(household_data), household_data$bread_winner)]
household_data
#  id gender_component_1 gender_component_2 bread_winner gender_bread_winner
#1  1                  1                  2            1                   1
#2  2                  2                  1            1                   2
#3  3                  2                  1            2                   1
#4  4                  2                  2            1                   2
Explanation:
gender_cols holds the names of all the columns that contain "gender_component".
gender_cols
#[1] "gender_component_1" "gender_component_2"
We create a matrix of row and column indices to subset from the data frame household_data.
cbind(1:nrow(household_data), household_data$bread_winner)
#     [,1] [,2]
#[1,]    1    1
#[2,]    2    1
#[3,]    3    2
#[4,]    4    1
Each row of this matrix is a (row, column) pair: take the 1st value from the 1st row, the 1st value from the 2nd row, the 2nd value from the 3rd row, and so on. Subsetting the data frame with this matrix returns one value per row.
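The indexing idea described above carries over to any number of components. The following is only a minimal sketch: the third column gender_component_3 and the bread_winner values are made up for illustration and are not part of the original data. It also uses match() to find each column's position by name, an assumption added here so that the result does not depend on the gender columns being stored in order.

```r
# Toy data with a hypothetical third component (not in the original question)
household_data <- data.frame(
  id = 1:4,
  gender_component_1 = c(1, 2, 2, 2),
  gender_component_2 = c(2, 1, 1, 2),
  gender_component_3 = c(1, 2, 2, 1),
  bread_winner = c(1, 3, 2, 2)
)

# Names of all gender columns
gender_cols <- grep("^gender_component_", names(household_data), value = TRUE)

# Column name wanted for each row, then its position within gender_cols
wanted <- paste0("gender_component_", household_data$bread_winner)
col_idx <- match(wanted, gender_cols)

# Convert to a matrix (all gender columns are numeric) and pick one
# (row, column) cell per household
gender_mat <- as.matrix(household_data[gender_cols])
household_data$gender_bread_winner <-
  gender_mat[cbind(seq_len(nrow(household_data)), col_idx)]

household_data$gender_bread_winner
```

Because the lookup is a single vectorised subsetting operation, it avoids both the row-by-row loop and a long chain of ifelse calls.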
Answer 2
Score: 0
One way is to use the mutate() function:
library(tidyverse)
household_data %>% mutate(gender_bread_winner = ifelse(bread_winner == 1, gender_component_1, gender_component_2))
  id gender_component_1 gender_component_2 bread_winner gender_bread_winner
1  1                  1                  2            1                   1
2  2                  2                  1            1                   2
3  3                  2                  1            2                   1
4  4                  2                  2            1                   2
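With more than two components, the same mutate() approach could be written with dplyr's case_when() instead of nesting ifelse calls. This is a sketch under assumptions rather than part of the original answer: gender_component_3 and the bread_winner values below are hypothetical, and rows matching none of the conditions come out as NA.

```r
library(dplyr)

# Toy data with a hypothetical third component (not in the original question)
household_data <- data.frame(
  id = 1:4,
  gender_component_1 = c(1, 2, 2, 2),
  gender_component_2 = c(2, 1, 1, 2),
  gender_component_3 = c(1, 2, 2, 1),
  bread_winner = c(1, 3, 2, 2)
)

# One condition per component; unmatched rows become NA
household_data <- household_data %>%
  mutate(gender_bread_winner = case_when(
    bread_winner == 1 ~ gender_component_1,
    bread_winner == 2 ~ gender_component_2,
    bread_winner == 3 ~ gender_component_3
  ))

household_data$gender_bread_winner
```

This stays readable for a handful of components, but each new component needs its own line, so a lookup by index may scale better when there are many.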