英文:
for loop or lapply (sapply, tapply...?) to extract all counts between two types of variables in a data frame in r
问题
library(tidyr)
library(dplyr)
result <- df %>%
pivot_longer(cols = starts_with("var"), names_to = "theme") %>%
group_by(theme, age, gender) %>%
summarise(count = sum(value)) %>%
pivot_wider(names_from = c("theme", "age", "gender"), values_from = "count", values_fill = 0)
英文:
I'm using R and I have a dataframe with this kind of structure:
age<-c(3,4,3,4,4,5,1)
gender<-c(1,1,2,2,1,3,2)
var1<-c(1,0,0,0,0,1,0)
var2<-c(0,0,1,1,1,0,0)
id<-c(1:7)
df<-data.frame(id, age, gender, var1, var2)
df
id age gender var1 var2
1 1 3 1 1 0
2 2 4 1 0 0
3 3 3 2 0 1
4 4 4 2 0 1
5 5 4 1 0 1
6 6 5 3 1 0
7 7 1 2 0 0
There are two types of variables in the dataframe, demographic questions such as age and gender (about 17 of these, levels varying from 3 to 13), and about 35 "thematic" questions coded 0,1 (1=certain theme present, 0=theme not present). Id is irrelevant here.
I need to wrangle this dataframe into counts of each theme by each level of each demographic question, so the resulting dataframe should be
age1 age3 age4 age5 gender1 gender2 gender3
var1 0 1 0 1 1 0 1
var2 0 1 2 0 1 2 0
I can do this manually (yes, I'm a complete idiot in programming, this is really the best I can do at this point):
library(data.table)
t<-table(df$age, df$var1)
dft<-data.frame(t)
dft<-subset(dft, dft$Var1 != 0)
dft<-dft[,2:3]
dft_t<-transpose(dft)
age_var1<-dft_t[2,]
colnames(age_var1)<-c("age1", "age3", "age4", "age5")
rownames(age_var1)<-c("var1")
And then the same for gender and var1 = "gender_var1" which I combine into one row by var1_final<-cbind(age_var1, gender_var1)
and then the same to var2, resulting in var2_final and then combining all rows with rbind.
Is there any way to do this more fluently? Thank you in advance!
答案1
得分: 2
你可以通过利用 pivot_longer()
(两次),然后再次进行宽格式的逆转来实现此目标:
pivot_longer(pivot_longer(select(df, -id), var1:var2), age:gender, names_to = "type", values_to = "v") %>%
mutate(type = paste0(type, v)) %>%
pivot_wider(id_cols = name, names_from = type, values_fn = sum, names_sort = TRUE)
输出:
name age1 age3 age4 age5 gender1 gender2 gender3
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 var1 0 1 0 1 1 0 1
2 var2 0 1 2 0 1 2 0
英文:
You can achieve this by leveraging pivot_longer()
(twice), and then pivoting back to wide format:
pivot_longer(pivot_longer(select(df,-id), var1:var2), age:gender,names_to = "type",values_to = "v") %>%
mutate(type=paste0(type,v)) %>%
pivot_wider(id_cols = name,names_from = type,values_fn = sum, names_sort=T)
Output:
name age1 age3 age4 age5 gender1 gender2 gender3
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 var1 0 1 0 1 1 0 1
2 var2 0 1 2 0 1 2 0
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论