for loop or lapply (sapply, tapply…?) to extract all counts between two types of variables in a data frame in r

huangapple go评论63阅读模式
英文:

for loop or lapply (sapply, tapply...?) to extract all counts between two types of variables in a data frame in r

问题

library(tidyr)
library(dplyr)

result <- df %>%
  pivot_longer(cols = starts_with("var"), names_to = "theme") %>%
  group_by(theme, age, gender) %>%
  summarise(count = sum(value)) %>%
  pivot_wider(names_from = c("theme", "age", "gender"), values_from = "count", values_fill = 0)
英文:

I'm using R and I have a dataframe with this kind of structure:

age&lt;-c(3,4,3,4,4,5,1)
gender&lt;-c(1,1,2,2,1,3,2)
var1&lt;-c(1,0,0,0,0,1,0)
var2&lt;-c(0,0,1,1,1,0,0)
id&lt;-c(1:7)
df&lt;-data.frame(id, age, gender, var1, var2)
df
  id age gender var1 var2
1  1   3      1    1    0
2  2   4      1    0    0
3  3   3      2    0    1
4  4   4      2    0    1
5  5   4      1    0    1
6  6   5      3    1    0
7  7   1      2    0    0

There are two types of variables in the dataframe, demographic questions such as age and gender (about 17 of these, levels varying from 3 to 13), and about 35 "thematic" questions coded 0,1 (1=certain theme present, 0=theme not present). Id is irrelevant here.

I need to wrangle this dataframe into counts of each theme by each level of each demographic question, so the resulting dataframe should be

      age1 age3 age4 age5 gender1 gender2 gender3 
var1  0    1    0    1    1       0       1
var2  0    1    2    0    1       2       0

I can do this manually (yes, I'm a complete idiot in programming, this is really the best I can do at this point):

library(data.table)

t&lt;-table(df$age, df$var1)
dft&lt;-data.frame(t)
dft&lt;-subset(dft, dft$Var1 != 0)
dft&lt;-dft[,2:3]
dft_t&lt;-transpose(dft)
age_var1&lt;-dft_t[2,]
colnames(age_var1)&lt;-c(&quot;age1&quot;, &quot;age3&quot;, &quot;age4&quot;, &quot;age5&quot;)
rownames(age_var1)&lt;-c(&quot;var1&quot;)

And then the same for gender and var1 = "gender_var1" which I combine into one row by var1_final&lt;-cbind(age_var1, gender_var1)

and then the same to var2, resulting in var2_final and then combining all rows with rbind.

Is there any way to do this more fluently? Thank you in advance!

答案1

得分: 2

你可以通过利用 pivot_longer()(两次),然后再次进行宽格式的逆转来实现此目标:

pivot_longer(pivot_longer(select(df, -id), var1:var2), age:gender, names_to = "type", values_to = "v") %>%
  mutate(type = paste0(type, v)) %>%
  pivot_wider(id_cols = name, names_from = type, values_fn = sum, names_sort = TRUE)

输出:

  name   age1  age3  age4  age5 gender1 gender2 gender3
  <chr>  <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
1 var1       0     1     0     1       1       0       1
2 var2       0     1     2     0       1       2       0
英文:

You can achieve this by leveraging pivot_longer() (twice), and then pivoting back to wide format:

pivot_longer(pivot_longer(select(df,-id), var1:var2), age:gender,names_to = &quot;type&quot;,values_to = &quot;v&quot;) %&gt;% 
  mutate(type=paste0(type,v)) %&gt;%
  pivot_wider(id_cols = name,names_from = type,values_fn = sum, names_sort=T)

Output:

  name   age1  age3  age4  age5 gender1 gender2 gender3
  &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;
1 var1      0     1     0     1       1       0       1
2 var2      0     1     2     0       1       2       0

huangapple
  • 本文由 发表于 2023年4月20日 01:54:37
  • 转载请务必保留本文链接:https://go.coder-hub.com/76057509.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定