April 20, 2023 01:54:37

for loop or lapply (sapply, tapply...?) to extract all counts between two types of variables in a data frame in r

Question

I'm using R and I have a dataframe with this kind of structure:

age <- c(3, 4, 3, 4, 4, 5, 1)
gender <- c(1, 1, 2, 2, 1, 3, 2)
var1 <- c(1, 0, 0, 0, 0, 1, 0)
var2 <- c(0, 0, 1, 1, 1, 0, 0)
id <- c(1:7)
df <- data.frame(id, age, gender, var1, var2)
df
  id age gender var1 var2
1  1   3      1    1    0
2  2   4      1    0    0
3  3   3      2    0    1
4  4   4      2    0    1
5  5   4      1    0    1
6  6   5      3    1    0
7  7   1      2    0    0

There are two types of variables in the dataframe, demographic questions such as age and gender (about 17 of these, levels varying from 3 to 13), and about 35 "thematic" questions coded 0,1 (1=certain theme present, 0=theme not present). Id is irrelevant here.

I need to wrangle this dataframe into counts of each theme by each level of each demographic question, so the resulting dataframe should be

      age1 age3 age4 age5 gender1 gender2 gender3 
var1  0    1    0    1    1       0       1
var2  0    1    2    0    1       2       0
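
To make the target concrete, here is a small base R sketch that uses only the example data above. Each cell of the desired table is the sum of one 0/1 theme column within one level of one demographic column. The names `cell` and `row_block` are illustrative only.

```r
# Example data from the question
df <- data.frame(
  id = 1:7,
  age = c(3, 4, 3, 4, 4, 5, 1),
  gender = c(1, 1, 2, 2, 1, 3, 2),
  var1 = c(1, 0, 0, 0, 0, 1, 0),
  var2 = c(0, 0, 1, 1, 1, 0, 0)
)

# One cell: how many respondents aged 4 have theme var2 present
cell <- sum(df$var2[df$age == 4])

# One block of a row: var2 summed within every age level at once
row_block <- tapply(df$var2, df$age, sum)

cell
row_block
```

Here `cell` is the age4 entry of the var2 row, and `row_block` holds the four age entries of that row, named by age level.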

I can do this manually (yes, I'm a complete idiot in programming, this is really the best I can do at this point):

library(data.table)

t <- table(df$age, df$var1)
dft <- data.frame(t)               # columns: Var1 (age), Var2 (var1 value), Freq
dft <- subset(dft, dft$Var2 != 0)  # keep only the rows where the theme is present
dft <- dft[, 2:3]
dft_t <- transpose(dft)
age_var1 <- dft_t[2, ]             # second row holds the counts
colnames(age_var1) <- c("age1", "age3", "age4", "age5")
rownames(age_var1) <- c("var1")

I then repeat this for gender and var1 to get gender_var1, and combine the two into one row with var1_final <- cbind(age_var1, gender_var1).

I do the same for var2, which gives var2_final, and finally combine all the rows with rbind.

Is there any way to do this more fluently? Thank you in advance!

Answer 1

Score: 2

You can achieve this by leveraging pivot_longer() (twice), and then pivoting back to wide format:

library(dplyr)
library(tidyr)

pivot_longer(pivot_longer(select(df, -id), var1:var2), age:gender, names_to = "type", values_to = "v") %>%
  mutate(type = paste0(type, v)) %>%
  pivot_wider(id_cols = name, names_from = type, values_fn = sum, names_sort = TRUE)

Output:

  name   age1  age3  age4  age5 gender1 gender2 gender3
  <chr> <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
1 var1      0     1     0     1       1       0       1
2 var2      0     1     2     0       1       2       0
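
Since the question asks about lapply specifically, the same table can also be sketched in base R. This is one possible approach and not part of the answer above. It assumes the demographic and theme column names are collected in two character vectors (`demo_cols` and `theme_cols`, both hypothetical names) and uses `rowsum()` to sum the theme columns within each level.

```r
# Example data from the question
df <- data.frame(
  id = 1:7,
  age = c(3, 4, 3, 4, 4, 5, 1),
  gender = c(1, 1, 2, 2, 1, 3, 2),
  var1 = c(1, 0, 0, 0, 0, 1, 0),
  var2 = c(0, 0, 1, 1, 1, 0, 0)
)

demo_cols  <- c("age", "gender")  # assumed: names of the demographic columns
theme_cols <- c("var1", "var2")   # assumed: names of the 0/1 theme columns

# For each demographic column, sum every theme column within each level
per_demo <- lapply(demo_cols, function(d) {
  sums <- rowsum(df[theme_cols], group = df[[d]])  # rows = levels, columns = themes
  out <- t(sums)                                   # rows = themes, columns = levels
  colnames(out) <- paste0(d, colnames(out))        # e.g. "age" + "3" -> "age3"
  out
})

# Bind the per-demographic blocks side by side into one matrix
result <- do.call(cbind, per_demo)
result
```

With the real data only `demo_cols` and `theme_cols` would need to change, for example `theme_cols <- grep("^var", names(df), value = TRUE)` if all theme columns happen to share a prefix. The result is a numeric matrix with themes as row names rather than a tibble.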
