Count the rows by group and get the proportion of different columns
Question
I am trying to find, for each location, the ratio of CGMs prescribed to the number of diabetes patients. My actual data looks like this:
| Location | Diabetes present | CGM prescribed | 
|---|---|---|
| CA | 1 | 1 | 
| TX | 1 | 0 | 
| TX | 1 | 1 | 
| CA | 1 | 0 | 
| AZ | 1 | 1 | 
| AZ | 1 | 0 | 
| AZ | 1 | 1 | 
| TX | 1 | 0 | 
Desired output:
| location | TotalDiabetes | total CGM | proportion (total cgm/ total diabetes) | 
|---|---|---|---|
| CA | 2 | 1 | 0.5 | 
| TX | 3 | 1 | 0.33 | 
| AZ | 3 | 2 | 0.66 | 
Answer 1
Score: 4
Sum the numeric columns by 'Location', then create the proportion column by dividing one total by the other:
library(dplyr) # version >= 1.1.0 (for reframe() and .by)
library(stringr)
df1 %>%
   # sum every non-grouping column per Location; name each result
   # "Total_" plus the first word of the original column name
   reframe(across(everything(), ~ sum(.x, na.rm = TRUE),
  .names = "Total_{str_remove(.col, ' .*')}"), .by = "Location") %>%
   mutate(proportion = round(Total_CGM/Total_Diabetes, 2))
Output:
  Location Total_Diabetes Total_CGM proportion
1       CA              2         1       0.50
2       TX              3         1       0.33
3       AZ              3         2       0.67
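Or with base R:
# check.names = FALSE is passed through to data.frame(), which keeps the spaces in the column names
transform(aggregate(.~ Location, df1, sum), 
  proportion = round(`CGM prescribed`/`Diabetes present`, 2), 
    check.names = FALSE)
Output (aggregate() sorts the groups, so the rows come out in alphabetical order):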
   Location Diabetes present CGM prescribed proportion
1       AZ                3              2       0.67
2       CA                2              1       0.50
3       TX                3              1       0.33
Data:
df1 <- structure(list(Location = c("CA", "TX", "TX", "CA", "AZ", "AZ", 
"AZ", "TX"), `Diabetes present` = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), `CGM prescribed` = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L)),
 class = "data.frame", row.names = c(NA, 
-8L))
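As a cross-check that needs no packages, the same totals can be computed with tapply() and the rows put back in the question's order (CA, TX, AZ), since tapply() orders its result alphabetically by group. This is only a sketch and not part of the original answer; the names total_diabetes, total_cgm and res are made up for illustration.

```r
# Same data as above, rebuilt so the sketch is self-contained
df1 <- data.frame(
  Location = c("CA", "TX", "TX", "CA", "AZ", "AZ", "AZ", "TX"),
  `Diabetes present` = rep(1L, 8),
  `CGM prescribed` = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L),
  check.names = FALSE
)

# Per-location sums; both results share the same (alphabetical) group order
total_diabetes <- tapply(df1$`Diabetes present`, df1$Location, sum)
total_cgm      <- tapply(df1$`CGM prescribed`, df1$Location, sum)

res <- data.frame(
  Location      = names(total_diabetes),
  TotalDiabetes = as.vector(total_diabetes),
  TotalCGM      = as.vector(total_cgm),
  proportion    = as.vector(total_cgm / total_diabetes)
)

# Reorder rows by first appearance of each location in the input
res <- res[match(unique(df1$Location), res$Location), ]
rownames(res) <- NULL
res
```

The proportion is left unrounded here; wrap it in round(..., 2) to match the outputs above.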
Answer 2
Score: 2
Here's a solution in data.table.
library(data.table)
# note: setDT() converts df1 to a data.table by reference
setnames(setDT(df1)[, lapply(.SD, sum), .(Location), .SDcols = -1][, 
            proportion := do.call(`/`, .SD), .(Location), .SDcols = 3:2],
         names(df1)[-1], paste0("Total ", sub(" .*", "", names(df1)[-1])))[]
#    Location Total Diabetes Total CGM proportion
# 1:       CA              2         1  0.5000000
# 2:       TX              3         1  0.3333333
# 3:       AZ              3         2  0.6666667
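The one-liner above packs three steps together: sum per group, divide, rename. A more spelled-out version of the same idea might look like the sketch below. It is not the answerer's code: it names the total columns directly instead of calling setnames(), and it uses as.data.table() so that df1 itself stays a plain data frame. The object names dt and totals are hypothetical.

```r
library(data.table)

# Same data as in Answer 1, rebuilt so the sketch is self-contained
df1 <- data.frame(
  Location = c("CA", "TX", "TX", "CA", "AZ", "AZ", "AZ", "TX"),
  `Diabetes present` = rep(1L, 8),
  `CGM prescribed` = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L),
  check.names = FALSE
)

# as.data.table() makes a copy, so df1 is not modified
dt <- as.data.table(df1)

# Step 1: per-location totals, named as in the output above
totals <- dt[, .(`Total Diabetes` = sum(`Diabetes present`),
                 `Total CGM`      = sum(`CGM prescribed`)),
             by = Location]

# Step 2: add the ratio column by reference
totals[, proportion := `Total CGM` / `Total Diabetes`]

totals[]
```

Grouping with by keeps groups in order of first appearance, which is why the rows come out as CA, TX, AZ without any extra sorting.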
Answer 3
Score: 2
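Another dplyr way:
library(dplyr)
# assumes a data frame `df` whose columns are named Diabetes_present and CGM_prescribed
# (the question's columns contain spaces); the factor levels fix the row order
df %>% 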
  mutate(Location = factor(Location, levels = c("CA", "TX", "AZ"))) %>% 
  group_by(Location) %>% 
  summarise(TotalDiabetes = sum(Diabetes_present),
            Total_CGM = sum(CGM_prescribed),
            Proportion = Total_CGM/TotalDiabetes) 
  Location TotalDiabetes Total_CGM Proportion
  <fct>            <int>     <int>      <dbl>
1 CA                   2         1      0.5  
2 TX                   3         1      0.333
3 AZ                   3         2      0.667
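The code above refers to `df`, `Diabetes_present` and `CGM_prescribed`, whereas the question's columns are named with spaces. One way to bridge the two, assuming the data is the `df1` object from Answer 1, is to replace the spaces with underscores first. This is a sketch of that assumption, not something the answer states.

```r
# Data with the question's original column names (spaces included)
df1 <- data.frame(
  Location = c("CA", "TX", "TX", "CA", "AZ", "AZ", "AZ", "TX"),
  `Diabetes present` = rep(1L, 8),
  `CGM prescribed` = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L),
  check.names = FALSE
)

# Copy of df1 whose names have every space replaced by an underscore
df <- setNames(df1, gsub(" ", "_", names(df1), fixed = TRUE))

names(df)
```

With `df` built this way, the pipeline above should run as written.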