Count the rows by group and get the proportion of different columns

Question

I am trying to find the ratio of CGMs prescribed at each location to the number of diabetes patients there. My actual data looks like this:

Location  Diabetes present  CGM prescribed
CA        1                 1
TX        1                 0
TX        1                 1
CA        1                 0
AZ        1                 1
AZ        1                 0
AZ        1                 1
TX        1                 0

Desired output:

location  TotalDiabetes  total CGM  proportion (total CGM / total diabetes)
CA        2              1          0.5
TX        3              1          0.33
AZ        3              2          0.66

Answer 1

Score: 4

We can sum the numeric columns by 'Location' and then create the proportion column by dividing one total by the other:

library(dplyr) # version >= 1.1.0
library(stringr)
df1 %>%
   reframe(across(everything(), ~ sum(.x, na.rm = TRUE),
  .names = "Total_{str_remove(.col, ' .*')}"), .by = "Location") %>%
   mutate(proportion = round(Total_CGM/Total_Diabetes, 2))

-output

  Location Total_Diabetes Total_CGM proportion
1       CA              2         1       0.50
2       TX              3         1       0.33
3       AZ              3         2       0.67

Or with base R

transform(aggregate(.~ Location, df1, sum), 
  proportion = round(`CGM prescribed`/`Diabetes present`, 2), 
    check.names = FALSE)

-output

   Location Diabetes present CGM prescribed proportion
1       AZ                3              2       0.67
2       CA                2              1       0.50
3       TX                3              1       0.33

data

df1 <- structure(list(Location = c("CA", "TX", "TX", "CA", "AZ", "AZ", "AZ", "TX"), `Diabetes present` = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), `CGM prescribed` = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L)),
 class = "data.frame", row.names = c(NA, -8L))


Answer 2

Score: 2

Here's a solution in data.table.

library(data.table)
# Sum by Location, add the proportion by reference, then rename the totals
setnames(setDT(df1)[, lapply(.SD, sum), .(Location), .SDcols = -1][, 
            proportion := do.call(`/`, .SD), .(Location), .SDcols = 3:2],
         names(df1)[-1], paste0("Total ", sub(" .*", "", names(df1)[-1])))[]

#    Location Total Diabetes Total CGM proportion
# 1:       CA              2         1  0.5000000
# 2:       TX              3         1  0.3333333
# 3:       AZ              3         2  0.6666667
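
The one-liner above is fairly dense, so here is a minimal step-by-step sketch of the same idea. It is not the answerer's code: the object names `dt` and `totals` are made up for illustration, and the data is rebuilt from the `df1` posted in Answer 1 so that the block is self-contained.

```r
library(data.table)

# Same values as df1 in Answer 1; `dt` is a hypothetical name for this sketch
dt <- data.table(
  Location = c("CA", "TX", "TX", "CA", "AZ", "AZ", "AZ", "TX"),
  `Diabetes present` = rep(1L, 8),
  `CGM prescribed` = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L)
)

# Step 1: sum both indicator columns within each Location
totals <- dt[, .(`Total Diabetes` = sum(`Diabetes present`),
                 `Total CGM` = sum(`CGM prescribed`)),
             by = Location]

# Step 2: add the proportion column by reference
totals[, proportion := `Total CGM` / `Total Diabetes`]

totals[]
```

Naming the totals directly in `j` avoids the separate `setnames()` call, at the cost of spelling out each column instead of using `.SD`.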

Answer 3

Score: 2

Another dplyr way:

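library(dplyr)

# Assumes a data frame `df` whose columns are named Diabetes_present and
# CGM_prescribed (underscores rather than spaces)
df %>%
  # Fix the level order so the groups come out as CA, TX, AZ
  mutate(Location = factor(Location, levels = c("CA", "TX", "AZ"))) %>%
  group_by(Location) %>%
  summarise(TotalDiabetes = sum(Diabetes_present),
            Total_CGM = sum(CGM_prescribed),
            Proportion = Total_CGM/TotalDiabetes)

  Location TotalDiabetes Total_CGM Proportion
  <fct>            <int>     <int>      <dbl>
1 CA                   2         1      0.5  
2 TX                   3         1      0.333
3 AZ                   3         2      0.667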
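
The column names in this answer (`Diabetes_present`, `CGM_prescribed`) differ from those in the `df1` posted under Answer 1, which contain spaces. A minimal sketch of the same `summarise()` approach against those names might look like the following, assuming the data frame is rebuilt as shown and using backticks for the non-syntactic column names. The name `res` is only for illustration.

```r
library(dplyr)

# Same values as df1 in Answer 1; check.names = FALSE keeps the spaces
df1 <- data.frame(
  Location = c("CA", "TX", "TX", "CA", "AZ", "AZ", "AZ", "TX"),
  `Diabetes present` = rep(1L, 8),
  `CGM prescribed` = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L),
  check.names = FALSE
)

# Sum each indicator per Location, then divide the two totals
res <- df1 %>%
  group_by(Location) %>%
  summarise(TotalDiabetes = sum(`Diabetes present`),
            Total_CGM = sum(`CGM prescribed`),
            Proportion = Total_CGM / TotalDiabetes)

res
```

Without the `factor()` step, the groups come back in alphabetical order (AZ, CA, TX) rather than in the order shown in the question.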

huangapple
  • Posted on February 24, 2023 at 02:40:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75549021.html