February 24, 2023

Count the rows by group and get the proportion of different columns

Question

I am trying to find the ratio of CGMs prescribed at each location to the number of diabetes patients there. My actual data looks like this:

Location	Diabetes present	CGM prescribed
CA	1	1
TX	1	0
TX	1	1
CA	1	0
AZ	1	1
AZ	1	0
AZ	1	1
TX	1	0

Desired output:

location	TotalDiabetes	total CGM	proportion (total cgm/ total diabetes)
CA	2	1	0.5
TX	3	1	0.33
AZ	3	2	0.66

Answer 1

Score: 4

We can sum the numeric columns by 'Location' and then create the proportion column by dividing one total by the other:

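library(dplyr)   # version >= 1.1.0, needed for reframe() and .by
library(stringr)

df1 %>%
  # sum every non-grouping column per Location and name each result
  # "Total_<first word of the original column name>"
  reframe(across(everything(), ~ sum(.x, na.rm = TRUE),
                 .names = "Total_{str_remove(.col, ' .*')}"),
          .by = "Location") %>%
  mutate(proportion = round(Total_CGM / Total_Diabetes, 2))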

-output

  Location Total_Diabetes Total_CGM proportion
1       CA              2         1       0.50
2       TX              3         1       0.33
3       AZ              3         2       0.67

Or with base R

transform(aggregate(.~ Location, df1, sum), 
  proportion = round(`CGM prescribed`/`Diabetes present`, 2), 
    check.names = FALSE)

-output

   Location Diabetes present CGM prescribed proportion
1       AZ                3              2       0.67
2       CA                2              1       0.50
3       TX                3              1       0.33

data

df1 <- structure(list(Location = c("CA", "TX", "TX", "CA", "AZ", "AZ", "AZ", "TX"), `Diabetes present` = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), `CGM prescribed` = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L)),
 class = "data.frame", row.names = c(NA, -8L))
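
The desired output in the question lists 0.66 for AZ, while both snippets above return 0.67, and the base R result comes back sorted alphabetically rather than in the question's CA, TX, AZ order. The following is only a sketch of how those two differences could be handled in base R; the column names `rounded` and `truncated` and the reordering step are illustrative assumptions, not part of the original answer.

```r
# Sample data from the question (column names contain spaces)
df1 <- data.frame(
  Location = c("CA", "TX", "TX", "CA", "AZ", "AZ", "AZ", "TX"),
  `Diabetes present` = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
  `CGM prescribed`   = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L),
  check.names = FALSE
)

# Per-location totals; aggregate() returns the groups in sorted order
totals <- aggregate(df1[-1], by = df1["Location"], FUN = sum)
ratio  <- totals[["CGM prescribed"]] / totals[["Diabetes present"]]

totals$rounded   <- round(ratio, 2)           # round to the nearest hundredth
totals$truncated <- floor(ratio * 100) / 100  # cut off after two decimals

# Assumed extra step: put the rows back in order of first appearance
totals <- totals[match(unique(df1$Location), totals$Location), ]
totals
```

With this data, AZ comes out as 0.67 in `rounded` and 0.66 in `truncated`, so the 0.66 in the question looks like a truncated value rather than a different calculation.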

Answer 2

Score: 2

Here's a solution in data.table.

library(data.table)

# sum by Location, divide column 3 (CGM) by column 2 (Diabetes), then rename
setnames(setDT(df1)[, lapply(.SD, sum), .(Location), .SDcols = -1][, 
            proportion := do.call(`/`, .SD), .(Location), .SDcols = 3:2],
         names(df1)[-1], paste0("Total ", sub(" .*", "", names(df1)[-1])))[]

#    Location Total Diabetes Total CGM proportion
# 1:       CA              2         1  0.5000000
# 2:       TX              3         1  0.3333333
# 3:       AZ              3         2  0.6666667
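
For readers who find the one-liner dense, the same idea can be written out step by step. This is only a sketch under a few assumptions: the output names `TotalDiabetes` and `TotalCGM` are chosen here for illustration, and `as.data.table()` is used instead of `setDT()` so that `df1` is copied rather than converted in place.

```r
library(data.table)

# Sample data from the question (column names contain spaces)
df1 <- data.frame(
  Location = c("CA", "TX", "TX", "CA", "AZ", "AZ", "AZ", "TX"),
  `Diabetes present` = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
  `CGM prescribed`   = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L),
  check.names = FALSE
)

dt <- as.data.table(df1)  # a copy; df1 itself stays a plain data.frame

# Step 1: one row per Location with the two totals
res <- dt[, .(TotalDiabetes = sum(`Diabetes present`),
              TotalCGM      = sum(`CGM prescribed`)),
          by = Location]

# Step 2: add the ratio by reference
res[, proportion := TotalCGM / TotalDiabetes]
res[]
```

Grouping with `by` keeps the locations in order of first appearance (CA, TX, AZ), which matches the output shown above.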

Answer 3

Score: 2

Another dplyr way:

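library(dplyr)

# Note: this answer assumes a data frame named `df` whose columns are
# called Diabetes_present and CGM_prescribed (underscores, no spaces).
df %>%
  # a factor with explicit levels fixes the row order as CA, TX, AZ
  mutate(Location = factor(Location, levels = c("CA", "TX", "AZ"))) %>%
  group_by(Location) %>%
  summarise(TotalDiabetes = sum(Diabetes_present),
            Total_CGM = sum(CGM_prescribed),
            Proportion = Total_CGM/TotalDiabetes)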

  Location TotalDiabetes Total_CGM Proportion
  <fct>            <int>     <int>      <dbl>
1 CA                   2         1      0.5  
2 TX                   3         1      0.333
3 AZ                   3         2      0.667
