Create count table for specific condition and then add column that creates count by group as a whole in R

Question

I have a table like this:

data1 <- data.frame(
  State = c("NJ", "NJ", "PA", "NJ", "TX"),
  Filter = c("Filter", "Filter", "No Filter", "Filter", "Filter"),
  Threshold = c("Exceeds", "Exceeds", NA, "NL", "Exceeds")
)

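I'd like to create a count table that groups by State and Filter, counts the number of times Threshold is "Exceeds", and puts that value in a new column. I'd then like to count the number of times each State and Filter combination occurs and put that in another column. An example of what I'm looking for is below.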

final_data <- data.frame(
  State = c("NJ", "NJ", "PA", "NJ", "TX"),
  Filter = c("Filter", "Filter", "No Filter", "Filter", "Filter"),
  Threshold = c("Exceeds", "Exceeds", NA, "NL", "Exceeds"),
  Count_Exceeds_Threshold = c(2, 2, 0, 0, 1),
  Count_Total = c(3, 3, 1, 3, 1)
)

I've tried figuring this out with group_by() and tally() in dplyr, but I can't get it to work the way I want.

Thank you!

Answer 1

Score: 2

You can use add_count() for both purposes:

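library(dplyr)

data1 %>%
  group_by(State, Filter) %>%
  # wt sums the logical per group: TRUE counts as 1, NA is ignored
  add_count(wt = (Threshold == "Exceeds"), name = "Count_Exceeds_Threshold") %>%
  # without wt, add_count() adds the number of rows in each group
  add_count(name = "Count_Total") %>%
  ungroup()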

# # A tibble: 5 × 5
#   State Filter    Threshold Count_Exceeds_Threshold Count_Total
#   <chr> <chr>     <chr>                       <int>       <int>
# 1 NJ    Filter    Exceeds                         2           3
# 2 NJ    Filter    Exceeds                         2           3
# 3 PA    No Filter NA                              0           1
# 4 NJ    Filter    NL                              2           3
# 5 TX    Filter    Exceeds                         1           1
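
If a collapsed table with one row per State and Filter combination is wanted instead of one row per observation, a minimal sketch could use summarise() in place of add_count(). The name count_table is hypothetical and only used for illustration; the column names are taken from the question.

```r
library(dplyr)

data1 <- data.frame(
  State = c("NJ", "NJ", "PA", "NJ", "TX"),
  Filter = c("Filter", "Filter", "No Filter", "Filter", "Filter"),
  Threshold = c("Exceeds", "Exceeds", NA, "NL", "Exceeds")
)

# count_table is a hypothetical name: one row per State/Filter combination
count_table <- data1 %>%
  group_by(State, Filter) %>%
  summarise(
    # number of rows in the group where Threshold is "Exceeds" (NA skipped)
    Count_Exceeds_Threshold = sum(Threshold == "Exceeds", na.rm = TRUE),
    # number of rows in the group
    Count_Total = n(),
    .groups = "drop"
  )

print(count_table)
```

The add_count() version keeps all five original rows and repeats the group counts on each of them, whereas this sketch returns three rows, one per group.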

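Answer 2

Score: 1

You can use mutate() with .by for inline grouping (.by requires dplyr 1.1.0 or later) and count the number of times Threshold == "Exceeds". n() is used to get the number of rows in each group.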

library(dplyr)
data1 %>%
  mutate(Count_Exceeds_Threshold = sum(Threshold == "Exceeds", na.rm = TRUE),
         Count_Total = n(), .by = c(State, Filter))

#   State    Filter Threshold Count_Exceeds_Threshold Count_Total
# 1    NJ    Filter   Exceeds                       2           3
# 2    NJ    Filter   Exceeds                       2           3
# 3    PA No Filter      <NA>                       0           1
# 4    NJ    Filter        NL                       2           3
# 5    TX    Filter   Exceeds                       1           1
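
The na.rm = TRUE argument matters here because Threshold contains an NA. A small base R sketch of what happens without it, using the Threshold values from the question (the variable names threshold, is_exceeds, total_with_na, total_without_na and pa_group are hypothetical):

```r
# Threshold values from the question, including the missing entry
threshold <- c("Exceeds", "Exceeds", NA, "NL", "Exceeds")

# Comparing NA with a string yields NA, not FALSE
is_exceeds <- threshold == "Exceeds"
print(is_exceeds)

# Without na.rm, a single NA makes the whole sum NA
total_with_na <- sum(is_exceeds)

# With na.rm = TRUE, the NA comparison is dropped before summing
total_without_na <- sum(is_exceeds, na.rm = TRUE)

# A group holding only NA (like PA / No Filter) sums to 0 with na.rm = TRUE
pa_group <- sum(NA_character_ == "Exceeds", na.rm = TRUE)

print(total_with_na)
print(total_without_na)
print(pa_group)
```

This is why the PA row shows 0 rather than NA in the output above.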

Posted by huangapple on July 27, 2023, 19:35:09
Link: https://go.coder-hub.com/76779334.html