How to sum on multiple criteria using multiple data frames

Question

I have this df where I pull the criteria from:

id Criteria 1 Criteria 2
1 3/1/2022 Black
2 5/2/2022 Black
3 3/1/2022 Blue

I want to use the criteria to sum amounts from the following df:

id Date Color Amount
1 3/1/2022 Black 15
2 5/2/2022 Red 10
3 3/1/2022 Blue 25
4 5/2/2022 Red 10
5 4/1/2022 Black 15
6 3/1/2022 Black 45

I want it to look like this:

id Criteria 1 Criteria 2 Amount
1 3/1/2022 Black 60
2 5/2/2022 Black 0
3 3/1/2022 Blue 25

Answer 1

Score: 2

I suspect you need a join followed by group_by and summarise(sum). Please share the data.

library(dplyr)

df1 %>%
    left_join(df2, by = c('Criteria1' = 'Date', 'Criteria2' = 'Color')) %>%
    group_by(Criteria1, Criteria2) %>%
    summarise(Amount = sum(Amount, na.rm = TRUE))

# A tibble: 3 × 3
# Groups:   Criteria1 [2]
  Criteria1 Criteria2 Amount
  <chr>     <chr>      <int>
1 3/1/2022  Black         60
2 3/1/2022  Blue          25
3 5/2/2022  Black          0
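
The 0 in the last row comes from how `sum()` treats missing values: the left join leaves `Amount` as `NA` for a criteria pair with no match in df2, and `sum(..., na.rm = TRUE)` over nothing but `NA` returns 0. A minimal base R sketch of that behaviour (the vector names `unmatched` and `matched` are introduced here for illustration):

```r
# Amount values for a criteria pair with no match after a left join:
# the join fills the missing side with NA.
unmatched <- c(NA_integer_)

sum(unmatched)                 # NA: the missing value propagates
sum(unmatched, na.rm = TRUE)   # 0: nothing is left to add, and the empty sum is 0

# A matched group is unaffected by na.rm.
matched <- c(15L, 45L)
sum(matched, na.rm = TRUE)     # 60
```

Without `na.rm = TRUE`, the unmatched row would show `NA` instead of 0.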

Answer 2

Score: 1

The other answers are excellent, but here is an alternative that combines base R and dplyr, in case it is easier to follow for some folks:

library(dplyr)

xx <- df2 %>%  # summarize the data with `Amount`
  group_by(Date, Color) %>%
  summarize(Amount = sum(Amount))

xy <- merge(df1, xx, # merge with df1
      by.x = c("Criteria1", "Criteria2"),
      by.y = c("Date", "Color"),
      all.x = TRUE)

xy[is.na(xy)] <- 0 # replace NAs with 0
xy <- xy[order(xy$id), c(3,1:2,4)] # sort and reorder (may be superfluous)

#   id Criteria1 Criteria2 Amount
# 1  1  3/1/2022     Black     60
# 3  2  5/2/2022     Black      0
# 2  3  3/1/2022      Blue     25

Data

df1 <- read.table(text = "id Criteria1 Criteria2
1 3/1/2022 Black
2 5/2/2022 Black
3 3/1/2022 Blue", header = TRUE)

df2 <- read.table(text = "id Date Color Amount
1 3/1/2022 Black 15
2 5/2/2022 Red 10
3 3/1/2022 Blue 25
4 5/2/2022 Red 10
5 4/1/2022 Black 15
6 3/1/2022 Black 45", header = TRUE)

Answer 3

Score: 1

Here is a base R solution with merge and aggregate.

merge(df1[-1], df2[-1],
      by.x = c("Criteria1", "Criteria2"),
      by.y = c("Date", "Color"),
      all.x = TRUE) |>
  aggregate(Amount ~ Criteria1 + Criteria2, data = _, FUN = sum, na.action = na.pass, na.rm = TRUE)
#>   Criteria1 Criteria2 Amount
#> 1  3/1/2022     Black     60
#> 2  5/2/2022     Black      0
#> 3  3/1/2022      Blue     25

Data

df1 <- "id  Criteria1   Criteria2
1   3/1/2022    Black
2   5/2/2022    Black
3   3/1/2022    Blue"
df1 <- read.table(text = df1, header = TRUE)
df2 <- "id  Date    Color   Amount
1   3/1/2022    Black   15
2   5/2/2022    Red     10
3   3/1/2022    Blue    25
4   5/2/2022    Red     10
5   4/1/2022    Black   15
6   3/1/2022    Black   45"
df2 <- read.table(text = df2, header = TRUE)

Created on 2023-02-23 with reprex v2.0.2
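
The native pipe with the `_` placeholder used in this answer needs R 4.2 or later. On older versions, the same merge-then-aggregate idea can be written with an intermediate variable. A sketch, where `merged` and `res` are names introduced here for illustration:

```r
df1 <- read.table(text = "id Criteria1 Criteria2
1 3/1/2022 Black
2 5/2/2022 Black
3 3/1/2022 Blue", header = TRUE)

df2 <- read.table(text = "id Date Color Amount
1 3/1/2022 Black 15
2 5/2/2022 Red 10
3 3/1/2022 Blue 25
4 5/2/2022 Red 10
5 4/1/2022 Black 15
6 3/1/2022 Black 45", header = TRUE)

# Left join on the two criteria; the id columns are dropped first
# because they are unrelated row identifiers.
merged <- merge(df1[-1], df2[-1],
                by.x = c("Criteria1", "Criteria2"),
                by.y = c("Date", "Color"),
                all.x = TRUE)

# na.action = na.pass keeps the unmatched row (Amount is NA there);
# na.rm = TRUE is passed on to sum(), which turns that NA into 0.
res <- aggregate(Amount ~ Criteria1 + Criteria2, data = merged,
                 FUN = sum, na.action = na.pass, na.rm = TRUE)
res
```

Leaving out `na.action = na.pass` would make `aggregate()` drop the unmatched row before `sum()` ever sees it, so the 5/2/2022 Black row would be missing from the result.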

Answer 4

Score: 1

Using data.table:

library(data.table)
setDT(df2)  # df2 must be a data.table as well for the join below to work
setDT(df1)[, Amount := df2[.SD, sum(Amount),
  on = .(Date = Criteria1, Color = Criteria2), by = .EACHI]$V1]
df1[is.na(Amount), Amount := 0]  # criteria with no match yield NA; replace with 0

Output:

> df1
   id Criteria1 Criteria2 Amount
1:  1  3/1/2022     Black     60
2:  2  5/2/2022     Black      0
3:  3  3/1/2022      Blue     25

Answer 5

Score: 0

This makes a similar assumption to GuedesBF's answer, but:

  • using id as a join variable;
  • joining individually on each of the Criteria columns; then
  • summing all possible Amounts.

This does not give the expected results, and I found no combination that produces your expected output, so perhaps:

library(dplyr)
df1 %>%
  left_join(df2, by = c("id", "Criteria 1" = "Date")) %>%
  left_join(df2, by = c("id", "Criteria 2" = "Color")) %>%
  group_by(id, `Criteria 1`, `Criteria 2`) %>%
  summarize(Amount = sum(c(Amount.x, Amount.y), na.rm = TRUE)) %>%
  ungroup()
# # A tibble: 3 × 4
#      id `Criteria 1` `Criteria 2` Amount
#   <int> <chr>        <chr>         <int>
# 1     1 3/1/2022     Black            30
# 2     2 5/2/2022     Black            10
# 3     3 3/1/2022     Blue             50

If you need to join on both simultaneously, then this only differs from GuedesBF's answer by the inclusion of id:

df1 %>%
  left_join(df2, by = c("id", "Criteria 1" = "Date", "Criteria 2" = "Color")) %>%
  group_by(id, `Criteria 1`, `Criteria 2`) %>%
  summarize(Amount = sum(Amount, na.rm = TRUE)) %>%
  ungroup()
# # A tibble: 3 × 4
#      id `Criteria 1` `Criteria 2` Amount
#   <int> <chr>        <chr>         <int>
# 1     1 3/1/2022     Black            15
# 2     2 5/2/2022     Black             0
# 3     3 3/1/2022     Blue             25

Data

df1 <- structure(list(id = 1:3, "Criteria 1" = c("3/1/2022", "5/2/2022", "3/1/2022"), "Criteria 2" = c("Black", "Black", "Blue")), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(id = 1:6, Date = c("3/1/2022", "5/2/2022", "3/1/2022", "5/2/2022", "4/1/2022", "3/1/2022"), Color = c("Black", "Red", "Blue", "Red", "Black", "Black"), Amount = c(15L, 10L, 25L, 10L, 15L, 45L)), class = "data.frame", row.names = c(NA, -6L))
df3 <- structure(list(id = 1:3, "Criteria 1" = c("3/1/2022", "5/2/2022", "3/1/2022"), "Criteria 2" = c("Black", "Black", "Blue"), Amount = c(60L, 0L, 25L)), class = "data.frame", row.names = c(NA, -3L))
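
The second result in this answer (15 instead of 60 for the first row) follows from adding `id` to the join keys: `id` is only a row identifier within each data frame, so requiring it to match discards rows of df2 that satisfy both criteria but carry a different `id`. A base R sketch of the difference, using `merge()` in place of `left_join()` and the syntactic column names `Criteria1` and `Criteria2` for brevity; both are choices made here for illustration, not part of the answer:

```r
df1 <- read.table(text = "id Criteria1 Criteria2
1 3/1/2022 Black
2 5/2/2022 Black
3 3/1/2022 Blue", header = TRUE)

df2 <- read.table(text = "id Date Color Amount
1 3/1/2022 Black 15
2 5/2/2022 Red 10
3 3/1/2022 Blue 25
4 5/2/2022 Red 10
5 4/1/2022 Black 15
6 3/1/2022 Black 45", header = TRUE)

# Keys include id: only the df2 row whose id equals the df1 id can match.
with_id <- merge(df1, df2,
                 by.x = c("id", "Criteria1", "Criteria2"),
                 by.y = c("id", "Date", "Color"),
                 all.x = TRUE)
sum(with_id$Amount[with_id$id == 1], na.rm = TRUE)        # 15

# Keys exclude id: every df2 row with the same date and color matches.
without_id <- merge(df1, df2[-1],
                    by.x = c("Criteria1", "Criteria2"),
                    by.y = c("Date", "Color"),
                    all.x = TRUE)
sum(without_id$Amount[without_id$id == 1], na.rm = TRUE)  # 60
```

Dropping `id` from df2 before the join, as in the second call, is what lets the row with `id` 6 (3/1/2022, Black, 45) contribute to the total.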

Answer 6

Score: 0

We can use {powerjoin}:

library(powerjoin)
power_left_join(
  df1,
  df2 |> summarize_by_keys(Amount = sum(Amount)),
  by = c("Criteria 1" = "Date", "Criteria 2" = "Color"),
  fill = 0
)
#>   id Criteria 1 Criteria 2 Amount
#> 1  1   3/1/2022      Black     60
#> 2  2   5/2/2022      Black      0
#> 3  3   3/1/2022       Blue     25

Created on 2023-03-17 with reprex v2.0.2
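
As a dependency-free cross-check of the answers above, the same conditional sum can be written row by row in base R, similar in spirit to a spreadsheet SUMIFS. A minimal sketch, assuming the syntactic column names `Criteria1` and `Criteria2` used in the earlier answers:

```r
df1 <- read.table(text = "id Criteria1 Criteria2
1 3/1/2022 Black
2 5/2/2022 Black
3 3/1/2022 Blue", header = TRUE)

df2 <- read.table(text = "id Date Color Amount
1 3/1/2022 Black 15
2 5/2/2022 Red 10
3 3/1/2022 Blue 25
4 5/2/2022 Red 10
5 4/1/2022 Black 15
6 3/1/2022 Black 45", header = TRUE)

# For each row of df1, add up the df2 amounts whose date and color
# both match that row's criteria. An empty match sums to 0.
df1$Amount <- mapply(
  function(d, col) sum(df2$Amount[df2$Date == d & df2$Color == col]),
  df1$Criteria1, df1$Criteria2,
  USE.NAMES = FALSE
)
df1
```

This keeps the row order and `id` of df1 and needs no NA handling, but it scans df2 once per row of df1, so the join-based answers are likely the better fit for large data.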

huangapple
  • Posted on 2023-02-24 02:17:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75548795.html