How to sum on multiple criteria using multiple data frames

Question

I have this df where I pull the criteria from:

    id  Criteria 1  Criteria 2
    1   3/1/2022    Black
    2   5/2/2022    Black
    3   3/1/2022    Blue

I want to use the criteria to sum amounts from the following df:

    id  Date      Color  Amount
    1   3/1/2022  Black  15
    2   5/2/2022  Red    10
    3   3/1/2022  Blue   25
    4   5/2/2022  Red    10
    5   4/1/2022  Black  15
    6   3/1/2022  Black  45

I want it to look like this:

    id  Criteria 1  Criteria 2  Amount
    1   3/1/2022    Black       60
    2   5/2/2022    Black       0
    3   3/1/2022    Blue        25

Answer 1

Score: 2

I suspect you may need to join > group_by > summarise(sum). Please share the data.

    library(dplyr)

    df1 %>%
      # keep every df1 row and attach the matching df2 rows
      left_join(df2, by = c('Criteria1' = 'Date', 'Criteria2' = 'Color')) %>%
      group_by(Criteria1, Criteria2) %>%
      # unmatched rows carry NA, so na.rm = TRUE makes their sum 0
      summarise(Amount = sum(Amount, na.rm = TRUE))

    # A tibble: 3 × 3
    # Groups:   Criteria1 [2]
      Criteria1 Criteria2 Amount
      <chr>     <chr>      <int>
    1 3/1/2022  Black         60
    2 3/1/2022  Blue          25
    3 5/2/2022  Black          0
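
The output above drops `id` and comes back sorted by the grouping columns rather than in the row order of `df1`. A minimal sketch of one way to keep both, assuming the `df1`/`df2` layout from the question with the space-free column names `Criteria1` and `Criteria2` used in this answer; `result` is just an illustrative name:

```r
library(dplyr)

df1 <- data.frame(
  id = 1:3,
  Criteria1 = c("3/1/2022", "5/2/2022", "3/1/2022"),
  Criteria2 = c("Black", "Black", "Blue")
)
df2 <- data.frame(
  id = 1:6,
  Date = c("3/1/2022", "5/2/2022", "3/1/2022", "5/2/2022", "4/1/2022", "3/1/2022"),
  Color = c("Black", "Red", "Blue", "Red", "Black", "Black"),
  Amount = c(15L, 10L, 25L, 10L, 15L, 45L)
)

result <- df1 %>%
  # drop df2's id so it does not clash with df1's id after the join
  left_join(select(df2, -id), by = c("Criteria1" = "Date", "Criteria2" = "Color")) %>%
  # grouping by id first keeps one row per df1 row, sorted by id
  group_by(id, Criteria1, Criteria2) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(Amount = sum(Amount, na.rm = TRUE), .groups = "drop")

print(result)
```

Grouping on `id` works here only because each `id` in `df1` identifies exactly one criteria pair; if that did not hold, the grouping columns would need rethinking.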

Answer 2

Score: 1

The other answers are excellent, but here is an alternative that combines base R and dplyr, in case it is easier for some folks to follow:

    library(dplyr)

    xx <- df2 %>%                       # summarize the data with `Amount`
      group_by(Date, Color) %>%
      summarize(Amount = sum(Amount))
    xy <- merge(df1, xx,                # merge with df1
                by.x = c("Criteria1", "Criteria2"),
                by.y = c("Date", "Color"),
                all.x = TRUE)
    xy[is.na(xy)] <- 0                  # replace NAs with 0
    xy <- xy[order(xy$id), c(3,1:2,4)]  # sort and reorder (may be superfluous)
    #   id Criteria1 Criteria2 Amount
    # 1  1  3/1/2022     Black     60
    # 3  2  5/2/2022     Black      0
    # 2  3  3/1/2022      Blue     25

Data

    df1 <- read.table(text = "id Criteria1 Criteria2
    1 3/1/2022 Black
    2 5/2/2022 Black
    3 3/1/2022 Blue", header = TRUE)
    df2 <- read.table(text = "id Date Color Amount
    1 3/1/2022 Black 15
    2 5/2/2022 Red 10
    3 3/1/2022 Blue 25
    4 5/2/2022 Red 10
    5 4/1/2022 Black 15
    6 3/1/2022 Black 45", header = TRUE)
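
One thing to watch in this answer: `xy[is.na(xy)] <- 0` replaces every `NA` in the merged frame, including any that might occur in columns other than `Amount`. A hedged sketch that limits the replacement to `Amount`, using base R only. Here `aggregate` stands in for the `group_by`/`summarize` step (that substitution is mine, not the answer's), and `totals` is an illustrative name:

```r
df1 <- read.table(text = "id Criteria1 Criteria2
1 3/1/2022 Black
2 5/2/2022 Black
3 3/1/2022 Blue", header = TRUE)
df2 <- read.table(text = "id Date Color Amount
1 3/1/2022 Black 15
2 5/2/2022 Red 10
3 3/1/2022 Blue 25
4 5/2/2022 Red 10
5 4/1/2022 Black 15
6 3/1/2022 Black 45", header = TRUE)

# total Amount per (Date, Color) pair
totals <- aggregate(Amount ~ Date + Color, data = df2, FUN = sum)

# left join: keep every row of df1, matched or not
xy <- merge(df1, totals,
            by.x = c("Criteria1", "Criteria2"),
            by.y = c("Date", "Color"),
            all.x = TRUE)

# replace NA only in the Amount column, leaving other columns untouched
xy$Amount[is.na(xy$Amount)] <- 0

# restore df1's row order and select columns by name instead of position
xy <- xy[order(xy$id), c("id", "Criteria1", "Criteria2", "Amount")]
print(xy)
```

Selecting columns by name rather than `c(3,1:2,4)` also avoids surprises if the merged column order ever changes.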

Answer 3

Score: 1

Here is a base R solution with merge and aggregate.

    # left join on both criteria, dropping the id columns first
    merge(df1[-1], df2[-1],
          by.x = c("Criteria1", "Criteria2"),
          by.y = c("Date", "Color"),
          all.x = TRUE) |>
      # na.pass keeps unmatched rows; na.rm = TRUE is passed on to sum()
      aggregate(Amount ~ Criteria1 + Criteria2, data = _, FUN = sum, na.action = na.pass, na.rm = TRUE)
    #>   Criteria1 Criteria2 Amount
    #> 1  3/1/2022     Black     60
    #> 2  5/2/2022     Black      0
    #> 3  3/1/2022      Blue     25

Created on 2023-02-23 with reprex v2.0.2

Data

    df1 <- "id Criteria1 Criteria2
    1 3/1/2022 Black
    2 5/2/2022 Black
    3 3/1/2022 Blue"
    df1 <- read.table(text = df1, header = TRUE)
    df2 <- "id Date Color Amount
    1 3/1/2022 Black 15
    2 5/2/2022 Red 10
    3 3/1/2022 Blue 25
    4 5/2/2022 Red 10
    5 4/1/2022 Black 15
    6 3/1/2022 Black 45"
    df2 <- read.table(text = df2, header = TRUE)
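
The `na.action = na.pass` argument matters here: the formula method of `aggregate` defaults to `na.omit`, which discards the unmatched row before `sum` ever sees it. A small sketch of the difference, using the same data and an intermediate variable instead of the pipe (`merged`, `dropped`, and `kept` are illustrative names):

```r
df1 <- data.frame(
  id = 1:3,
  Criteria1 = c("3/1/2022", "5/2/2022", "3/1/2022"),
  Criteria2 = c("Black", "Black", "Blue")
)
df2 <- data.frame(
  id = 1:6,
  Date = c("3/1/2022", "5/2/2022", "3/1/2022", "5/2/2022", "4/1/2022", "3/1/2022"),
  Color = c("Black", "Red", "Blue", "Red", "Black", "Black"),
  Amount = c(15L, 10L, 25L, 10L, 15L, 45L)
)

# left join on both criteria; the unmatched df1 row gets Amount = NA
merged <- merge(df1[-1], df2[-1],
                by.x = c("Criteria1", "Criteria2"),
                by.y = c("Date", "Color"),
                all.x = TRUE)

# default na.action (na.omit): the NA row is removed, so its group vanishes
dropped <- aggregate(Amount ~ Criteria1 + Criteria2, data = merged, FUN = sum)

# na.pass keeps the NA row; na.rm = TRUE is forwarded to sum()
kept <- aggregate(Amount ~ Criteria1 + Criteria2, data = merged,
                  FUN = sum, na.action = na.pass, na.rm = TRUE)

print(nrow(dropped))
print(kept)
```

With the default, the `5/2/2022` / `Black` combination is missing from the result entirely rather than showing a sum of 0.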


Answer 4

Score: 1

Using data.table

    library(data.table)

    setDT(df2)  # df2 must also be a data.table for the join inside [ ] to work
    # for each row of df1 (.SD), look up the matching rows of df2 and sum Amount
    setDT(df1)[, Amount := df2[.SD, sum(Amount),
                               on = .(Date = Criteria1, Color = Criteria2), by = .EACHI]$V1]

-output

    > df1[is.na(Amount), Amount := 0]
    > df1
       id Criteria1 Criteria2 Amount
    1:  1  3/1/2022     Black     60
    2:  2  5/2/2022     Black      0
    3:  3  3/1/2022      Blue     25
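
A possible variation on the same join, sketched under the assumption that both tables are already data.tables: passing `na.rm = TRUE` to `sum` means unmatched rows come out as 0 directly, so the separate `is.na` replacement is not needed. The name `res` is illustrative:

```r
library(data.table)

df1 <- data.table(
  id = 1:3,
  Criteria1 = c("3/1/2022", "5/2/2022", "3/1/2022"),
  Criteria2 = c("Black", "Black", "Blue")
)
df2 <- data.table(
  id = 1:6,
  Date = c("3/1/2022", "5/2/2022", "3/1/2022", "5/2/2022", "4/1/2022", "3/1/2022"),
  Color = c("Black", "Red", "Blue", "Red", "Black", "Black"),
  Amount = c(15L, 10L, 25L, 10L, 15L, 45L)
)

# by = .EACHI evaluates j once per row of df1, in df1's row order;
# an unmatched row sees Amount = NA, which na.rm = TRUE turns into a sum of 0
res <- df2[df1, .(Amount = sum(Amount, na.rm = TRUE)),
           on = .(Date = Criteria1, Color = Criteria2), by = .EACHI]

# attach the totals to df1 by reference; res has one row per df1 row
df1[, Amount := res$Amount]
print(df1)
```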

Answer 5

Score: 0

Similar assumption as GuedesBF, but:

  • using id as a join var;
  • joining individually on each of Criteria #; then
  • summing all possible Amounts.

I don't get the right results, but there's no combination I found that produces your expected output ... so perhaps:

    library(dplyr)

    df1 %>%
      left_join(df2, by = c("id", "Criteria 1" = "Date")) %>%
      left_join(df2, by = c("id", "Criteria 2" = "Color")) %>%
      group_by(id, `Criteria 1`, `Criteria 2`) %>%
      summarize(Amount = sum(c(Amount.x, Amount.y), na.rm = TRUE)) %>%
      ungroup()
    # # A tibble: 3 × 4
    #      id `Criteria 1` `Criteria 2` Amount
    #   <int> <chr>        <chr>         <int>
    # 1     1 3/1/2022     Black            30
    # 2     2 5/2/2022     Black            10
    # 3     3 3/1/2022     Blue             50

If you need to join on both simultaneously, then this only differs from GuedesBF's answer by the inclusion of id:

    df1 %>%
      left_join(df2, by = c("id", "Criteria 1" = "Date", "Criteria 2" = "Color")) %>%
      group_by(id, `Criteria 1`, `Criteria 2`) %>%
      summarize(Amount = sum(Amount, na.rm = TRUE)) %>%
      ungroup()
    # # A tibble: 3 × 4
    #      id `Criteria 1` `Criteria 2` Amount
    #   <int> <chr>        <chr>         <int>
    # 1     1 3/1/2022     Black            15
    # 2     2 5/2/2022     Black             0
    # 3     3 3/1/2022     Blue             25

Data

    df1 <- structure(list(id = 1:3,
                          "Criteria 1" = c("3/1/2022", "5/2/2022", "3/1/2022"),
                          "Criteria 2" = c("Black", "Black", "Blue")),
                     class = "data.frame", row.names = c(NA, -3L))
    df2 <- structure(list(id = 1:6,
                          Date = c("3/1/2022", "5/2/2022", "3/1/2022", "5/2/2022", "4/1/2022", "3/1/2022"),
                          Color = c("Black", "Red", "Blue", "Red", "Black", "Black"),
                          Amount = c(15L, 10L, 25L, 10L, 15L, 45L)),
                     class = "data.frame", row.names = c(NA, -6L))
    df3 <- structure(list(id = 1:3,
                          "Criteria 1" = c("3/1/2022", "5/2/2022", "3/1/2022"),
                          "Criteria 2" = c("Black", "Black", "Blue"),
                          Amount = c(60L, 0L, 25L)),
                     class = "data.frame", row.names = c(NA, -3L))
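
Since `df3` holds the expected output, a candidate approach can be checked against it directly. The sketch below is not from this answer: it treats `id` as a plain row label rather than a join key and sums, for each row of `df1`, the `df2` amounts where both `Date` and `Color` match (much like a spreadsheet SUMIFS). The helper `sum_if` and the name `out` are hypothetical:

```r
df1 <- data.frame(id = 1:3,
                  "Criteria 1" = c("3/1/2022", "5/2/2022", "3/1/2022"),
                  "Criteria 2" = c("Black", "Black", "Blue"),
                  check.names = FALSE)
df2 <- data.frame(id = 1:6,
                  Date = c("3/1/2022", "5/2/2022", "3/1/2022", "5/2/2022", "4/1/2022", "3/1/2022"),
                  Color = c("Black", "Red", "Blue", "Red", "Black", "Black"),
                  Amount = c(15L, 10L, 25L, 10L, 15L, 45L))
df3 <- data.frame(id = 1:3,
                  "Criteria 1" = c("3/1/2022", "5/2/2022", "3/1/2022"),
                  "Criteria 2" = c("Black", "Black", "Blue"),
                  Amount = c(60L, 0L, 25L),
                  check.names = FALSE)

# hypothetical helper: total of df2$Amount where both criteria match;
# sum() of an empty selection is 0, so unmatched criteria need no NA handling
sum_if <- function(date, color) {
  sum(df2$Amount[df2$Date == date & df2$Color == color])
}

out <- df1
out$Amount <- mapply(sum_if, df1[["Criteria 1"]], df1[["Criteria 2"]],
                     USE.NAMES = FALSE)

# compare the candidate result with the expected output
print(isTRUE(all.equal(out, df3)))
```

This suggests the expected output comes from matching on the two criteria alone, with `id` playing no part in the match.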

Answer 6

Score: 0

We can use {powerjoin}:

    library(powerjoin)

    power_left_join(
      df1,
      # pre-aggregate df2 by the join keys before joining
      df2 |> summarize_by_keys(Amount = sum(Amount)),
      by = c("Criteria 1" = "Date", "Criteria 2" = "Color"),
      fill = 0  # unmatched rows get 0 instead of NA
    )
    #>   id Criteria 1 Criteria 2 Amount
    #> 1  1   3/1/2022      Black     60
    #> 2  2   5/2/2022      Black      0
    #> 3  3   3/1/2022       Blue     25

Created on 2023-03-17 with reprex v2.0.2

huangapple
  • Posted on 2023-02-24 02:17:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75548795.html