2023-02-24 02:17:29

How to sum on multiple criteria using multiple data frames

Question

I have this df where I pull the criteria from:

id	Criteria 1	Criteria 2
1	3/1/2022	Black
2	5/2/2022	Black
3	3/1/2022	Blue

I want to use the criteria to sum amounts from the following df:

id	Date	Color	Amount
1	3/1/2022	Black	15
2	5/2/2022	Red	10
3	3/1/2022	Blue	25
4	5/2/2022	Red	10
5	4/1/2022	Black	15
6	3/1/2022	Black	45

I want it to look like this:

id	Criteria 1	Criteria 2	Amount
1	3/1/2022	Black	60
2	5/2/2022	Black	0
3	3/1/2022	Blue	25
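
One way to read the request is as a row-by-row conditional sum: for each row of the criteria table, add up the amounts whose date and color both match. A minimal base R sketch under that reading follows. The column names `Criteria1` and `Criteria2` (without spaces) are an assumption, chosen so the names are syntactic in R.

```r
# Data as given in the question (column names without spaces are an assumption).
df1 <- data.frame(
  id = 1:3,
  Criteria1 = c("3/1/2022", "5/2/2022", "3/1/2022"),
  Criteria2 = c("Black", "Black", "Blue")
)
df2 <- data.frame(
  id = 1:6,
  Date = c("3/1/2022", "5/2/2022", "3/1/2022", "5/2/2022", "4/1/2022", "3/1/2022"),
  Color = c("Black", "Red", "Blue", "Red", "Black", "Black"),
  Amount = c(15L, 10L, 25L, 10L, 15L, 45L)
)

# For each criteria row, sum the amounts whose Date and Color both match.
df1$Amount <- mapply(
  function(d, col) sum(df2$Amount[df2$Date == d & df2$Color == col]),
  df1$Criteria1, df1$Criteria2,
  USE.NAMES = FALSE
)
df1
```

Because `sum()` of an empty selection is 0, the criteria row with no match comes out as 0 without any extra handling.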

Answer 1

Score: 2

I suspect you may need to join > group_by > summarise(sum). Please share the data.

library(dplyr)
df1 %>%
    left_join(df2, by = c('Criteria1' = 'Date', 'Criteria2' = 'Color')) %>%
    group_by(Criteria1, Criteria2) %>%
    summarise(Amount = sum(Amount, na.rm = TRUE))

# A tibble: 3 × 3
# Groups:   Criteria1 [2]
  Criteria1 Criteria2 Amount
  <chr>     <chr>      <int>
1 3/1/2022  Black         60
2 3/1/2022  Blue          25
3 5/2/2022  Black          0
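
The `na.rm = TRUE` in the `summarise()` call is what turns the unmatched criteria row into 0 rather than `NA`. A small sketch of that step, using base R `merge()` with `all.x = TRUE` as a stand-in for `left_join()`; the one-row data frames here are illustrative and not taken from the question.

```r
# Illustrative one-row inputs: the criteria row has no matching Color in df2.
df1 <- data.frame(Criteria1 = "5/2/2022", Criteria2 = "Black")
df2 <- data.frame(Date = "5/2/2022", Color = "Red", Amount = 10L)

# Left join: the unmatched criteria row is kept, with NA in Amount.
joined <- merge(df1, df2,
                by.x = c("Criteria1", "Criteria2"),
                by.y = c("Date", "Color"),
                all.x = TRUE)

plain_sum <- sum(joined$Amount)                # NA, because the only value is NA
safe_sum  <- sum(joined$Amount, na.rm = TRUE)  # 0, because the NA is dropped first
```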

Answer 2

Score: 1

The other answers are excellent, but here is an alternative that combines base R and dplyr, in case it is easier for some folks to follow:

library(dplyr)
xx <- df2 %>%  # summarize the data with `Amount`
  group_by(Date, Color) %>%
  summarize(Amount = sum(Amount))
xy <- merge(df1, xx, # merge with df1
      by.x = c("Criteria1", "Criteria2"), 
      by.y = c("Date", "Color"), 
      all.x = TRUE)
xy[is.na(xy)] <- 0 # replace NAs with 0
xy <- xy[order(xy$id), c(3,1:2,4)] # sort and reorder (may be superfluous)
#   id Criteria1 Criteria2 Amount
# 1  1  3/1/2022     Black     60
# 3  2  5/2/2022     Black      0
# 2  3  3/1/2022      Blue     25

Data

df1 <- read.table(text = "id	Criteria1	Criteria2
1	3/1/2022	Black
2	5/2/2022	Black
3	3/1/2022	Blue", header = TRUE)
df2 <- read.table(text = "id	Date	Color	Amount
1	3/1/2022	Black	15
2	5/2/2022	Red	10
3	3/1/2022	Blue	25
4	5/2/2022	Red	10
5	4/1/2022	Black	15
6	3/1/2022	Black	45", header = TRUE)

Answer 3

Score: 1

Here is a base R solution with merge and aggregate.

# The `_` placeholder used with the native pipe below needs R >= 4.2.
merge(df1[-1], df2[-1], 
      by.x = c("Criteria1", "Criteria2"),
      by.y = c("Date", "Color"),
      all.x = TRUE) |>
  aggregate(Amount ~ Criteria1 + Criteria2, data = _, FUN = sum, na.action = na.pass, na.rm = TRUE)
#>   Criteria1 Criteria2 Amount
#> 1  3/1/2022     Black     60
#> 2  5/2/2022     Black      0
#> 3  3/1/2022      Blue     25

Created on 2023-02-23 with reprex v2.0.2
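
The `na.action = na.pass` argument matters in the `aggregate()` call: the formula method defaults to `na.omit`, which would drop the unmatched row before summing, so that group would vanish instead of showing 0. A short sketch of the difference, assuming a hand-built `merged` data frame that mimics the output of the `merge()` step:

```r
# Hand-built stand-in for the merge() result; the last row had no match in df2.
merged <- data.frame(
  Criteria1 = c("3/1/2022", "3/1/2022", "3/1/2022", "5/2/2022"),
  Criteria2 = c("Black", "Black", "Blue", "Black"),
  Amount    = c(15L, 45L, 25L, NA)
)

# Default na.action = na.omit: the NA row is removed, so its group disappears.
dropped <- aggregate(Amount ~ Criteria1 + Criteria2, data = merged, FUN = sum)

# na.pass keeps the NA row; na.rm = TRUE is forwarded to sum(), giving 0.
kept <- aggregate(Amount ~ Criteria1 + Criteria2, data = merged, FUN = sum,
                  na.action = na.pass, na.rm = TRUE)

nrow(dropped)
nrow(kept)
```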

Data

df1 <- "id  Criteria1   Criteria2
1   3/1/2022    Black
2   5/2/2022    Black
3   3/1/2022    Blue"
df1 <- read.table(text = df1, header = TRUE)
df2 <- "id  Date    Color   Amount
1   3/1/2022    Black   15
2   5/2/2022    Red     10
3   3/1/2022    Blue    25
4   5/2/2022    Red     10
5   4/1/2022    Black   15
6   3/1/2022    Black   45"
df2 <- read.table(text = df2, header = TRUE)

Answer 4

Score: 1

Using data.table

library(data.table)
setDT(df2)  # df2 must also be a data.table for the join syntax below
setDT(df1)[, Amount := df2[.SD, sum(Amount), 
  on = .(Date = Criteria1, Color = Criteria2), by = .EACHI]$V1]

-output

> df1[is.na(Amount), Amount  := 0]
> df1
   id Criteria1 Criteria2 Amount
1:  1  3/1/2022     Black     60
2:  2  5/2/2022     Black      0
3:  3  3/1/2022      Blue     25

Answer 5

Score: 0

Similar assumption as GuedesBF, but:

using id as a join var;
joining individually on each of Criteria #; then
summing all possible Amounts.

I don't get the right results, but there's no combination I found that produces your expected output ... so perhaps:

library(dplyr)
df1 %>%
  left_join(df2, by = c("id", "Criteria 1" = "Date")) %>%
  left_join(df2, by = c("id", "Criteria 2" = "Color")) %>%
  group_by(id, `Criteria 1`, `Criteria 2`) %>%
  summarize(Amount = sum(c(Amount.x, Amount.y), na.rm = TRUE)) %>%
  ungroup()
# # A tibble: 3 × 4
#      id `Criteria 1` `Criteria 2` Amount
#   <int> <chr>        <chr>         <int>
# 1     1 3/1/2022     Black            30
# 2     2 5/2/2022     Black            10
# 3     3 3/1/2022     Blue             50

If you need to join on both simultaneously, then this only differs from GuedesBF's answer by the inclusion of id:

df1 %>%
  left_join(df2, by = c("id", "Criteria 1" = "Date", "Criteria 2" = "Color")) %>%
  group_by(id, `Criteria 1`, `Criteria 2`) %>%
  summarize(Amount = sum(Amount, na.rm = TRUE)) %>%
  ungroup()
# # A tibble: 3 × 4
#      id `Criteria 1` `Criteria 2` Amount
#   <int> <chr>        <chr>         <int>
# 1     1 3/1/2022     Black            15
# 2     2 5/2/2022     Black             0
# 3     3 3/1/2022     Blue             25
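
Including `id` in the join keys is what yields 15 rather than the expected 60 for the first row: the `id` columns in the two tables are independent row numbers, so each criteria row can match at most the single `df2` row that happens to share its number. A base R sketch of the contrast, using `merge()` in place of `left_join()` and assuming column names without spaces:

```r
df1 <- data.frame(
  id = 1:3,
  Criteria1 = c("3/1/2022", "5/2/2022", "3/1/2022"),
  Criteria2 = c("Black", "Black", "Blue")
)
df2 <- data.frame(
  id = 1:6,
  Date = c("3/1/2022", "5/2/2022", "3/1/2022", "5/2/2022", "4/1/2022", "3/1/2022"),
  Color = c("Black", "Red", "Blue", "Red", "Black", "Black"),
  Amount = c(15L, 10L, 25L, 10L, 15L, 45L)
)

# Join on date and color only: criteria row 1 matches df2 rows 1 and 6.
keys_only <- merge(df1, df2,
                   by.x = c("Criteria1", "Criteria2"),
                   by.y = c("Date", "Color"),
                   all.x = TRUE)

# Join on id as well: criteria row 1 can only match df2 row 1.
with_id <- merge(df1, df2,
                 by.x = c("id", "Criteria1", "Criteria2"),
                 by.y = c("id", "Date", "Color"),
                 all.x = TRUE)

total_keys_only <- sum(keys_only$Amount[keys_only$id.x == 1])  # 15 + 45
total_with_id   <- sum(with_id$Amount[with_id$id == 1])        # 15 only
```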

Data

df1 <- structure(list(id = 1:3, "Criteria 1" = c("3/1/2022", "5/2/2022", "3/1/2022"), "Criteria 2" = c("Black", "Black", "Blue")), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(id = 1:6, Date = c("3/1/2022", "5/2/2022", "3/1/2022", "5/2/2022", "4/1/2022", "3/1/2022"), Color = c("Black", "Red", "Blue", "Red", "Black", "Black"), Amount = c(15L, 10L, 25L, 10L, 15L, 45L)), class = "data.frame", row.names = c(NA, -6L))
df3 <- structure(list(id = 1:3, "Criteria 1" = c("3/1/2022", "5/2/2022", "3/1/2022"), "Criteria 2" = c("Black", "Black", "Blue"), Amount = c(60L, 0L, 25L)), class = "data.frame", row.names = c(NA, -3L))

Answer 6

Score: 0

We can use {powerjoin}:

library(powerjoin)
power_left_join(
  df1,
  df2 |> summarize_by_keys(Amount = sum(Amount)),
  by = c("Criteria 1" = "Date", "Criteria 2" = "Color"),
  fill = 0
)
#>   id Criteria 1 Criteria 2 Amount
#> 1  1   3/1/2022      Black     60
#> 2  2   5/2/2022      Black      0
#> 3  3   3/1/2022       Blue     25

Created on 2023-03-17 with reprex v2.0.2
