How to sum on multiple criteria using multiple data frames
Question
I have this df where I pull the criteria from:
id | Criteria 1 | Criteria 2 |
---|---|---|
1 | 3/1/2022 | Black |
2 | 5/2/2022 | Black |
3 | 3/1/2022 | Blue |
I want to use the criteria to sum amounts from the following df:
id | Date | Color | Amount |
---|---|---|---|
1 | 3/1/2022 | Black | 15 |
2 | 5/2/2022 | Red | 10 |
3 | 3/1/2022 | Blue | 25 |
4 | 5/2/2022 | Red | 10 |
5 | 4/1/2022 | Black | 15 |
6 | 3/1/2022 | Black | 45 |
I want it to look like this:
id | Criteria 1 | Criteria 2 | Amount |
---|---|---|---|
1 | 3/1/2022 | Black | 60 |
2 | 5/2/2022 | Black | 0 |
3 | 3/1/2022 | Blue | 25 |
Answer 1
Score: 2
I suspect you may need to do a join, then `group_by`, then `summarise(sum)`. Please share the data.
```R
library(dplyr)

df1 %>%
  left_join(df2, by = c('Criteria1' = 'Date', 'Criteria2' = 'Color')) %>%
  group_by(Criteria1, Criteria2) %>%
  summarise(Amount = sum(Amount, na.rm = TRUE))
#> # A tibble: 3 × 3
#> # Groups:   Criteria1 [2]
#>   Criteria1 Criteria2 Amount
#>   <chr>     <chr>      <int>
#> 1 3/1/2022  Black         60
#> 2 3/1/2022  Blue          25
#> 3 5/2/2022  Black          0
```
Answer 2
Score: 1
The other answers are excellent, but here is an alternative using a combination of base R and `dplyr`, in case it is easier for some folks to understand:
```R
library(dplyr)

xx <- df2 %>%                          # summarize the data with `Amount`
  group_by(Date, Color) %>%
  summarize(Amount = sum(Amount))

xy <- merge(df1, xx,                   # merge with df1
            by.x = c("Criteria1", "Criteria2"),
            by.y = c("Date", "Color"),
            all.x = TRUE)
xy[is.na(xy)] <- 0                     # replace NAs with 0
xy <- xy[order(xy$id), c(3, 1:2, 4)]   # sort and reorder (may be superfluous)
xy
#   id Criteria1 Criteria2 Amount
# 1  1  3/1/2022     Black     60
# 3  2  5/2/2022     Black      0
# 2  3  3/1/2022      Blue     25
```
Data
```R
df1 <- read.table(text = "id Criteria1 Criteria2
1 3/1/2022 Black
2 5/2/2022 Black
3 3/1/2022 Blue", header = TRUE)

df2 <- read.table(text = "id Date Color Amount
1 3/1/2022 Black 15
2 5/2/2022 Red 10
3 3/1/2022 Blue 25
4 5/2/2022 Red 10
5 4/1/2022 Black 15
6 3/1/2022 Black 45", header = TRUE)
```
Answer 3
Score: 1
Here is a base R solution with `merge` and `aggregate`.
```R
# the `_` placeholder in a native-pipe call needs R >= 4.2.0
merge(df1[-1], df2[-1],
      by.x = c("Criteria1", "Criteria2"),
      by.y = c("Date", "Color"),
      all.x = TRUE) |>
  aggregate(Amount ~ Criteria1 + Criteria2, data = _,
            FUN = sum, na.action = na.pass, na.rm = TRUE)
#>   Criteria1 Criteria2 Amount
#> 1  3/1/2022     Black     60
#> 2  5/2/2022     Black      0
#> 3  3/1/2022      Blue     25
```
<sup>Created on 2023-02-23 with reprex v2.0.2</sup>
Data
```R
df1 <- "id Criteria1 Criteria2
1 3/1/2022 Black
2 5/2/2022 Black
3 3/1/2022 Blue"
df1 <- read.table(text = df1, header = TRUE)

df2 <- "id Date Color Amount
1 3/1/2022 Black 15
2 5/2/2022 Red 10
3 3/1/2022 Blue 25
4 5/2/2022 Red 10
5 4/1/2022 Black 15
6 3/1/2022 Black 45"
df2 <- read.table(text = df2, header = TRUE)
```
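The `na.action = na.pass` argument is what keeps the unmatched 5/2/2022 / Black criteria row in the result. A minimal sketch of the difference, using a small hypothetical `merged` table that mimics the output of the `merge()` step (one criteria row found no match, so its `Amount` is `NA`): with the formula method's default `na.action` (`na.omit`), that row is dropped before grouping and its group disappears; with `na.pass` the row is kept, and `na.rm = TRUE` (forwarded to `sum()`) makes it sum to 0.

```R
# Hypothetical stand-in for the merged table: one criteria row has no match
merged <- data.frame(
  Criteria1 = c("3/1/2022", "3/1/2022", "5/2/2022", "3/1/2022"),
  Criteria2 = c("Black", "Black", "Black", "Blue"),
  Amount    = c(15L, 45L, NA, 25L)
)

# Default na.action (na.omit): the NA row is removed before grouping,
# so the 5/2/2022 / Black group is missing from the result
dropped <- aggregate(Amount ~ Criteria1 + Criteria2, data = merged, FUN = sum)

# na.pass keeps the NA row; na.rm = TRUE is passed on to sum(), giving 0
kept <- aggregate(Amount ~ Criteria1 + Criteria2, data = merged,
                  FUN = sum, na.action = na.pass, na.rm = TRUE)

nrow(dropped)  # 2
nrow(kept)     # 3
kept
```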
Answer 4
Score: 1
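Using `data.table`:
```R
library(data.table)

setDT(df2)  # df2 must also be a data.table for the `on =` join below
# for each row of df1 (.SD), sum the df2 Amounts matching on Date and Color
setDT(df1)[, Amount := df2[.SD, sum(Amount),
    on = .(Date = Criteria1, Color = Criteria2), by = .EACHI]$V1]
# criteria rows with no match come back as NA; set them to 0
df1[is.na(Amount), Amount := 0L]
df1
#>    id Criteria1 Criteria2 Amount
#> 1:  1  3/1/2022     Black     60
#> 2:  2  5/2/2022     Black      0
#> 3:  3  3/1/2022      Blue     25
```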
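The `by = .EACHI` join above amounts to a row-by-row SUMIFS: for each criteria row, sum the amounts that match on both keys. A minimal base R sketch of the same idea with `mapply()`, assuming the `Criteria1`/`Criteria2` column names used in the data of the earlier answers. Because `sum()` of an empty vector is 0, no NA clean-up step is needed.

```R
df1 <- data.frame(id = 1:3,
                  Criteria1 = c("3/1/2022", "5/2/2022", "3/1/2022"),
                  Criteria2 = c("Black", "Black", "Blue"))
df2 <- data.frame(id = 1:6,
                  Date   = c("3/1/2022", "5/2/2022", "3/1/2022",
                             "5/2/2022", "4/1/2022", "3/1/2022"),
                  Color  = c("Black", "Red", "Blue", "Red", "Black", "Black"),
                  Amount = c(15L, 10L, 25L, 10L, 15L, 45L))

# For each (date, colour) pair in df1, sum the df2 amounts where both match;
# a pair with no matching rows sums an empty vector and yields 0
df1$Amount <- mapply(
  function(d, col) sum(df2$Amount[df2$Date == d & df2$Color == col]),
  df1$Criteria1, df1$Criteria2,
  USE.NAMES = FALSE
)
df1
```

This scans `df2` once per criteria row, so it is probably only a sensible choice for small tables; the join-based answers should scale better.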
Answer 5
Score: 0
Similar assumption as GuedesBF, but:
- using `id` as a join variable;
- joining individually on each of `Criteria #`; then
- summing all possible `Amount`s.
I don't get the right results, but there's no combination I found that produces your expected output ... so perhaps:
```R
library(dplyr)

df1 %>%
  left_join(df2, by = c("id", "Criteria 1" = "Date")) %>%
  left_join(df2, by = c("id", "Criteria 2" = "Color")) %>%
  group_by(id, `Criteria 1`, `Criteria 2`) %>%
  summarize(Amount = sum(c(Amount.x, Amount.y), na.rm = TRUE)) %>%
  ungroup()
# # A tibble: 3 × 4
#      id `Criteria 1` `Criteria 2` Amount
#   <int> <chr>        <chr>         <int>
# 1     1 3/1/2022     Black            30
# 2     2 5/2/2022     Black            10
# 3     3 3/1/2022     Blue             50
```
If you need to join on both simultaneously, then this only differs from GuedesBF's answer by the inclusion of `id`:
```R
df1 %>%
  left_join(df2, by = c("id", "Criteria 1" = "Date", "Criteria 2" = "Color")) %>%
  group_by(id, `Criteria 1`, `Criteria 2`) %>%
  summarize(Amount = sum(Amount, na.rm = TRUE)) %>%
  ungroup()
# # A tibble: 3 × 4
#      id `Criteria 1` `Criteria 2` Amount
#   <int> <chr>        <chr>         <int>
# 1     1 3/1/2022     Black            15
# 2     2 5/2/2022     Black             0
# 3     3 3/1/2022     Blue             25
```
Data
```R
df1 <- structure(list(id = 1:3, "Criteria 1" = c("3/1/2022", "5/2/2022", "3/1/2022"), "Criteria 2" = c("Black", "Black", "Blue")), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(id = 1:6, Date = c("3/1/2022", "5/2/2022", "3/1/2022", "5/2/2022", "4/1/2022", "3/1/2022"), Color = c("Black", "Red", "Blue", "Red", "Black", "Black"), Amount = c(15L, 10L, 25L, 10L, 15L, 45L)), class = "data.frame", row.names = c(NA, -6L))
df3 <- structure(list(id = 1:3, "Criteria 1" = c("3/1/2022", "5/2/2022", "3/1/2022"), "Criteria 2" = c("Black", "Black", "Blue"), Amount = c(60L, 0L, 25L)), class = "data.frame", row.names = c(NA, -3L))
```
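The 15 in the second output comes from including `id` in the join keys: criteria row 1 can then only match `df2` row 1, not row 6. A sketch, assuming the intended match is on date and colour only: drop `id` from the join (and from `df2`, so the two `id` columns do not collide) but keep it in `group_by()`. The data frames are rebuilt inline so the block runs on its own.

```R
library(dplyr)

df1 <- data.frame(id = 1:3,
                  `Criteria 1` = c("3/1/2022", "5/2/2022", "3/1/2022"),
                  `Criteria 2` = c("Black", "Black", "Blue"),
                  check.names = FALSE)  # keep the spaces in the column names
df2 <- data.frame(id = 1:6,
                  Date   = c("3/1/2022", "5/2/2022", "3/1/2022",
                             "5/2/2022", "4/1/2022", "3/1/2022"),
                  Color  = c("Black", "Red", "Blue", "Red", "Black", "Black"),
                  Amount = c(15L, 10L, 25L, 10L, 15L, 45L))

res <- df1 %>%
  # join on date and colour only; df2's own id is dropped first
  left_join(select(df2, -id),
            by = c("Criteria 1" = "Date", "Criteria 2" = "Color")) %>%
  # keep df1's id in the grouping so each criteria row stays separate
  group_by(id, `Criteria 1`, `Criteria 2`) %>%
  # unmatched criteria rows carry NA, which na.rm = TRUE turns into 0
  summarize(Amount = sum(Amount, na.rm = TRUE), .groups = "drop")

res
```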
Answer 6
Score: 0
We can use {powerjoin}:
```R
library(powerjoin)

power_left_join(
  df1,
  df2 |> summarize_by_keys(Amount = sum(Amount)),
  by = c("Criteria 1" = "Date", "Criteria 2" = "Color"),
  fill = 0
)
#>   id Criteria 1 Criteria 2 Amount
#> 1  1   3/1/2022      Black     60
#> 2  2   5/2/2022      Black      0
#> 3  3   3/1/2022       Blue     25
```
<sup>Created on 2023-03-17 with reprex v2.0.2</sup>