Remove rows with symmetric values by a combination of columns
Question
I have a data frame and want to remove rows whose value in one column has a symmetric (opposite-sign) counterpart in another row, matched on a combination of other columns. In essence, I want to remove the refunds from my sales data frame.
My initial dataframe looks like this:
library(dplyr)  # provides bind_rows(), mutate(), anti_join() and %>%

# Original sales
df <- data.frame(
  clientID = c(101, 101, 102, 103, 103),
  transactionID = c(1, 2, 3, 4, 5),
  date = as.Date(c("2023-05-01", "2023-05-02", "2023-05-03", "2023-05-04", "2023-05-05")),
  productID = c("P001", "P002", "P003", "P004", "P005"),
  QTY = c(2, 3, 1, 5, 2)
)

# Refunds, recorded as negative quantities
refund_rows <- data.frame(
  clientID = c(101, 102, 103, 101),
  transactionID = c(6, 7, 8, 9),
  date = as.Date(c("2023-05-07", "2023-05-06", "2023-05-08", "2023-05-09")),
  productID = c("P001", "P003", "P005", "P006"),
  QTY = c(-1, -1, -2, -5)
)

final_df <- bind_rows(df, refund_rows)
I want my final dataframe to look like this:
clientID transactionID date productID QTY
101 2 2023-05-02 P002 3
103 4 2023-05-04 P004 5
101 9 2023-05-09 P006 -5
How can I do this in R?
I tried the following, but it does not handle transactionID = 9, which should remain in the result with its negative QTY:
final_df <- data.frame(
  clientID = c(101, 101, 102, 103, 103, 101, 102, 103, 101),
  transactionID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  date = as.Date(c("2023-05-01", "2023-05-02", "2023-05-03", "2023-05-04", "2023-05-05", "2023-05-07", "2023-05-06", "2023-05-08", "2023-05-09")),
  productID = c("P001", "P002", "P003", "P004", "P005", "P001", "P003", "P005", "P006"),
  QTY = c(2, 3, 1, 5, 2, -1, -1, -2, -5)
)

# Refund rows, with QTY flipped to positive so they can be matched against sales
refund_rows_new <- final_df[final_df$QTY < 0, ]
refund_rows_abs <- refund_rows_new %>%
  mutate(QTY = abs(QTY))

# Sales only; this step already discards every negative row, including transactionID = 9
final_df_new <- final_df[final_df$QTY > 0, ]
final_df_new %>% anti_join(refund_rows_abs, by = c("clientID", "productID", "QTY"))
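For reference, here is a minimal sketch of what that attempt keeps. It assumes dplyr is installed, leaves out the date column for brevity, and uses the name `attempt`, which is only illustrative. Because the data is filtered to QTY > 0 before the anti-join, the unmatched refund (transactionID = 9) is already gone, and the partially refunded sale (transactionID = 1) survives because 2 does not equal abs(-1).

```r
library(dplyr)

# Same data as above, without the date column (omitted for brevity)
final_df <- data.frame(
  clientID = c(101, 101, 102, 103, 103, 101, 102, 103, 101),
  transactionID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  productID = c("P001", "P002", "P003", "P004", "P005", "P001", "P003", "P005", "P006"),
  QTY = c(2, 3, 1, 5, 2, -1, -1, -2, -5)
)

# Refunds with their sign flipped, used only as a lookup table
refund_rows_abs <- final_df %>%
  filter(QTY < 0) %>%
  mutate(QTY = abs(QTY))

# Keep sales that have no refund of exactly the same size
attempt <- final_df %>%
  filter(QTY > 0) %>%
  anti_join(refund_rows_abs, by = c("clientID", "productID", "QTY"))

attempt$transactionID
# 1 2 4
```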
Answer 1
Score: 0
My first idea would be to group by clientID and productID and filter based on the sum of QTY.
final_df %>%
group_by(clientID,productID) %>%
filter(sum(QTY)!=0)
clientID transactionID date productID QTY
<dbl> <dbl> <date> <chr> <dbl>
1 101 1 2023-05-01 P001 2
2 101 2 2023-05-02 P002 3
3 103 4 2023-05-04 P004 5
4 101 6 2023-05-07 P001 -1
5 101 9 2023-05-09 P006 -5
However, this gives a different result from what you requested, because client 101 bought 2 units of product P001 and got a refund for only 1. The sum for that group is 1 rather than 0, so both of its rows are kept.
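To see which groups that filter keeps, it can help to look at the net quantity per (clientID, productID) pair. The following is a small sketch, assuming dplyr is installed; the names `net_qty`, `net_QTY` and `n_rows` are illustrative and not part of the original code, and the date column is left out for brevity.

```r
library(dplyr)

# Same data as in the question, without the date column
final_df <- data.frame(
  clientID = c(101, 101, 102, 103, 103, 101, 102, 103, 101),
  transactionID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  productID = c("P001", "P002", "P003", "P004", "P005", "P001", "P003", "P005", "P006"),
  QTY = c(2, 3, 1, 5, 2, -1, -1, -2, -5)
)

# Net quantity and number of rows for each client/product pair
net_qty <- final_df %>%
  group_by(clientID, productID) %>%
  summarise(net_QTY = sum(QTY), n_rows = n(), .groups = "drop")

net_qty
# (101, P001) has net_QTY = 1 from two rows, so filter(sum(QTY) != 0) keeps both rows;
# (102, P003) and (103, P005) net to 0 and are dropped.
```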
So if you want to omit those entries, you could do something along the lines of:
final_df %>%
group_by(clientID,productID) %>%
summarise(QTY=sum(QTY))%>%
filter(QTY!=0) %>%
left_join(final_df)
clientID productID QTY transactionID date
<dbl> <chr> <dbl> <dbl> <date>
1 101 P001 1 NA NA
2 101 P002 3 2 2023-05-02
3 101 P006 -5 9 2023-05-09
4 103 P004 5 4 2023-05-04
and then omit the rows containing NA:
final_df %>%
group_by(clientID,productID) %>%
summarise(QTY=sum(QTY))%>%
filter(QTY!=0) %>%
left_join(final_df) %>%
na.omit()
Joining with `by = join_by(clientID, productID, QTY)`
clientID productID QTY transactionID date
<dbl> <chr> <dbl> <dbl> <date>
1 101 P002 3 2 2023-05-02
2 101 P006 -5 9 2023-05-09
3 103 P004 5 4 2023-05-04
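This gives you the desired result. However, it can be dangerous, because it silently drops client 101's partially refunded purchase of product P001 (net quantity 1): the summed QTY matches no original row, so the join returns NA and na.omit() removes it.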
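If the intended rule is "drop every client/product pair that contains both a sale and a refund, and keep pairs that contain only sales or only refunds", the same result can be reached without joining on QTY. This is a sketch under that assumption about the desired rule, which the question does not state explicitly; the name `kept` is illustrative, and the date column is left out for brevity.

```r
library(dplyr)

# Same data as in the question, without the date column
final_df <- data.frame(
  clientID = c(101, 101, 102, 103, 103, 101, 102, 103, 101),
  transactionID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  productID = c("P001", "P002", "P003", "P004", "P005", "P001", "P003", "P005", "P006"),
  QTY = c(2, 3, 1, 5, 2, -1, -1, -2, -5)
)

# Keep a group only if all of its rows have the same sign,
# i.e. it is made up purely of sales or purely of refunds
kept <- final_df %>%
  group_by(clientID, productID) %>%
  filter(all(QTY > 0) | all(QTY < 0)) %>%
  ungroup()

kept$transactionID
# 2 4 9
```

Unlike the join on the summed QTY, this makes the treatment of partial refunds an explicit choice: the (101, P001) group is dropped because it mixes signs, not because a join happened to find no match.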