通过列的组合删除具有对称值的行

huangapple go评论106阅读模式
英文:

Remove rows with symmetric values by a combination of columns

问题

我有一个数据框,想要根据其他列的组合来删除具有对称值的行。本质上,我想从我的销售数据框中删除退款。

我的初始数据框看起来像这样:

  1. df <- data.frame(
  2. clientID = c(101, 101, 102, 103, 103),
  3. transactionID = c(1, 2, 3, 4, 5),
  4. date = as.Date(c("2023-05-01", "2023-05-02", "2023-05-03", "2023-05-04", "2023-05-05")),
  5. productID = c("P001", "P002", "P003", "P004", "P005"),
  6. QTY = c(2, 3, 1, 5, 2)
  7. )
  8. refund_rows <- data.frame(
  9. clientID = c(101, 102, 103, 101),
  10. transactionID = c(6, 7, 8, 9),
  11. date = as.Date(c("2023-05-07", "2023-05-06", "2023-05-08", "2023-05-09")),
  12. productID = c("P001", "P003", "P005", "P006"),
  13. QTY = c(-1, -1, -2, -5)
  14. )
  15. final_df <- bind_rows(df, refund_rows)

我希望最终的数据框看起来像这样:

  1. clientID transactionID date productID QTY
  2. 101 2 2023-05-02 P002 3
  3. 103 4 2023-05-04 P004 5
  4. 101 9 2023-05-09 P006 -5

如何在R中实现这个目标?

我尝试了以下方法,但问题是我得到了transactionID = 9,它的QTY应该是负数:

  1. final_df <- data.frame(
  2. clientID = c(101, 101, 102, 103, 103, 101, 102, 103, 101),
  3. transactionID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  4. date = as.Date(c("2023-05-01", "2023-05-02", "2023-05-03", "2023-05-04", "2023-05-05", "2023-05-07", "2023-05-06", "2023-05-08", "2023-05-09")),
  5. productID = c("P001", "P002", "P003", "P004", "P005", "P001", "P003", "P005", "P006"),
  6. QTY = c(2, 3, 1, 5, 2, -1, -1, -2, -5)
  7. )
  8. refund_rows_new <- final_df[final_df$QTY < 0,]
  9. refund_rows_abs <- refund_rows_new %>%
  10. mutate(QTY = abs(QTY))
  11. final_df_new <- final_df[final_df$QTY > 0,]
  12. final_df_new %>% anti_join(refund_rows_abs, by = c("clientID", "productID", "QTY"))

这是删除了QTY值为正数的退款行,但没有解决transactionID = 9的问题。

英文:

I have a dataframe and I want to remove rows that have a symmetric value in a column based on a combination of other columns. In essence I want to remove the refunds from my sales dataframe.
My initial dataframe looks like this:

  1. df &lt;- data.frame(
  2. clientID = c(101, 101, 102, 103, 103),
  3. transactionID = c(1, 2, 3, 4, 5),
  4. date = as.Date(c(&quot;2023-05-01&quot;, &quot;2023-05-02&quot;, &quot;2023-05-03&quot;, &quot;2023-05-04&quot;, &quot;2023-05-05&quot;)),
  5. productID = c(&quot;P001&quot;, &quot;P002&quot;, &quot;P003&quot;, &quot;P004&quot;, &quot;P005&quot;),
  6. QTY = c(2, 3, 1, 5, 2)
  7. )
  8. refund_rows &lt;- data.frame(
  9. clientID = c(101, 102, 103, 101),
  10. transactionID = c(6, 7, 8, 9),
  11. date = as.Date(c(&quot;2023-05-07&quot;, &quot;2023-05-06&quot;, &quot;2023-05-08&quot;, &quot;2023-05-09&quot;)),
  12. productID = c(&quot;P001&quot;, &quot;P003&quot;, &quot;P005&quot;, &quot;P006&quot;),
  13. QTY = c(-1, -1, -2, -5)
  14. )
  15. final_df &lt;- bind_rows(df, refund_rows)

I want my final dataframe to look like this:

  1. clientID transactionID date productID QTY
  2. 101 2 2023-05-02 P002 3
  3. 103 4 2023-05-04 P004 5
  4. 101 9 2023-05-09 P006 -5

How can I do this in R?

I tried the following but the problem is I am left with the transactionID = 9 which should be negative QTY

  1. final_df &lt;- data.frame(
  2. clientID = c(101, 101, 102, 103, 103, 101, 102, 103, 101),
  3. transactionID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  4. date = as.Date(c(&quot;2023-05-01&quot;, &quot;2023-05-02&quot;, &quot;2023-05-03&quot;, &quot;2023-05-04&quot;, &quot;2023-05-05&quot;, &quot;2023-05-07&quot;, &quot;2023-05-06&quot;, &quot;2023-05-08&quot;, &quot;2023-05-09&quot;)),
  5. productID = c(&quot;P001&quot;, &quot;P002&quot;, &quot;P003&quot;, &quot;P004&quot;, &quot;P005&quot;, &quot;P001&quot;, &quot;P003&quot;, &quot;P005&quot;, &quot;P006&quot;),
  6. QTY = c(2, 3, 1, 5, 2, -1, -1, -2, -5)
  7. )
  8. refund_rows_new &lt;- final_df[final_df$QTY &lt; 0,]
  9. refund_rows_abs &lt;- refund_rows_new %&gt;%
  10. mutate(QTY = abs(QTY))
  11. final_df_new &lt;- final_df[final_df$QTY &gt; 0,]
  12. final_df_new %&gt;% anti_join(refund_rows_abs, by = c(&quot;clientID&quot;, &quot;productID&quot;, &quot;QTY&quot;))

答案1

得分: 0

以下是翻译好的代码部分:

  1. final_df %>%
  2. group_by(clientID, productID) %>%
  3. filter(sum(QTY) != 0)
  1. final_df %>%
  2. group_by(clientID, productID) %>%
  3. summarise(QTY = sum(QTY)) %>%
  4. filter(QTY != 0) %>%
  5. left_join(final_df)
  1. final_df %>%
  2. group_by(clientID, productID) %>%
  3. summarise(QTY = sum(QTY)) %>%
  4. filter(QTY != 0) %>%
  5. left_join(final_df) %>%
  6. na.omit()

希望这对你有所帮助。如果你需要进一步的解释或有其他问题,请随时提出。

英文:

My first idea would be to group by clientID and productID and filter based on the sum of QTY.

  1. final_df %&gt;%
  2. group_by(clientID,productID) %&gt;%
  3. filter(sum(QTY)!=0)
  4. clientID transactionID date productID QTY
  5. &lt;dbl&gt; &lt;dbl&gt; &lt;date&gt; &lt;chr&gt; &lt;dbl&gt;
  6. 1 101 1 2023-05-01 P001 2
  7. 2 101 2 2023-05-02 P002 3
  8. 3 103 4 2023-05-04 P004 5
  9. 4 101 6 2023-05-07 P001 -1
  10. 5 101 9 2023-05-09 P006 -5

however this gives a different result from what you have requested. because client 1 bought 2 of product P001 and got a refund for 1.

So if you want to omit those entries you could do something in the lines of:

  1. final_df %&gt;%
  2. group_by(clientID,productID) %&gt;%
  3. summarise(QTY=sum(QTY))%&gt;%
  4. filter(QTY!=0) %&gt;%
  5. left_join(final_df)
  6. clientID productID QTY transactionID date
  7. &lt;dbl&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;date&gt;
  8. 1 101 P001 1 NA NA
  9. 2 101 P002 3 2 2023-05-02
  10. 3 101 P006 -5 9 2023-05-09
  11. 4 103 P004 5 4 2023-05-04

and omit rows containing NA

  1. final_df %&gt;%
  2. group_by(clientID,productID) %&gt;%
  3. summarise(QTY=sum(QTY))%&gt;%
  4. filter(QTY!=0) %&gt;%
  5. left_join(final_df) %&gt;%
  6. na.omit()
  7. Joining with `by = join_by(clientID, productID, QTY)`
  8. clientID productID QTY transactionID date
  9. &lt;dbl&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;date&gt;
  10. 1 101 P002 3 2 2023-05-02
  11. 2 101 P006 -5 9 2023-05-09
  12. 3 103 P004 5 4 2023-05-04

giving you the desired result, however this can be dangerous (because of the drop of client 1 and product 001).

huangapple
  • 本文由 发表于 2023年5月28日 23:50:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/76352318.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定