通过列的组合删除具有对称值的行

huangapple go评论69阅读模式
英文:

Remove rows with symmetric values by a combination of columns

问题

我有一个数据框,想要根据其他列的组合来删除具有对称值的行。本质上,我想从我的销售数据框中删除退款。

我的初始数据框看起来像这样:

df <- data.frame(
  clientID = c(101, 101, 102, 103, 103),
  transactionID = c(1, 2, 3, 4, 5),
  date = as.Date(c("2023-05-01", "2023-05-02", "2023-05-03", "2023-05-04", "2023-05-05")),
  productID = c("P001", "P002", "P003", "P004", "P005"),
  QTY = c(2, 3, 1, 5, 2)
)
refund_rows <- data.frame(
  clientID = c(101, 102, 103, 101),
  transactionID = c(6, 7, 8, 9),
  date = as.Date(c("2023-05-07", "2023-05-06", "2023-05-08", "2023-05-09")),
  productID = c("P001", "P003", "P005", "P006"),
  QTY = c(-1, -1, -2, -5)
)
final_df <- bind_rows(df, refund_rows)

我希望最终的数据框看起来像这样:

clientID transactionID date       productID QTY
101      2             2023-05-02 P002      3
103      4             2023-05-04 P004      5
101      9             2023-05-09 P006     -5

如何在R中实现这个目标?

我尝试了以下方法,但问题是我得到了transactionID = 9,它的QTY应该是负数:

final_df <- data.frame(
  clientID = c(101, 101, 102, 103, 103, 101, 102, 103, 101),
  transactionID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  date = as.Date(c("2023-05-01", "2023-05-02", "2023-05-03", "2023-05-04", "2023-05-05", "2023-05-07", "2023-05-06", "2023-05-08", "2023-05-09")),
  productID = c("P001", "P002", "P003", "P004", "P005", "P001", "P003", "P005", "P006"),
  QTY = c(2, 3, 1, 5, 2, -1, -1, -2, -5)
)

refund_rows_new <- final_df[final_df$QTY < 0,]

refund_rows_abs <- refund_rows_new %>%
  mutate(QTY = abs(QTY))

final_df_new <- final_df[final_df$QTY > 0,]

final_df_new %>% anti_join(refund_rows_abs, by = c("clientID", "productID", "QTY"))

这是删除了QTY值为正数的退款行,但没有解决transactionID = 9的问题。

英文:

I have a dataframe and I want to remove rows that have a symmetric value in a column based on a combination of other columns. In essence I want to remove the refunds from my sales dataframe.
My initial dataframe looks like this:

df &lt;- data.frame(
  clientID = c(101, 101, 102, 103, 103),
  transactionID = c(1, 2, 3, 4, 5),
  date = as.Date(c(&quot;2023-05-01&quot;, &quot;2023-05-02&quot;, &quot;2023-05-03&quot;, &quot;2023-05-04&quot;, &quot;2023-05-05&quot;)),
  productID = c(&quot;P001&quot;, &quot;P002&quot;, &quot;P003&quot;, &quot;P004&quot;, &quot;P005&quot;),
  QTY = c(2, 3, 1, 5, 2)
)
refund_rows &lt;- data.frame(
  clientID = c(101, 102, 103, 101),
  transactionID = c(6, 7, 8, 9),
  date = as.Date(c(&quot;2023-05-07&quot;, &quot;2023-05-06&quot;, &quot;2023-05-08&quot;, &quot;2023-05-09&quot;)),
  productID = c(&quot;P001&quot;, &quot;P003&quot;, &quot;P005&quot;, &quot;P006&quot;),
  QTY = c(-1, -1, -2, -5)
)
final_df &lt;- bind_rows(df, refund_rows)

I want my final dataframe to look like this:

clientID transactionID date productID QTY
101	2	2023-05-02	P002 3
103	4	2023-05-04	P004 5
101 9   2023-05-09  P006 -5

How can I do this in R?

I tried the following but the problem is I am left with the transactionID = 9 which should be negative QTY

final_df &lt;- data.frame(
  clientID = c(101, 101, 102, 103, 103, 101, 102, 103, 101),
  transactionID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  date = as.Date(c(&quot;2023-05-01&quot;, &quot;2023-05-02&quot;, &quot;2023-05-03&quot;, &quot;2023-05-04&quot;, &quot;2023-05-05&quot;, &quot;2023-05-07&quot;, &quot;2023-05-06&quot;, &quot;2023-05-08&quot;, &quot;2023-05-09&quot;)),
  productID = c(&quot;P001&quot;, &quot;P002&quot;, &quot;P003&quot;, &quot;P004&quot;, &quot;P005&quot;, &quot;P001&quot;, &quot;P003&quot;, &quot;P005&quot;, &quot;P006&quot;),
  QTY = c(2, 3, 1, 5, 2, -1, -1, -2, -5)
)

refund_rows_new &lt;- final_df[final_df$QTY &lt; 0,]

refund_rows_abs &lt;- refund_rows_new %&gt;% 
  mutate(QTY = abs(QTY))

final_df_new &lt;- final_df[final_df$QTY &gt; 0,]

final_df_new %&gt;% anti_join(refund_rows_abs, by = c(&quot;clientID&quot;, &quot;productID&quot;, &quot;QTY&quot;))

答案1

得分: 0

以下是翻译好的代码部分:

final_df %>%
  group_by(clientID, productID) %>%
  filter(sum(QTY) != 0)
final_df %>%
  group_by(clientID, productID) %>%
  summarise(QTY = sum(QTY)) %>%
  filter(QTY != 0) %>%
  left_join(final_df)
final_df %>%
  group_by(clientID, productID) %>%
  summarise(QTY = sum(QTY)) %>%
  filter(QTY != 0) %>%
  left_join(final_df) %>%
  na.omit()

希望这对你有所帮助。如果你需要进一步的解释或有其他问题,请随时提出。

英文:

My first idea would be to group by clientID and productID and filter based on the sum of QTY.

final_df %&gt;%
  group_by(clientID,productID) %&gt;% 
  filter(sum(QTY)!=0) 

  clientID transactionID date       productID   QTY
     &lt;dbl&gt;         &lt;dbl&gt; &lt;date&gt;     &lt;chr&gt;     &lt;dbl&gt;
1      101             1 2023-05-01 P001          2
2      101             2 2023-05-02 P002          3
3      103             4 2023-05-04 P004          5
4      101             6 2023-05-07 P001         -1
5      101             9 2023-05-09 P006         -5

however this gives a different result from what you have requested. because client 1 bought 2 of product P001 and got a refund for 1.

So if you want to omit those entries you could do something in the lines of:

final_df %&gt;%
  group_by(clientID,productID) %&gt;% 
  summarise(QTY=sum(QTY))%&gt;% 
  filter(QTY!=0) %&gt;% 
  left_join(final_df)

  clientID productID   QTY transactionID date      
     &lt;dbl&gt; &lt;chr&gt;     &lt;dbl&gt;         &lt;dbl&gt; &lt;date&gt;    
1      101 P001          1            NA NA        
2      101 P002          3             2 2023-05-02
3      101 P006         -5             9 2023-05-09
4      103 P004          5             4 2023-05-04

and omit rows containing NA

final_df %&gt;%
 group_by(clientID,productID) %&gt;% 
 summarise(QTY=sum(QTY))%&gt;% 
 filter(QTY!=0) %&gt;% 
 left_join(final_df) %&gt;% 
 na.omit()

Joining with `by = join_by(clientID, productID, QTY)`
  clientID productID   QTY transactionID date      
     &lt;dbl&gt; &lt;chr&gt;     &lt;dbl&gt;         &lt;dbl&gt; &lt;date&gt;    
1      101 P002          3             2 2023-05-02
2      101 P006         -5             9 2023-05-09
3      103 P004          5             4 2023-05-04

giving you the desired result, however this can be dangerous (because of the drop of client 1 and product 001).

huangapple
  • 本文由 发表于 2023年5月28日 23:50:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/76352318.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定