2023年5月28日 23:50:40go评论106阅读模式

英文:

Remove rows with symmetric values by a combination of columns

问题

我有一个数据框，想要根据其他列的组合来删除具有对称值的行。本质上，我想从我的销售数据框中删除退款。

我的初始数据框看起来像这样：

df <- data.frame(
  clientID = c(101, 101, 102, 103, 103),
  transactionID = c(1, 2, 3, 4, 5),
  date = as.Date(c("2023-05-01", "2023-05-02", "2023-05-03", "2023-05-04", "2023-05-05")),
  productID = c("P001", "P002", "P003", "P004", "P005"),
  QTY = c(2, 3, 1, 5, 2)
)
refund_rows <- data.frame(
  clientID = c(101, 102, 103, 101),
  transactionID = c(6, 7, 8, 9),
  date = as.Date(c("2023-05-07", "2023-05-06", "2023-05-08", "2023-05-09")),
  productID = c("P001", "P003", "P005", "P006"),
  QTY = c(-1, -1, -2, -5)
)
final_df <- bind_rows(df, refund_rows)

我希望最终的数据框看起来像这样：

clientID transactionID date       productID QTY
101      2             2023-05-02 P002      3
103      4             2023-05-04 P004      5
101      9             2023-05-09 P006     -5

如何在R中实现这个目标？

我尝试了以下方法，但问题是我得到了transactionID = 9，它的QTY应该是负数：

final_df <- data.frame(
  clientID = c(101, 101, 102, 103, 103, 101, 102, 103, 101),
  transactionID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  date = as.Date(c("2023-05-01", "2023-05-02", "2023-05-03", "2023-05-04", "2023-05-05", "2023-05-07", "2023-05-06", "2023-05-08", "2023-05-09")),
  productID = c("P001", "P002", "P003", "P004", "P005", "P001", "P003", "P005", "P006"),
  QTY = c(2, 3, 1, 5, 2, -1, -1, -2, -5)
)
refund_rows_new <- final_df[final_df$QTY < 0,]
refund_rows_abs <- refund_rows_new %>%
  mutate(QTY = abs(QTY))
final_df_new <- final_df[final_df$QTY > 0,]
final_df_new %>% anti_join(refund_rows_abs, by = c("clientID", "productID", "QTY"))

这是删除了QTY值为正数的退款行，但没有解决transactionID = 9的问题。

英文:

I have a dataframe and I want to remove rows that have a symmetric value in a column based on a combination of other columns. In essence I want to remove the refunds from my sales dataframe.
My initial dataframe looks like this:

df &lt;- data.frame(
  clientID = c(101, 101, 102, 103, 103),
  transactionID = c(1, 2, 3, 4, 5),
  date = as.Date(c(&quot;2023-05-01&quot;, &quot;2023-05-02&quot;, &quot;2023-05-03&quot;, &quot;2023-05-04&quot;, &quot;2023-05-05&quot;)),
  productID = c(&quot;P001&quot;, &quot;P002&quot;, &quot;P003&quot;, &quot;P004&quot;, &quot;P005&quot;),
  QTY = c(2, 3, 1, 5, 2)
)
refund_rows &lt;- data.frame(
  clientID = c(101, 102, 103, 101),
  transactionID = c(6, 7, 8, 9),
  date = as.Date(c(&quot;2023-05-07&quot;, &quot;2023-05-06&quot;, &quot;2023-05-08&quot;, &quot;2023-05-09&quot;)),
  productID = c(&quot;P001&quot;, &quot;P003&quot;, &quot;P005&quot;, &quot;P006&quot;),
  QTY = c(-1, -1, -2, -5)
)
final_df &lt;- bind_rows(df, refund_rows)

I want my final dataframe to look like this:

clientID transactionID date productID QTY
101	2	2023-05-02	P002 3
103	4	2023-05-04	P004 5
101 9   2023-05-09  P006 -5

How can I do this in R?

I tried the following but the problem is I am left with the transactionID = 9 which should be negative QTY

final_df &lt;- data.frame(
  clientID = c(101, 101, 102, 103, 103, 101, 102, 103, 101),
  transactionID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  date = as.Date(c(&quot;2023-05-01&quot;, &quot;2023-05-02&quot;, &quot;2023-05-03&quot;, &quot;2023-05-04&quot;, &quot;2023-05-05&quot;, &quot;2023-05-07&quot;, &quot;2023-05-06&quot;, &quot;2023-05-08&quot;, &quot;2023-05-09&quot;)),
  productID = c(&quot;P001&quot;, &quot;P002&quot;, &quot;P003&quot;, &quot;P004&quot;, &quot;P005&quot;, &quot;P001&quot;, &quot;P003&quot;, &quot;P005&quot;, &quot;P006&quot;),
  QTY = c(2, 3, 1, 5, 2, -1, -1, -2, -5)
)
refund_rows_new &lt;- final_df[final_df$QTY &lt; 0,]
refund_rows_abs &lt;- refund_rows_new %&gt;% 
  mutate(QTY = abs(QTY))
final_df_new &lt;- final_df[final_df$QTY &gt; 0,]
final_df_new %&gt;% anti_join(refund_rows_abs, by = c(&quot;clientID&quot;, &quot;productID&quot;, &quot;QTY&quot;))

答案1

得分: 0

以下是翻译好的代码部分：

final_df %>%
  group_by(clientID, productID) %>%
  filter(sum(QTY) != 0)

final_df %>%
  group_by(clientID, productID) %>%
  summarise(QTY = sum(QTY)) %>%
  filter(QTY != 0) %>%
  left_join(final_df)

final_df %>%
  group_by(clientID, productID) %>%
  summarise(QTY = sum(QTY)) %>%
  filter(QTY != 0) %>%
  left_join(final_df) %>%
  na.omit()

希望这对你有所帮助。如果你需要进一步的解释或有其他问题，请随时提出。

英文:

My first idea would be to group by clientID and productID and filter based on the sum of QTY.

final_df %&gt;%
  group_by(clientID,productID) %&gt;% 
  filter(sum(QTY)!=0) 
  clientID transactionID date       productID   QTY
     &lt;dbl&gt;         &lt;dbl&gt; &lt;date&gt;     &lt;chr&gt;     &lt;dbl&gt;
1      101             1 2023-05-01 P001          2
2      101             2 2023-05-02 P002          3
3      103             4 2023-05-04 P004          5
4      101             6 2023-05-07 P001         -1
5      101             9 2023-05-09 P006         -5

however this gives a different result from what you have requested. because client 1 bought 2 of product P001 and got a refund for 1.

So if you want to omit those entries you could do something in the lines of:

final_df %&gt;%
  group_by(clientID,productID) %&gt;% 
  summarise(QTY=sum(QTY))%&gt;% 
  filter(QTY!=0) %&gt;% 
  left_join(final_df)
  clientID productID   QTY transactionID date      
     &lt;dbl&gt; &lt;chr&gt;     &lt;dbl&gt;         &lt;dbl&gt; &lt;date&gt;    
1      101 P001          1            NA NA        
2      101 P002          3             2 2023-05-02
3      101 P006         -5             9 2023-05-09
4      103 P004          5             4 2023-05-04

and omit rows containing NA

final_df %&gt;%
 group_by(clientID,productID) %&gt;% 
 summarise(QTY=sum(QTY))%&gt;% 
 filter(QTY!=0) %&gt;% 
 left_join(final_df) %&gt;% 
 na.omit()
Joining with `by = join_by(clientID, productID, QTY)`
  clientID productID   QTY transactionID date      
     &lt;dbl&gt; &lt;chr&gt;     &lt;dbl&gt;         &lt;dbl&gt; &lt;date&gt;    
1      101 P002          3             2 2023-05-02
2      101 P006         -5             9 2023-05-09
3      103 P004          5             4 2023-05-04

giving you the desired result, however this can be dangerous (because of the drop of client 1 and product 001).

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

通过列的组合删除具有对称值的行

问题

答案1

如何从稀疏向量创建对角矩阵

按首次发生日期和姓名在R中保留记录。

如何将 ggplot 中的 “fill” 变量的条形图排列在一起？

在使用ggplotly()时，反转图例顺序。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。