Posted: 2023-06-19 23:00:58

How to compare values in two columns and if values are equal keep as is, but if values are different, sum by row

Question

Using R, I would like to compare the values in two columns: if the values are equal, I would like to keep them as is; if they are different, I would like to sum the values in the two columns.
This seems like a simple operation, but I can't figure out how to do it. I have found similar posts on SO, but not quite this. Something using ifelse, maybe?

example df:

structure(list(colA = c(0, 0, 0, 412.99, 34.43, 117.36, 193.05, 305.22),
               colB = c(0, 0, 0, 412.99, 6.89, 0, 193.05, 305.22)),
          class = "data.frame",
          row.names = c(1323L, 5426L, 2772L, 7241L, 2547L, 874L, 5908L, 6830L))

I would like to create a new column ("colC") containing the row sum of colA and colB if the values are different, and the unchanged value if they are the same. In this example, that means summing 34.43 and 6.89 for row 2547 and keeping 412.99 for row 7241, since the value is the same in colA and colB. Additionally, it would be helpful to have another column (say "colD") that states whether the values in each row were the same or not, so I know which observations differed.

My actual df has 10,000+ observations and 30+ variables (columns). I only want to compare two columns within the 30+ cols I have.
Thank you.

Answer 1

Score: 2

ifelse() might help.

df <- structure(list(colA = c(0, 0, 0, 412.99, 34.43, 117.36, 193.05, 305.22),
                     colB = c(0, 0, 0, 412.99, 6.89, 0, 193.05, 305.22)),
                class = "data.frame",
                row.names = c(1323L, 5426L, 2772L, 7241L, 2547L, 874L, 5908L, 6830L))
df |>
  dplyr::mutate(colC = ifelse(colA == colB, colA, colA + colB))
#>        colA   colB   colC
#> 1323   0.00   0.00   0.00
#> 5426   0.00   0.00   0.00
#> 2772   0.00   0.00   0.00
#> 7241 412.99 412.99 412.99
#> 2547  34.43   6.89  41.32
#> 874  117.36   0.00 117.36
#> 5908 193.05 193.05 193.05
#> 6830 305.22 305.22 305.22

Created on 2023-06-19 with reprex v2.0.2
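
The answer above creates colC only. For the colD flag that the question also asks for, a minimal base R sketch could look like the following. The labels "same" and "different" are an assumption for illustration; a logical TRUE/FALSE column would work just as well.

```r
# Example data from the question
df <- data.frame(
  colA = c(0, 0, 0, 412.99, 34.43, 117.36, 193.05, 305.22),
  colB = c(0, 0, 0, 412.99, 6.89, 0, 193.05, 305.22),
  row.names = c(1323L, 5426L, 2772L, 7241L, 2547L, 874L, 5908L, 6830L)
)

# Compare once and reuse the result for both new columns
same <- df$colA == df$colB

# Keep the value when both columns agree, otherwise sum them
df$colC <- ifelse(same, df$colA, df$colA + df$colB)

# Flag each row (labels are illustrative, not from the original answers)
df$colD <- ifelse(same, "same", "different")

print(df)
```

Only colA and colB are touched, so the other 30+ columns of the real data frame stay as they are.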

Answer 2

Score: 0

Using data.table:

Option 1: Use setDT(df, keep.rownames = TRUE) if row names are important

library(data.table)

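df <- structure(list(colA = c(0, 0, 0, 412.99, 34.43, 117.36, 193.05, 305.22), 
                     colB = c(0, 0, 0, 412.99, 6.89, 0, 193.05, 305.22)), 
                class = "data.frame", 
                row.names = c(1323L, 5426L, 2772L, 7241L, 2547L, 874L, 5908L, 6830L))

# Convert to a data.table by reference (row names are dropped here)
setDT(df)

# Default: keep colA and mark the row as equal (colD = 1)
df[, `:=`(colC = colA, colD = 1L)]
# Where the columns differ, overwrite with the sum and mark as different (colD = 0)
df[colA != colB, `:=`(colC = colA + colB, colD = 0L)][]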

     colA   colB   colC colD
1:   0.00   0.00   0.00    1
2:   0.00   0.00   0.00    1
3:   0.00   0.00   0.00    1
4: 412.99 412.99 412.99    1
5:  34.43   6.89  41.32    0
6: 117.36   0.00 117.36    0
7: 193.05 193.05 193.05    1
8: 305.22 305.22 305.22    1
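
Option 1 mentions keep.rownames = TRUE without showing it. A short sketch of what that might look like, assuming the row names are the observation IDs you want to report back. With keep.rownames = TRUE, setDT() stores the old row names in a character column named rn.

```r
library(data.table)

df <- data.frame(
  colA = c(0, 0, 0, 412.99, 34.43, 117.36, 193.05, 305.22),
  colB = c(0, 0, 0, 412.99, 6.89, 0, 193.05, 305.22),
  row.names = c(1323L, 5426L, 2772L, 7241L, 2547L, 874L, 5908L, 6830L)
)

# Convert by reference and keep the original row names in column "rn"
setDT(df, keep.rownames = TRUE)

# Same two-step update as in Option 1
df[, `:=`(colC = colA, colD = 1L)]
df[colA != colB, `:=`(colC = colA + colB, colD = 0L)]

# Original row names of the observations whose two columns differ
differing <- df[colD == 0L, rn]
print(differing)
```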

Option 2: Similar to @Grzegorz Sapijaszko's answer

library(data.table)

df <- structure(list(colA = c(0, 0, 0, 412.99, 34.43, 117.36, 193.05, 305.22), 
                     colB = c(0, 0, 0, 412.99, 6.89, 0, 193.05, 305.22)), 
                class = "data.frame", 
                row.names = c(1323L, 5426L, 2772L, 7241L, 2547L, 874L, 5908L, 6830L))

setDT(df)

df[, colC := fifelse(colA == colB, colA, colA + colB)][] # Analogous for `colD`

     colA   colB   colC
1:   0.00   0.00   0.00
2:   0.00   0.00   0.00
3:   0.00   0.00   0.00
4: 412.99 412.99 412.99
5:  34.43   6.89  41.32
6: 117.36   0.00 117.36
7: 193.05 193.05 193.05
8: 305.22 305.22 305.22
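
One caveat that applies to every approach above: they all compare doubles with == or !=. That works for the example data, but if the real columns come out of earlier calculations, two values that should be equal can differ by a tiny rounding error. A hedged base R sketch of a tolerance-based comparison follows; the tolerance tol and the small demo data frame are assumptions for illustration, not part of the original question.

```r
# Classic floating-point surprise: these look equal but are not
x <- 0.1 + 0.2
y <- 0.3
print(x == y)

# Hypothetical tolerance; choose one that suits the scale of your data
tol <- 1e-8

# Small demo data frame (not the data from the question)
df <- data.frame(colA = c(x, 1, 2.5), colB = c(y, 1, 4))

# Treat values as equal when they differ by less than the tolerance
same <- abs(df$colA - df$colB) < tol

df$colC <- ifelse(same, df$colA, df$colA + df$colB)
df$colD <- same

print(df)
```

If the values are read straight from a file and never computed, exact comparison as in the answers is probably fine.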
