June 1, 2023 22:12:48

How can I efficiently calculate differences between values of a column and the preceding column in R?

Question

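For each row, I'm trying to calculate the difference between the value in each column and the value in the preceding column. For example:

      1   2   3
R1.  50  60  90
R2.  80  95 115
R3.  90 100 120

I want the code to compute (values in col 2 - values in col 1), (values in col 3 - values in col 2), (values in col 4 - values in col 3), and so on, and store the results in a new table like this:

      1   2
R1.  10  30
R2.  15  20
R3.  10  20

I've been doing this manually on a small set of columns as follows, but is there code that does this more efficiently on a larger set of columns/rows?

```R
# to calculate differences
diff_new = tab_new[,2] - tab_new[,1]
diff_new2 = tab_new[,3] - tab_new[,2]
diff_new3 = tab_new[,4] - tab_new[,3]
diff_new4 = tab_new[,5] - tab_new[,4]
# to create a new table with differences
diff.table_new = cbind(diff_new, diff_new2, diff_new3, diff_new4)
```

To do the same on a larger set of columns/rows, the process can be automated with a loop or a function instead of spelling out each column by hand.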
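Following that suggestion, here is a minimal sketch of a loop version of the manual code. It assumes `tab_new` contains only numeric columns, with the row labels stored as row names. The toy `tab_new` below only reproduces the question's example table, and `col_diffs` is a hypothetical helper name, not something from the original post.

```R
# Toy data reproducing the question's table. The header has one field fewer
# than the data rows, so read.table() uses the first field as row names.
tab_new <- read.table(header = TRUE, text = "1   2   3
R1. 50  60  90
R2. 80  95 115
R3. 90 100 120")

# Hypothetical helper: column j + 1 minus column j, for every adjacent pair
col_diffs <- function(tab) {
  res <- tab[-1]  # start from columns 2..n so names and row names carry over
  for (j in seq_len(ncol(tab) - 1)) {
    res[[j]] <- tab[[j + 1]] - tab[[j]]
  }
  res
}

diff.table_new <- col_diffs(tab_new)
diff.table_new
```

Starting from `tab[-1]` means the result already has the right shape and labels, so the loop only needs to overwrite values. It also behaves the same however many columns the table has.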


Answer 1

Score: 3

One base R option:

```R
df[-1] - df[-ncol(df)] # thanks @user20650
```

Output:

```
    X2 X3
R1. 10 30
R2. 15 20
R3. 10 20
```

Or use `sapply`:

```R
sapply(seq_len(ncol(df))[-1], function(x) df[,x] - df[,x-1])
```

Output:

```
     [,1] [,2]
[1,]   10   30
[2,]   15   20
[3,]   10   20
```

Data:

```R
df <- read.table(text = "1   2   3
R1. 50 60 90
R2. 80 95 115
R3. 90 100 120", header = TRUE)
```
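The base R one-liner works because `df[-1]` and `df[-ncol(df)]` are the same data shifted by one column. Subtracting them position by position therefore pairs each column with its predecessor. The sketch below applies the same idea to a numeric matrix and assumes every column is numeric. `drop = FALSE` keeps both views two-dimensional even when the input has only two columns. Converting to a matrix first may be faster than data frame arithmetic on wide data, but that is an assumption and has not been measured here.

```R
df <- read.table(header = TRUE, text = "1   2   3
R1. 50  60  90
R2. 80  95 115
R3. 90 100 120")

m <- as.matrix(df)
# Two overlapping views of the same columns
later   <- m[, -1, drop = FALSE]        # columns 2..n
earlier <- m[, -ncol(m), drop = FALSE]  # columns 1..n-1
# Position j of `later` is column j + 1 and position j of `earlier` is column j,
# so one element-wise subtraction yields every adjacent difference
res <- later - earlier
res
```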


Answer 2

Score: 2

`diff` with a lag:

```R
df <- read.table(header = TRUE, text = "    1   2   3
R1. 50 60 90
R2. 80 95 115
R3. 90 100 120")
matrix(diff(unlist(df), lag=nrow(df)), nrow=nrow(df))
```
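To see why `lag = nrow(df)` works, the steps can be unpacked. `unlist()` stacks the columns end to end, so the element `nrow(df)` positions later is the same row in the next column. The sketch assumes all columns are numeric, because `unlist()` would coerce mixed types. The `dimnames` argument is an addition that restores the row and column labels.

```R
df <- read.table(header = TRUE, text = "1   2   3
R1. 50  60  90
R2. 80  95 115
R3. 90 100 120")

n <- nrow(df)
# Column-major stacking: c(50, 80, 90, 60, 95, 100, 90, 115, 120)
v <- unlist(df, use.names = FALSE)

# diff(v, lag = n) computes v[i + n] - v[i]: same row, next column minus current.
# Result: c(10, 15, 10, 30, 20, 20)
d <- diff(v, lag = n)

# Refold into one row per original row and restore the labels
res <- matrix(d, nrow = n, dimnames = list(rownames(df), names(df)[-1]))
res
```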


Answer 3

Score: 1

Another fully vectorized option:

```R
data.frame(t(diff(t(df))))
```

You can also use `apply` + `diff`:

```R
df <- read.table(header = TRUE, text = "    1   2   3
R1. 50 60 90
R2. 80 95 115
R3. 90 100 120")
t(apply(df, 1, diff))
#    X2 X3
#R1. 10 30
#R2. 15 20
#R3. 10 20
```
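Both versions in this answer need a `t()`, for slightly different reasons. The sketch below shows the intermediate shapes. `apply()` first converts the data frame to a matrix, so all columns are assumed to be numeric. It then stores each row's `diff()` result as a column. `diff()` on a matrix, by contrast, differences successive rows. The variable names are illustrative only.

```R
df <- read.table(header = TRUE, text = "1   2   3
R1. 50  60  90
R2. 80  95 115
R3. 90 100 120")

# Each per-row diff() returns ncol - 1 values, which apply() stores as a column,
# so the intermediate result has one column per original row
per_row <- apply(df, 1, diff)
print(dim(per_row))  # 2 3

# Transposing restores one row per original row
res_apply <- t(per_row)

# diff() on a matrix works down the rows, so transpose first (columns become
# rows), take differences, then transpose back
res_diff <- t(diff(t(as.matrix(df))))

stopifnot(all(res_apply == res_diff))
res_apply
```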


Answer 4

Score: 0

An efficient solution using `data.table` and `collapse`, as long as your data doesn't have `NA` values.

Here we make a copy of the original data (except for the first column) and modify the columns by reference.

```R
library(data.table)
library(collapse)
# Copy columns 2..n so that df itself is not modified
out <- copy(fselect(df, -1))
# out[[i]] holds column i + 1 of df; subtract column i of df in place
for (i in seq_col(out)){
  out[[i]] %-=% df[[i]]
}
out
#>     X2 X3
#> R1. 10 30
#> R2. 15 20
#> R3. 10 20
```

Created on 2023-06-02 with reprex v2.0.2

Data

```R
df <- read.table(header = TRUE, text = "    1   2   3
R1. 50 60 90
R2. 80 95 115
R3. 90 100 120")
```
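The answers above use several different base R idioms. A quick way to confirm that they agree on the question's toy data is to compare values only, because the row and column labels differ between approaches. `same` is a hypothetical helper introduced for this check. The `collapse` version is left out so that the sketch needs no extra packages.

```R
df <- read.table(header = TRUE, text = "1   2   3
R1. 50  60  90
R2. 80  95 115
R3. 90 100 120")

# Base R versions from the answers, each as a plain matrix
a1 <- as.matrix(df[-1] - df[-ncol(df)])
a2 <- sapply(seq_len(ncol(df))[-1], function(x) df[, x] - df[, x - 1])
a3 <- matrix(diff(unlist(df), lag = nrow(df)), nrow = nrow(df))
a4 <- t(diff(t(df)))
a5 <- t(apply(df, 1, diff))

# Compare values and dimensions, ignoring names
same <- function(x, y) isTRUE(all.equal(unname(x), unname(y)))
stopifnot(same(a1, a2), same(a1, a3), same(a1, a4), same(a1, a5))
```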