How can I efficiently calculate differences between values of a column and the preceding column in R?

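Question

I'm trying to calculate, for each row, the difference between the value in each column and the value in the preceding column. For example, given:

         1    2    3
    R1.  50   60   90
    R2.  80   95  115
    R3.  90  100  120

I would like the code to compute:
(values in col 2 - values in col 1), (values in col 3 - values in col 2), (values in col 4 - values in col 3), etc.

and store the calculated values in a new table like this:

         1   2
    R1.  10  30
    R2.  15  20
    R3.  10  20

I've been doing this manually on a small set of columns, as follows. Is there code that does this more efficiently on a larger set of columns/rows?

    # calculate the differences
    diff_new = tab_new[, 2] - tab_new[, 1]
    diff_new2 = tab_new[, 3] - tab_new[, 2]
    diff_new3 = tab_new[, 4] - tab_new[, 3]
    diff_new4 = tab_new[, 5] - tab_new[, 4]
    # create a new table with the differences
    diff.table_new = cbind(diff_new, diff_new2, diff_new3, diff_new4)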


Answer 1

Score: 3

One base R option is to subtract the data frame without its last column from the data frame without its first column:

    df[-1] - df[-ncol(df)] # thanks @user20650

Output:

        X2 X3
    R1. 10 30
    R2. 15 20
    R3. 10 20

Or use `sapply`:

    sapply(seq_len(ncol(df))[-1], function(x) df[, x] - df[, x - 1])

Output:

         [,1] [,2]
    [1,]   10   30
    [2,]   15   20
    [3,]   10   20

Data:

    df <- read.table(text = "1 2 3
    R1. 50 60 90
    R2. 80 95 115
    R3. 90 100 120", header = TRUE)
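
The shifted subtraction above can be unpacked into its two halves. The following is a minimal sketch, assuming the same toy data (built directly here so the block is self-contained); the `_minus_` column labels are a hypothetical naming scheme added only for readability and are not part of the answer.

```r
# Same toy data as in the answer, constructed directly
df <- data.frame(
  X1 = c(50, 80, 90),
  X2 = c(60, 95, 100),
  X3 = c(90, 115, 120),
  row.names = c("R1.", "R2.", "R3.")
)

left  <- df[-ncol(df)]  # columns 1..(n-1): the "preceding" columns
right <- df[-1]         # columns 2..n: the columns being differenced

# Element-wise subtraction of two equally sized data frames;
# the result takes its column names from `right`
res <- right - left

# Hypothetical labels, only to make each difference explicit
names(res) <- paste0(names(right), "_minus_", names(left))
res
```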
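
Answer 2

Score: 2

`diff` with a lag:

    df <- read.table(header = TRUE, text = " 1 2 3
    R1. 50 60 90
    R2. 80 95 115
    R3. 90 100 120")
    # unlist() stacks the columns end to end, so a lag of nrow(df)
    # pairs each value with the value in the same row, one column earlier
    matrix(diff(unlist(df), lag = nrow(df)), nrow = nrow(df))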
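
To see why a lag of `nrow(df)` gives column differences, it may help to look at the intermediate vector. This is a minimal sketch, assuming all columns are numeric (otherwise `unlist()` would coerce them to a common type); restoring the row and column names at the end is an addition for illustration, not part of the answer.

```r
# Same toy data as in the answer, constructed directly
df <- data.frame(
  X1 = c(50, 80, 90),
  X2 = c(60, 95, 100),
  X3 = c(90, 115, 120),
  row.names = c("R1.", "R2.", "R3.")
)

n <- nrow(df)
v <- unlist(df)        # column by column: 50 80 90 60 95 100 90 115 120
d <- diff(v, lag = n)  # v[i + n] - v[i]: same row, next column

# Refill column-wise into a matrix and restore the labels
res <- matrix(d, nrow = n, dimnames = list(rownames(df), names(df)[-1]))
res
```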

Answer 3

Score: 1

Another fully vectorized option:

    data.frame(t(diff(t(df))))

You can also use `apply` + `diff`:

    df <- read.table(header = TRUE, text = " 1 2 3
    R1. 50 60 90
    R2. 80 95 115
    R3. 90 100 120")
    t(apply(df, 1, diff))
    #     X2 X3
    # R1. 10 30
    # R2. 15 20
    # R3. 10 20
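
The double transpose is needed because `diff()` on a matrix takes differences between successive rows, not columns. The sketch below, assuming the same toy data converted with `as.matrix()`, contrasts the two directions and checks that the `apply` variant agrees.

```r
# Same toy data as in the answer, constructed directly
df <- data.frame(
  X1 = c(50, 80, 90),
  X2 = c(60, 95, 100),
  X3 = c(90, 115, 120),
  row.names = c("R1.", "R2.", "R3.")
)
m <- as.matrix(df)

by_rows <- diff(m)        # row 2 - row 1, row 3 - row 2 (within each column)
by_cols <- t(diff(t(m)))  # column 2 - column 1, column 3 - column 2

# Row-wise apply returns one column per input row, hence the final t()
via_apply <- t(apply(m, 1, diff))

by_cols
```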

Answer 4

Score: 0

An efficient solution using `data.table` and `collapse`, as long as your data doesn't have `NA` values.

Here we make a copy of the original data (except for the first column) and modify the columns by reference.

    library(data.table)
    library(collapse)
    # Take a copy of every column except the first
    out <- copy(fselect(df, -1))
    # Subtract the preceding column from each column, by reference
    for (i in seq_col(out)) {
      out[[i]] %-=% df[[i]]
    }
    out
    #>     X2 X3
    #> R1. 10 30
    #> R2. 15 20
    #> R3. 10 20

Created on 2023-06-02 with reprex v2.0.2

Data

    df <- read.table(header = TRUE, text = " 1 2 3
    R1. 50 60 90
    R2. 80 95 115
    R3. 90 100 120")
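
For readers without `data.table` or `collapse` installed, the same column loop can be sketched in base R. This is only an illustration of the indexing: `col_diffs` is a hypothetical helper name, and because it assigns ordinary copies rather than modifying by reference, it says nothing about the performance of the answer's approach.

```r
# Hypothetical helper: column k+1 minus column k, for every k
col_diffs <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 2)
  out <- df[-1]                     # columns 2..n (a regular copy)
  for (i in seq_along(out)) {
    out[[i]] <- out[[i]] - df[[i]]  # out[[i]] is df column i + 1
  }
  out
}

# Same toy data as in the answer, constructed directly
df <- data.frame(
  X1 = c(50, 80, 90),
  X2 = c(60, 95, 100),
  X3 = c(90, 115, 120),
  row.names = c("R1.", "R2.", "R3.")
)

res <- col_diffs(df)
res
```

In this base R version, missing values simply propagate as `NA` in the affected differences.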

huangapple
  • Posted on 2023-06-01 22:12:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76382841.html