How can I efficiently calculate differences between values of a column and the preceding column in R?

Question

I'm trying to calculate, for each row, the difference between the values in each column and the values in the preceding column. For example:

        1   2   3
    R1. 50  60  90
    R2. 80  95  115
    R3. 90  100 120

I would like the code to do the following:
(values in col 2 - values in col 1), (values in col 3 - values in col 2), (values in col 4 - values in col 3), etc.

so that the calculated values are stored in a new table like this:

        1   2
    R1. 10  30
    R2. 15  20
    R3. 10  20

I've been doing this manually on a small set of columns as follows, but is there code that does this more efficiently on a larger set of columns/rows?

    # to calculate differences
    diff_new = tab_new[,2] - tab_new[,1]
    diff_new2 = tab_new[,3] - tab_new[,2]
    diff_new3 = tab_new[,4] - tab_new[,3]
    diff_new4 = tab_new[,5] - tab_new[,4]

    # to create a new table with differences
    diff.table_new = cbind(diff_new, diff_new2, diff_new3, diff_new4)

Answer 1

Score: 3

One base R option would be:

```R
df[-1] - df[-ncol(df)] # thanks @user20650
```

Output:

```
    X2 X3
R1. 10 30
R2. 15 20
R3. 10 20
```

Or use `sapply`:

```R
sapply(seq_len(ncol(df))[-1], function(x) df[,x] - df[,x-1])
```

Output:

```
     [,1] [,2]
[1,]   10   30
[2,]   15   20
[3,]   10   20
```

Data:

```R
df <- read.table(text = "1   2   3
R1. 50 60 90
R2. 80 95 115
R3. 90 100 120", header = TRUE)
```
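The first option above is not tied to three columns, so it can be wrapped in a small helper. The following is only a sketch: the function name `col_diffs` and the `"X2-X1"`-style column labels are illustrative choices, not part of the original answer, and it assumes every column is numeric.

```r
# Hypothetical helper (not from the original answer): successive column
# differences for a data frame with any number of numeric columns.
col_diffs <- function(df) {
  stopifnot(ncol(df) >= 2)
  # Drop the first column on the left, the last column on the right,
  # then subtract the two data frames element by element.
  out <- df[-1] - df[-ncol(df)]
  # Label each result column with the pair of columns it came from.
  names(out) <- paste(names(df)[-1], names(df)[-ncol(df)], sep = "-")
  out
}

# Same values as the example data, built directly for a self-contained run.
df <- data.frame(
  X1 = c(50, 80, 90),
  X2 = c(60, 95, 100),
  X3 = c(90, 115, 120),
  row.names = c("R1.", "R2.", "R3.")
)

res <- col_diffs(df)
res
```

Because the subtraction is done on whole data frames, the row names are carried over and no explicit loop over columns is needed.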



Answer 2

Score: 2

`diff` with a lag:

```R
df <- read.table(header = TRUE, text = "    1   2   3
R1. 50 60 90
R2. 80 95 115
R3. 90 100 120")

matrix(diff(unlist(df), lag = nrow(df)), nrow = nrow(df))
```
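To see why a lag of `nrow(df)` gives column differences, it may help to split the one-liner into steps. This is a sketch with intermediate names (`v`, `n`, `d`) that are introduced here for illustration only; it assumes all columns are numeric, since `unlist` would otherwise coerce them to a common type.

```r
df <- data.frame(
  X1 = c(50, 80, 90),
  X2 = c(60, 95, 100),
  X3 = c(90, 115, 120)
)

# unlist() flattens the data frame column by column:
# all X1 values first, then X2, then X3.
v <- unlist(df, use.names = FALSE)
n <- nrow(df)

# Element i + n sits in the same row as element i, one column to the right,
# so a lag of n subtracts each column from the next one.
d <- diff(v, lag = n)

# Fold the flat vector of differences back into one column per column pair.
res <- matrix(d, nrow = n)
res
```

The result is a plain matrix, so the row and column names of `df` are lost and would need to be reattached if they matter.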

Answer 3

Score: 1

Another fully vectorized option:

    data.frame(t(diff(t(df))))

You can use `apply` + `diff`:

    df <- read.table(header = TRUE, text = "    1   2   3
    R1. 50 60 90
    R2. 80 95 115
    R3. 90 100 120")

    t(apply(df, 1, diff))

    #    X2 X3
    #R1. 10 30
    #R2. 15 20
    #R3. 10 20
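Both variants transpose because `diff` works down the rows of a matrix, while the question asks for differences across columns. The sketch below, which uses a matrix `m` built here for illustration, checks that the two approaches agree with plain column slicing.

```r
# Example values stored as a numeric matrix (an assumption for this sketch;
# the answers above start from a data frame).
m <- matrix(
  c(50, 80, 90, 60, 95, 100, 90, 115, 120),
  nrow = 3,
  dimnames = list(c("R1.", "R2.", "R3."), c("X1", "X2", "X3"))
)

# diff() on a matrix differences consecutive rows, so transpose first
# and transpose the result back.
by_transpose <- t(diff(t(m)))

# apply() over rows returns one column per input row, hence the final t().
by_apply <- t(apply(m, 1, diff))

# Direct slicing: all columns but the first, minus all columns but the last.
by_slicing <- m[, -1, drop = FALSE] - m[, -ncol(m), drop = FALSE]

stopifnot(all(by_transpose == by_apply), all(by_transpose == by_slicing))
by_transpose
```

On a numeric matrix the slicing form avoids both transposes; `drop = FALSE` keeps the result a matrix even when the input has only two columns.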

Answer 4

Score: 0

An efficient solution using `data.table` and `collapse`, as long as your data doesn't have `NA` values.

Here we make a copy of the original data (except for the first column) and modify the columns by reference.

``` r
library(data.table)
library(collapse)
# Take a copy
out <- copy(fselect(df, -1))

# Subtract lagged columns by reference
for (i in seq_col(out)){
  out[[i]] %-=% df[[i]]
}
out
#>     X2 X3
#> R1. 10 30
#> R2. 15 20
#> R3. 10 20
```

<sup>Created on 2023-06-02 with reprex v2.0.2</sup>

Data:

``` r
df <- read.table(header = TRUE, text = "    1   2   3
R1. 50 60 90
R2. 80 95 115
R3. 90 100 120")
```

huangapple
  • Posted on June 1, 2023 at 22:12:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76382841.html