How to multiply each column in a data frame by a different value per column
Question
Consider the following data frame:
x y z
1 0 0 0
2 1 0 0
3 0 1 0
4 1 1 0
5 0 0 1
6 1 0 1
7 0 1 1
8 1 1 1
-------
x 4 2 1 <--- vector to multiply by
I would like to multiply each column by a separate value, for example c(4,2,1), giving:
x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
Code:
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df <- expand.grid(x=s01, y=s01, z=s01)
df
for (d in seq_len(3)) df[,d] <- df[,d] * pw2[d]
df
Question: Find a vectorized solution without a for loop (in base R).
Note: the question https://stackoverflow.com/questions/36111444/multiply-columns-in-a-data-frame-by-a-vector is ambiguous because it covers both of the following:
- multiply each row in the data frame column by a different value.
- multiply each column in the data frame by a different value.
Both queries can be easily solved with a for loop. Here a vectorised solution is explicitly requested.
Answer 1
Score: 11
Use `sweep` to apply a function over the margins of a data frame:
sweep(df, 2, pw2, `*`)
or with `col`:
df * pw2[col(df)]
Output:
x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
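The `col(df)` indexing matters because a plain `df * pw2` does not do what the question asks: R recycles the vector down the rows of each column rather than across the columns. A minimal sketch of the difference, reusing `df` and `pw2` from the question (the helper `as_plain` and the names `expected`, `naive` and `fixed` are introduced here only for illustration):

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df <- expand.grid(x = s01, y = s01, z = s01)

# Reference result from the for loop in the question.
expected <- df
for (d in seq_along(pw2)) expected[, d] <- expected[, d] * pw2[d]

# Illustrative helper: compare values only, ignoring names and attributes.
as_plain <- function(x) unname(as.matrix(x))

# Naive attempt: pw2 is recycled down each column, so rows get mixed weights.
naive <- df * pw2

# col(df) holds the column index of every cell, so pw2[col(df)] supplies
# exactly one weight per column.
fixed <- df * pw2[col(df)]

isTRUE(all.equal(as_plain(naive), as_plain(expected)))  # FALSE
isTRUE(all.equal(as_plain(fixed), as_plain(expected)))  # TRUE
```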
For large data frames, consider `collapse::TRA`, which is roughly 10x faster than the other answers (see benchmark):
collapse::TRA(df, pw2, "*")
Benchmark:
bench::mark(sweep = sweep(df, 2, pw2, `*`),
col = df * pw2[col(df)],
'%*%' = setNames(
as.data.frame(as.matrix(df) %*% diag(pw2)),
names(df)
),
TRA = collapse::TRA(df, pw2, "*"),
mapply = data.frame(mapply(FUN = `*`, df, pw2)),
apply = t(apply(df, 1, \(x) x*pw2)),
t = t(t(df)*pw2), check = FALSE,
)
# A tibble: 7 × 13
expression min median itr/s…¹ mem_al…² gc/se…³ n_itr n_gc total…⁴
<bch:expr> <bch:tm> <bch:t> <dbl> <bch:by> <dbl> <int> <dbl> <bch:t>
1 sweep 346.7µs 382.1µs 2427. 1.23KB 10.6 1141 5 470.2ms
2 col 303.1µs 330.4µs 2760. 784B 8.45 1307 4 473.5ms
3 %*% 72.8µs 77.9µs 11861. 480B 10.6 5599 5 472.1ms
4 TRA 5µs 5.5µs 167050. 0B 16.7 9999 1 59.9ms
5 mapply 117.6µs 127.9µs 7309. 480B 10.6 3442 5 470.9ms
6 apply 107.8µs 117.9µs 7887. 6.49KB 12.9 3658 6 463.8ms
7 t 55.3µs 59.7µs 15238. 720B 8.13 5620 3 368.8ms
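The benchmark sets `check = FALSE` because the candidates do not all return the same kind of object: some give a data frame, others a matrix. A quick base R sketch (leaving out `collapse::TRA` so that no extra package is needed; the names `results` and `plain` are only illustrative) that checks the values agree and shows what each approach returns:

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df <- expand.grid(x = s01, y = s01, z = s01)

results <- list(
  sweep  = sweep(df, 2, pw2, `*`),
  col    = df * pw2[col(df)],
  matmul = setNames(as.data.frame(as.matrix(df) %*% diag(pw2)), names(df)),
  mapply = data.frame(mapply(FUN = `*`, df, pw2)),
  apply  = t(apply(df, 1, function(x) x * pw2)),
  t      = t(t(df) * pw2)
)

# First class of each result: "data.frame" or "matrix".
sapply(results, function(r) class(r)[1])

# The values agree once names and attributes are dropped.
plain <- lapply(results, function(r) unname(as.matrix(r)))
all(sapply(plain, function(m) isTRUE(all.equal(m, plain[[1]]))))
```

If a data frame is needed downstream, the matrix results can be wrapped in `as.data.frame()`, at some extra cost.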
Answer 2
Score: 9
Convert `df` and `pw2` to matrices, use the `%*%` matrix multiplication operator, then convert back to a data frame. This strips the column names, so wrap the result in `setNames()` to preserve them.
setNames(
as.data.frame(as.matrix(df) %*% diag(pw2)),
names(df)
)
x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
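One caveat with `diag()` is worth keeping in mind: when its argument is a single number, `diag(n)` builds an n-by-n identity matrix instead of a 1-by-1 matrix, so the snippet above would fail for a one-column data frame. A hedged sketch of a variant that passes the size explicitly (the names `df1`, `w1` and `scaled` are made up for this example):

```r
df1 <- data.frame(x = c(0, 1, 2))
w1 <- 4

# diag(4) would be a 4 x 4 identity matrix; fixing nrow gives a 1 x 1 matrix.
scaled <- setNames(
  as.data.frame(as.matrix(df1) %*% diag(w1, nrow = length(w1))),
  names(df1)
)
scaled$x
```

With more than one weight, `diag(pw2, nrow = length(pw2))` produces the same matrix as `diag(pw2)`, so the explicit form can be used in both cases.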
Answer 3
Score: 6
Using `mapply()`, which returns a matrix:
mapply(FUN = `*`, df, pw2)
x y z
[1,] 0 0 0
[2,] 4 0 0
[3,] 0 2 0
[4,] 4 2 0
[5,] 0 0 1
[6,] 4 0 1
[7,] 0 2 1
[8,] 4 2 1
and as a data frame:
data.frame(mapply(FUN = `*`, df, pw2))
x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
Answer 4
Score: 6
Another option is to use `apply` with a transpose:
``` r
pw2 <- c(4, 2, 1)
t(apply(df, 1, \(x) x*pw2))
#>   x y z
#> 1 0 0 0
#> 2 4 0 0
#> 3 0 2 0
#> 4 4 2 0
#> 5 0 0 1
#> 6 4 0 1
#> 7 0 2 1
#> 8 4 2 1
```
<sup>Created on 2023-04-10 with reprex v2.0.2</sup>
Answer 5
Score: 5
Here is another option where you turn the vector into a matrix with the same dimensions as your data frame and then simply multiply the two:
t(replicate(nrow(df), pw2)) * df
**Output**
x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
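The same weight matrix can also be built without `replicate()`. As a sketch, `matrix(..., byrow = TRUE)` lays `pw2` out along each row, and `rep(pw2, each = nrow(df))` gives the equivalent vector in column-major order; the variable names `w_mat`, `a` and `b` are illustrative only:

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df <- expand.grid(x = s01, y = s01, z = s01)

# Every row of w_mat is pw2, so cell (i, j) holds pw2[j].
w_mat <- matrix(pw2, nrow = nrow(df), ncol = ncol(df), byrow = TRUE)
a <- df * w_mat

# The same weights as one long vector in column-major order.
b <- df * rep(pw2, each = nrow(df))

all.equal(a, b)
```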
Answer 6
Score: 5
The existing `mapply` approach among the answers looks great, but I believe we can be more efficient by using `Map` + `list2DF` instead (especially if you prefer to stay with base R).
Below is a benchmark of the `mapply` and `Map` variants:
microbenchmark::microbenchmark(
"mapply1" = data.frame(mapply(FUN = `*`, df, pw2)),
"mapply2" = as.data.frame(mapply(FUN = `*`, df, pw2)),
"Map1" = list2DF(Map(`*`, df, pw2)),
"Map2" = list2DF(Map(`*`, df, as.list(pw2)))
)
gives
Unit: microseconds
expr min lq mean median uq max neval
mapply1 74.6 78.60 112.163 97.05 140.50 342.6 100
mapply2 34.6 38.20 55.513 42.70 67.40 313.5 100
Map1 23.8 25.25 33.728 27.60 41.30 113.8 100
Map2 25.9 28.75 40.866 32.95 47.65 238.6 100
Also, let the `Map` approach join the benchmark provided by @Maël, e.g.,
bc <- bench::mark(
sweep = sweep(df, 2, pw2, `*`),
col = df * pw2[col(df)],
"%*%" = setNames(
as.data.frame(as.matrix(df) %*% diag(pw2)),
names(df)
),
TRA = collapse::TRA(df, pw2, "*"),
mapply1 = data.frame(mapply(FUN = `*`, df, pw2)),
mapply2 = as.data.frame(mapply(FUN = `*`, df, pw2)),
Map1 = list2DF(Map(`*`, df, pw2)),
Map2 = list2DF(Map(`*`, df, as.list(pw2))),
apply = t(apply(df, 1, \(x) x * pw2)),
t = t(t(df) * pw2),
check = FALSE,
)
we see that `Map` takes second place in terms of efficiency:
# A tibble: 10 × 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
1 sweep 201.7µs 249.2µs 3526. 101.24KB 12.6 1680 6
2 col 174.9µs 225.6µs 3637. 9.02KB 10.4 1748 5
3 %*% 45.4µs 52.9µs 17026. 36.95KB 12.5 8158 6
4 TRA 3.4µs 3.8µs 226089. 905.09KB 22.6 9999 1
5 mapply1 71.6µs 78.4µs 11958. 480B 14.7 5681 7
6 mapply2 33.1µs 37.4µs 25339. 480B 17.7 9993 7
7 Map1 22.5µs 26.1µs 35649. 0B 17.8 9995 5
8 Map2 25.3µs 29.4µs 31785. 0B 19.1 9994 6
9 apply 70.2µs 80.7µs 11684. 11.91KB 14.7 5562 7
10 t 34.8µs 40.2µs 23608. 3.77KB 14.2 9994 6
# ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
# time <list>, gc <list>
The same comparison can be plotted with `autoplot(bc)`.
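If the `Map` + `list2DF` variant is used in practice, it may help to wrap it in a small function that checks the weight vector has one entry per column, since `Map` would otherwise recycle a shorter vector. The function name `scale_cols` below is hypothetical, and `list2DF` needs R 4.0.0 or later:

```r
# Hypothetical helper: multiply column i of df by w[i].
scale_cols <- function(df, w) {
  stopifnot(is.data.frame(df), is.numeric(w), length(w) == ncol(df))
  # Map pairs column i with w[i]; list2DF turns the list back into a data frame.
  list2DF(Map(`*`, df, w))
}

pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df <- expand.grid(x = s01, y = s01, z = s01)
scale_cols(df, pw2)
```

Calling `scale_cols(df, c(1, 2))` stops with an error instead of silently recycling the weights.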