How to multiply each column in a data frame by a different value per column

Question

Consider the following data frame

   x y z
 1 0 0 0
 2 1 0 0
 3 0 1 0
 4 1 1 0
 5 0 0 1
 6 1 0 1
 7 0 1 1
 8 1 1 1
 -------
 x 4 2 1  <--- vector to multiply by

I would like to multiply each column by a separate value, for example c(4, 2, 1), giving:

   x y z
 1 0 0 0
 2 4 0 0
 3 0 2 0
 4 4 2 0
 5 0 0 1
 6 4 0 1
 7 0 2 1
 8 4 2 1

Code:

pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)
df

for (d in seq_len(3)) df[, d] <- df[, d] * pw2[d]
df

Question: Find a vectorized solution without a for loop (in base R).

Note that the question https://stackoverflow.com/questions/36111444/multiply-columns-in-a-data-frame-by-a-vector is ambiguous because it covers both:

  • multiply each row in the data frame column by a different value.
  • multiply each column in the data frame by a different value.

Both queries can be easily solved with a for loop. Here a vectorised solution is explicitly requested.
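
As an aside on why a plain `df * pw2` is not enough here: arithmetic between a data frame and a shorter vector recycles the vector down the columns, not across them. The following is a minimal sketch reusing the `df` and `pw2` from the question; the names `expected` and `naive` are introduced only for illustration.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

# Reference result, computed with the for loop from the question
expected <- df
for (d in seq_len(3)) expected[, d] <- expected[, d] * pw2[d]

# Naive attempt: pw2 is recycled element by element down the columns
# (4, 2, 1, 4, 2, ...), so values within one column get different multipliers
naive <- df * pw2

isTRUE(all.equal(naive, expected))  # FALSE
naive$x                             # 0 2 0 4 0 1 0 2
```

The answers below all work around this by lining the multipliers up with columns explicitly.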

Answer 1

Score: 11

Use sweep() to apply a function along a margin of a data frame (margin 2 means columns):

sweep(df, 2, pw2, `*`)

or with col:

df * pw2[col(df)]

output

  x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
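
To see why the `col` variant works, it can help to inspect what `pw2[col(df)]` contains. This is a small check under the question's setup, not part of the original answer; `mult`, `res_col` and `res_sweep` are illustrative names.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

# col(df) holds the column index of every cell, so pw2[col(df)] is a vector
# with one multiplier per cell: 4 eight times, then 2 eight times, then 1
mult <- pw2[col(df)]
length(mult)        # 24
unique(mult[1:8])   # 4

res_col   <- df * mult
res_sweep <- sweep(df, 2, pw2, `*`)

# Both variants should produce the same values
all(as.matrix(res_col) == as.matrix(res_sweep))
```

Because `mult` already has one entry per cell, the column-wise recycling of data frame arithmetic lines up exactly with the intended multipliers.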

For large data frames, consider collapse::TRA, which is roughly 10x faster than the other answers in the benchmark below:

collapse::TRA(df, pw2, "*")

Benchmark:

bench::mark(sweep = sweep(df, 2, pw2, `*`),
            col = df * pw2[col(df)],
            '%*%' = setNames(
              as.data.frame(as.matrix(df) %*% diag(pw2)),
              names(df)
            ),
            TRA = collapse::TRA(df, pw2, "*"),
            mapply = data.frame(mapply(FUN = `*`, df, pw2)),
            apply = t(apply(df, 1, \(x) x * pw2)),
            t = t(t(df) * pw2),
            check = FALSE,
            )

# A tibble: 7 × 13
  expression      min  median itr/s…¹ mem_al…² gc/se…³ n_itr  n_gc total…⁴
  <bch:expr> <bch:tm> <bch:t>   <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t>
1 sweep       346.7µs 382.1µs   2427.   1.23KB   10.6   1141     5 470.2ms
2 col         303.1µs 330.4µs   2760.     784B    8.45  1307     4 473.5ms
3 %*%          72.8µs  77.9µs  11861.     480B   10.6   5599     5 472.1ms
4 TRA             5µs   5.5µs 167050.       0B   16.7   9999     1  59.9ms
5 mapply      117.6µs 127.9µs   7309.     480B   10.6   3442     5 470.9ms
6 apply       107.8µs 117.9µs   7887.   6.49KB   12.9   3658     6 463.8ms
7 t            55.3µs  59.7µs  15238.     720B    8.13  5620     3 368.8ms

Answer 2

Score: 9

Convert df to a matrix and pw2 to a diagonal matrix with diag(), use the %*% matrix multiplication operator, then convert back to a data frame. This strips the column names, so wrap the result in setNames() to preserve them.

setNames(
  as.data.frame(as.matrix(df) %*% diag(pw2)), 
  names(df)
)
  x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
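
The reason this works is that right-multiplying by a diagonal matrix scales column j by the j-th diagonal entry. The sketch below spells out the intermediate steps under the question's setup; `D`, `m` and `res` are illustrative names, not part of the original answer.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

D <- diag(pw2)               # 3 x 3 matrix: 4, 2, 1 on the diagonal, 0 elsewhere
m <- as.matrix(df) %*% D     # column j of df is multiplied by D[j, j]

colnames(m)                  # NULL: D has no column names, hence setNames()
res <- setNames(as.data.frame(m), names(df))
```

One caveat: if `pw2` had length 1, `diag(pw2)` would build an identity matrix of that size instead of a 1 x 1 matrix. Writing `diag(pw2, nrow = length(pw2))` avoids that special case.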

Answer 3

Score: 6

Using mapply(), which pairs each column of df with the matching element of pw2 and returns a matrix:

mapply(FUN = `*`, df, pw2)

     x y z
[1,] 0 0 0
[2,] 4 0 0
[3,] 0 2 0
[4,] 4 2 0
[5,] 0 0 1
[6,] 4 0 1
[7,] 0 2 1
[8,] 4 2 1

And as a data frame:

data.frame(mapply(FUN = `*`, df, pw2))
  x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
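
A short sketch of what `mapply()` returns here, plus one way to keep the result as a data frame without a conversion step. The `df_scaled[] <- Map(...)` idiom is a variation introduced for illustration, not something taken from the answer.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

# mapply() calls `*`(df$x, 4), `*`(df$y, 2), `*`(df$z, 1) and simplifies
# the three equal-length results into an 8 x 3 matrix
res <- mapply(FUN = `*`, df, pw2)
is.matrix(res)   # TRUE

# Map() returns a list of columns instead; assigning into df_scaled[]
# replaces the columns while keeping the data frame structure
df_scaled <- df
df_scaled[] <- Map(`*`, df, pw2)
is.data.frame(df_scaled)   # TRUE
```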

Answer 4

Score: 6

Another option is to use `apply` with a transpose, like this:

``` r
pw2 <- c(4, 2, 1)
t(apply(df, 1, \(x) x*pw2))
#>   x y z
#> 1 0 0 0
#> 2 4 0 0
#> 3 0 2 0
#> 4 4 2 0
#> 5 0 0 1
#> 6 4 0 1
#> 7 0 2 1
#> 8 4 2 1
```

<sup>Created on 2023-04-10 with reprex v2.0.2</sup>
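
Two details may be worth spelling out. `apply()` over rows returns each row's result as a column, which is why the outer `t()` is needed, and the final result is a matrix rather than a data frame. The sketch below also compares it with the related `t(t(df) * pw2)` form that appears in the first answer's benchmark; `res_apply` and `res_t` are illustrative names.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

# Row-wise: each row (x, y, z) is multiplied elementwise by pw2
res_apply <- t(apply(df, 1, \(x) x * pw2))
is.matrix(res_apply)   # TRUE

# t(df) is a 3 x 8 matrix, so recycling pw2 down its columns lines the
# multipliers up with the rows x, y, z; transposing back gives 8 x 3
res_t <- t(t(df) * pw2)
all(res_apply == res_t)   # TRUE

# Convert back if a data frame is needed
res_df <- as.data.frame(res_t)
```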

Answer 5

Score: 5

Here is another option: turn the vector into a matrix with the same dimensions as your data frame, then simply multiply the two:

t(replicate(nrow(df), pw2)) * df

**Output**

  x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1



Answer 6

Score: 5

The existing mapply approach looks great, but I believe we can be more efficient by using Map + list2DF instead (especially if you prefer to stay with base R; note that list2DF requires R 4.0.0 or later).

Below is a benchmark of the mapply and Map variants:

microbenchmark::microbenchmark(
  "mapply1" = data.frame(mapply(FUN = `*`, df, pw2)),
  "mapply2" = as.data.frame(mapply(FUN = `*`, df, pw2)),
  "Map1" = list2DF(Map(`*`, df, pw2)),
  "Map2" = list2DF(Map(`*`, df, as.list(pw2)))
)

gives

Unit: microseconds
    expr  min    lq    mean median     uq   max neval
 mapply1 74.6 78.60 112.163  97.05 140.50 342.6   100
 mapply2 34.6 38.20  55.513  42.70  67.40 313.5   100
    Map1 23.8 25.25  33.728  27.60  41.30 113.8   100
    Map2 25.9 28.75  40.866  32.95  47.65 238.6   100

Adding the Map approach to the benchmark provided by @Maël:

bc <- bench::mark(
  sweep = sweep(df, 2, pw2, `*`),
  col = df * pw2[col(df)],
  "%*%" = setNames(
    as.data.frame(as.matrix(df) %*% diag(pw2)),
    names(df)
  ),
  TRA = collapse::TRA(df, pw2, "*"),
  mapply1 = data.frame(mapply(FUN = `*`, df, pw2)),
  mapply2 = as.data.frame(mapply(FUN = `*`, df, pw2)),
  Map1 = list2DF(Map(`*`, df, pw2)),
  Map2 = list2DF(Map(`*`, df, as.list(pw2))),
  apply = t(apply(df, 1, \(x) x * pw2)),
  t = t(t(df) * pw2),
  check = FALSE,
)

We can see that Map comes second in speed, behind only collapse::TRA:

# A tibble: 10 × 13
   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
 1 sweep       201.7µs  249.2µs     3526.  101.24KB     12.6  1680     6
 2 col         174.9µs  225.6µs     3637.    9.02KB     10.4  1748     5
 3 %*%          45.4µs   52.9µs    17026.   36.95KB     12.5  8158     6
 4 TRA           3.4µs    3.8µs   226089.  905.09KB     22.6  9999     1
 5 mapply1      71.6µs   78.4µs    11958.      480B     14.7  5681     7
 6 mapply2      33.1µs   37.4µs    25339.      480B     17.7  9993     7
 7 Map1         22.5µs   26.1µs    35649.        0B     17.8  9995     5
 8 Map2         25.3µs   29.4µs    31785.        0B     19.1  9994     6
 9 apply        70.2µs   80.7µs    11684.   11.91KB     14.7  5562     7
10 t            34.8µs   40.2µs    23608.    3.77KB     14.2  9994     6
# ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#   time <list>, gc <list>
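
The benchmark passes `check = FALSE` because the expressions do not all return the same kind of object: some give a data frame, others a matrix, and the names differ. A sketch for confirming by hand that the numeric values still agree is shown below. The helper `same_values` and the list name `results` are hypothetical, and `collapse::TRA` is left out so that only base R is needed.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

results <- list(
  sweep  = sweep(df, 2, pw2, `*`),
  col    = df * pw2[col(df)],
  matmul = as.matrix(df) %*% diag(pw2),
  mapply = mapply(FUN = `*`, df, pw2),
  Map    = list2DF(Map(`*`, df, pw2)),
  t      = t(t(df) * pw2)
)

# Compare bare numeric values only, ignoring class and dimnames
same_values <- function(a, b) {
  all(unname(as.matrix(a)) == unname(as.matrix(b)))
}

# One logical per method, each compared against the sweep() result
vapply(results, same_values, logical(1), b = results$sweep)
```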

and autoplot(bc) shows

[Figure: autoplot(bc) output for the benchmark above]

Posted by huangapple on 2023-04-10 20:07:15
Permalink: https://go.coder-hub.com/75976974.html