April 10, 2023, 20:07:15

How to multiply each column in a data frame by a different value per column

Question

Consider the following data frame:

   x y z
 1 0 0 0
 2 1 0 0
 3 0 1 0
 4 1 1 0
 5 0 0 1
 6 1 0 1
 7 0 1 1
 8 1 1 1
 -------
 x 4 2 1  <--- vector to multiply by

I would like to multiply each column by a separate value, for example c(4,2,1), giving:

   x y z
 1 0 0 0
 2 4 0 0
 3 0 2 0
 4 4 2 0
 5 0 0 1
 6 4 0 1
 7 0 2 1
 8 4 2 1

Code:

pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)
df

for (d in seq_len(3)) df[, d] <- df[, d] * pw2[d]
df

Question: Find a vectorized solution without a for loop (in base R).

Note: the question https://stackoverflow.com/questions/36111444/multiply-columns-in-a-data-frame-by-a-vector is ambiguous because it covers two different cases:

multiply each row in the data frame by a different value.
multiply each column in the data frame by a different value.

Both cases are easy to solve with a for loop. Here, a vectorized solution is explicitly requested.
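
To make the two cases concrete, here is a minimal base R sketch. The row vector `rw` is a hypothetical example and not part of the question. It also shows why plain `df * pw2` is not a solution: a vector operand is recycled down the columns, so it scales rows rather than columns.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

# Case 1: one multiplier per row (rw is a hypothetical vector of length nrow(df)).
rw <- seq_len(nrow(df))
by_row <- df * rw              # rw is recycled down every column

# Case 2: one multiplier per column, written with the loop from the question.
by_col <- df
for (d in seq_along(pw2)) by_col[, d] <- by_col[, d] * pw2[d]

# df * pw2 recycles pw2 down the columns as well,
# so it does NOT give the column-wise product.
naive <- df * pw2
any(as.matrix(naive) != as.matrix(by_col))
```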

Answer 1

Score: 11

Use sweep to apply a function on margins of a dataframe:

sweep(df, 2, pw2, `*`)

or with col:

df * pw2[col(df)]

Output:

  x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
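
A rough sketch of why the `col` variant works, reusing `df` and `pw2` from the question (the names `idx` and `expanded` are only illustrative): `col(df)` gives every cell its column index, and indexing `pw2` with it repeats each multiplier once per row.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

idx <- col(df)            # 8 x 3 integer matrix; entry [i, j] is j
expanded <- pw2[idx]      # length-24 vector: 4 eight times, then 2, then 1

res_col   <- df * expanded            # filled column by column
res_sweep <- sweep(df, 2, pw2, `*`)   # MARGIN = 2 means "per column"

all(as.matrix(res_col) == as.matrix(res_sweep))
```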

For large data frames, have a look at collapse::TRA, which is roughly 10x faster than the other answers in the benchmark below:

collapse::TRA(df, pw2, "*")

Benchmark:

bench::mark(sweep = sweep(df, 2, pw2, `*`),
            col = df * pw2[col(df)],
            '%*%' = setNames(
              as.data.frame(as.matrix(df) %*% diag(pw2)),
              names(df)
            ),
            TRA = collapse::TRA(df, pw2, "*"),
            mapply = data.frame(mapply(FUN = `*`, df, pw2)),
            apply = t(apply(df, 1, \(x) x*pw2)),
            t = t(t(df)*pw2),
            check = FALSE)

# A tibble: 7 × 13
  expression      min  median itr/s…¹ mem_al…² gc/se…³ n_itr  n_gc total…⁴
  <bch:expr> <bch:tm> <bch:t>   <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t>
1 sweep       346.7µs 382.1µs   2427.   1.23KB   10.6   1141     5 470.2ms
2 col         303.1µs 330.4µs   2760.     784B    8.45  1307     4 473.5ms
3 %*%          72.8µs  77.9µs  11861.     480B   10.6   5599     5 472.1ms
4 TRA             5µs   5.5µs 167050.       0B   16.7   9999     1  59.9ms
5 mapply      117.6µs 127.9µs   7309.     480B   10.6   3442     5 470.9ms
6 apply       107.8µs 117.9µs   7887.   6.49KB   12.9   3658     6 463.8ms
7 t            55.3µs  59.7µs  15238.     720B    8.13  5620     3 368.8ms

Answer 2

Score: 9

Convert df and pw2 to matrices, use the %*% matrix multiplication operator, then convert back to a dataframe. This will strip the column names, so wrap in setNames() to preserve them.

setNames(
  as.data.frame(as.matrix(df) %*% diag(pw2)), 
  names(df)
)
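
A short sketch of what happens inside that call, using `df` and `pw2` from the question. The `nrow =` variant at the end is an assumption added here, not part of the answer: `diag()` given a single number builds an identity matrix of that size, so a one-column data frame would need the explicit form.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

m <- as.matrix(df)
D <- diag(pw2)            # 3 x 3 matrix with 4, 2, 1 on the diagonal
scaled <- m %*% D         # column j of m is multiplied by pw2[j]
colnames(scaled)          # NULL: the column names are lost, hence setNames()

# Caveat for a single multiplier (hypothetical one-column case):
dim(diag(4))              # 4 x 4 identity matrix, not a 1 x 1 matrix holding 4
diag(4, nrow = 1)         # 1 x 1 matrix holding 4
```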

Answer 3

Score: 6

Using mapply():

mapply(FUN = `*`, df, pw2)

     x y z
[1,] 0 0 0
[2,] 4 0 0
[3,] 0 2 0
[4,] 4 2 0
[5,] 0 0 1
[6,] 4 0 1
[7,] 0 2 1
[8,] 4 2 1

and as a data frame:

data.frame(mapply(FUN = `*`, df, pw2))
  x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
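
As a sketch of why the first result prints with `[1,]`-style row labels: a data frame is a list of columns, so `mapply` pairs each column with one element of `pw2` and then simplifies the equal-length results into a matrix. The variable names below are only illustrative.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

res <- mapply(FUN = `*`, df, pw2)
is.matrix(res)            # TRUE: simplified to an 8 x 3 matrix
colnames(res)             # names are taken from the columns of df

# Without simplification the result stays a named list of columns.
res_list <- mapply(FUN = `*`, df, pw2, SIMPLIFY = FALSE)
as.data.frame(res_list)
```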

Answer 4

Score: 6

Another option is to use `apply` with a transpose:

``` r
pw2 <- c(4, 2, 1)
t(apply(df, 1, \(x) x*pw2))
#>   x y z
#> 1 0 0 0
#> 2 4 0 0
#> 3 0 2 0
#> 4 4 2 0
#> 5 0 0 1
#> 6 4 0 1
#> 7 0 2 1
#> 8 4 2 1
```

<sup>Created on 2023-04-10 with reprex v2.0.2</sup>
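
A small sketch of why the transpose is needed, assuming `df` is built as in the question: `apply` over rows returns each row's result as a column, so the intermediate matrix is 3 x 8 until `t()` flips it back. The result is a matrix, not a data frame.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

raw <- apply(df, 1, function(x) x * pw2)   # one column per original row
dim(raw)                                   # 3 8

res <- t(raw)                              # back to one row per original row
dim(res)                                   # 8 3
is.data.frame(res)                         # FALSE; wrap in as.data.frame() if needed
```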

Answer 5

Score: 5

Here is another option where you turn the vector into a matrix with the same dimensions as your data frame and then simply multiply the two:

t(replicate(nrow(df), pw2)) * df

**Output**

  x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
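
The same expansion can be spelled in other ways in base R. The two alternatives below are a sketch added for comparison and are not part of the answer; no claim is made about their relative speed.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

m1 <- t(replicate(nrow(df), pw2))                      # the answer's matrix
m2 <- matrix(pw2, nrow = nrow(df), ncol = length(pw2),
             byrow = TRUE)                             # same matrix, filled row by row
all(m1 == m2)

# A plain vector also works, because a data frame is filled column by column.
res <- df * rep(pw2, each = nrow(df))
```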



Answer 6

Score: 5

The existing mapply approach looks great, but I believe we can be more efficient by using Map + list2DF instead (especially if you prefer to stay with base R).

Below is a benchmark for the mapply and Map variants:

microbenchmark(
  "mapply1" = data.frame(mapply(FUN = `*`, df, pw2)),
  "mapply2" = as.data.frame(mapply(FUN = `*`, df, pw2)),
  "Map1" = list2DF(Map(`*`, df, pw2)),
  "Map2" = list2DF(Map(`*`, df, as.list(pw2)))
)

gives

Unit: microseconds
    expr  min    lq    mean median     uq   max neval
 mapply1 74.6 78.60 112.163  97.05 140.50 342.6   100
 mapply2 34.6 38.20  55.513  42.70  67.40 313.5   100
    Map1 23.8 25.25  33.728  27.60  41.30 113.8   100
    Map2 25.9 28.75  40.866  32.95  47.65 238.6   100
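
For readers who have not used the pair before, a minimal sketch of the Map + list2DF route on the question's data (assuming R 4.0.0 or later, where `list2DF` is available). `Map` returns one list element per column and never builds a matrix; `list2DF` turns that list into a data frame.

```r
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

cols <- Map(`*`, df, pw2)   # named list: x * 4, y * 2, z * 1
out  <- list2DF(cols)       # list of equal-length columns -> data frame

is.data.frame(out)
names(out)
```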

Also, let the Map approach join the benchmarking party as provided by @Maël, e.g.,

bc <- bench::mark(
  sweep = sweep(df, 2, pw2, `*`),
  col = df * pw2[col(df)],
  "%*%" = setNames(
    as.data.frame(as.matrix(df) %*% diag(pw2)),
    names(df)
  ),
  TRA = collapse::TRA(df, pw2, "*"),
  mapply1 = data.frame(mapply(FUN = `*`, df, pw2)),
  mapply2 = as.data.frame(mapply(FUN = `*`, df, pw2)),
  Map1 = list2DF(Map(`*`, df, pw2)),
  Map2 = list2DF(Map(`*`, df, as.list(pw2))),
  apply = t(apply(df, 1, \(x) x * pw2)),
  t = t(t(df) * pw2),
  check = FALSE
)

and we see that Map takes second place in terms of speed, behind only TRA:

# A tibble: 10 × 13
   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
 1 sweep       201.7µs  249.2µs     3526.  101.24KB     12.6  1680     6
 2 col         174.9µs  225.6µs     3637.    9.02KB     10.4  1748     5
 3 %*%          45.4µs   52.9µs    17026.   36.95KB     12.5  8158     6
 4 TRA           3.4µs    3.8µs   226089.  905.09KB     22.6  9999     1
 5 mapply1      71.6µs   78.4µs    11958.      480B     14.7  5681     7
 6 mapply2      33.1µs   37.4µs    25339.      480B     17.7  9993     7
 7 Map1         22.5µs   26.1µs    35649.        0B     17.8  9995     5
 8 Map2         25.3µs   29.4µs    31785.        0B     19.1  9994     6
 9 apply        70.2µs   80.7µs    11684.   11.91KB     14.7  5562     7
10 t            34.8µs   40.2µs    23608.    3.77KB     14.2  9994     6
# ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#   time <list>, gc <list>

and autoplot(bc) shows the same benchmark as a plot (figure not preserved).
