Calculate the row-wise weighted sum for a set of columns

Question

I have the following data frame:

    > library(tidyverse)
    > dd <- tibble(a = rep(1,10), b = rep(1,10), c = rep(1,10))
    > dd
    # A tibble: 10 × 3
           a     b     c
       <dbl> <dbl> <dbl>
     1     1     1     1
     2     1     1     1
     3     1     1     1
     4     1     1     1
     5     1     1     1
     6     1     1     1
     7     1     1     1
     8     1     1     1
     9     1     1     1
    10     1     1     1

and a vector of weights:

    > weight <- c(1, 5, 10)
    > weight
    [1]  1  5 10

When I want to calculate the row-wise weighted sum over all the columns of the data frame, I do this:

    > dd %>% mutate(m = rowSums(map2_dfc(dd, weight, `*`)))
    # A tibble: 10 × 4
           a     b     c     m
       <dbl> <dbl> <dbl> <dbl>
     1     1     1     1    16
     2     1     1     1    16
     3     1     1     1    16
     4     1     1     1    16
     5     1     1     1    16
     6     1     1     1    16
     7     1     1     1    16
     8     1     1     1    16
     9     1     1     1    16
    10     1     1     1    16

But I don't know how to calculate the row-wise weighted sum for a subset of the columns. I tried the code below, but it gives messy results:

    > dd %>% rowwise() %>% mutate(m = rowwise(map2_dfc(c_across(b:c), weight[2:3], `*`)))
    New names:
    • `` -> `...1`
    • `` -> `...2`
    New names:
    • `` -> `...1`
    • `` -> `...2`
    New names:
    • `` -> `...1`
    • `` -> `...2`
    New names:
    • `` -> `...1`
    • `` -> `...2`
    New names:
    • `` -> `...1`
    • `` -> `...2`
    New names:
    • `` -> `...1`
    • `` -> `...2`
    New names:
    • `` -> `...1`
    • `` -> `...2`
    New names:
    • `` -> `...1`
    • `` -> `...2`
    New names:
    • `` -> `...1`
    • `` -> `...2`
    New names:
    • `` -> `...1`
    • `` -> `...2`
    # A tibble: 10 × 4
    # Rowwise: 
           a     b     c m$...1 $...2
       <dbl> <dbl> <dbl>  <dbl> <dbl>
     1     1     1     1      5    10
     2     1     1     1      5    10
     3     1     1     1      5    10
     4     1     1     1      5    10
     5     1     1     1      5    10
     6     1     1     1      5    10
     7     1     1     1      5    10
     8     1     1     1      5    10
     9     1     1     1      5    10
    10     1     1     1      5    10

Can someone please give me a hint as to how to approach this problem?

Answer 1

Score: 4

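This is matrix multiplication. Your original code is equivalent to as.matrix(dd) %*% weight. For a subset of columns inside mutate, multiply by the matching weights, here weight[2:3] for columns b and c. The trailing [, 1] turns the one-column matrix result into a plain vector:

    dd %>% mutate(m = (as.matrix(across(b:c)) %*% weight[2:3])[, 1])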
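Because every value in dd is 1, it is hard to see from the question's data whether the right weights were applied. The following is a minimal base R sketch, assuming a hypothetical data frame dd2 (not from the question) whose rows differ, that checks the matrix-product form against a manual calculation for columns b and c:

```r
# Hypothetical example data (not from the question): rows differ,
# so a mismatch between columns and weights would show up.
dd2 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
weight <- c(1, 5, 10)

# Weighted row sum over all columns as a matrix product.
all_cols <- drop(as.matrix(dd2) %*% weight)

# Weighted row sum over the subset b:c, using the matching weights 5 and 10.
sub_cols <- drop(as.matrix(dd2[c("b", "c")]) %*% weight[2:3])

# Manual calculation for comparison.
manual <- 5 * dd2$b + 10 * dd2$c

print(sub_cols)  # values: 90, 105, 120
```

The subset of weights has to line up positionally with the selected columns; picking weight[1:2] for columns b and c would silently apply the weights intended for a and b.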

Answer 2

Score: 2

Using tidyverse methods, we can name the 'weight' vector after the columns, loop across the columns 'b' to 'c', pick each column's 'weight' value by its name (cur_column()), multiply, and take the rowSums:

    library(dplyr)
    names(weight) <- names(dd)
    dd %>% 
       mutate(m = rowSums(across(b:c, ~ .x * weight[cur_column()])))

Output:

    # A tibble: 10 × 4
           a     b     c     m
       <dbl> <dbl> <dbl> <dbl>
     1     1     1     1    15
     2     1     1     1    15
     3     1     1     1    15
     4     1     1     1    15
     5     1     1     1    15
     6     1     1     1    15
     7     1     1     1    15
     8     1     1     1    15
     9     1     1     1    15
    10     1     1     1    15
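The name-based lookup can be sketched on data where the rows differ. This assumes a hypothetical tibble dd2 and a weight vector that is named at creation; neither comes from the original post:

```r
library(dplyr)

# Hypothetical example data (not from the question).
dd2 <- tibble(a = 1:3, b = 4:6, c = 7:9)

# Naming the weights ties each one to its column, so the lookup
# does not depend on the position of the selected columns.
weight <- c(a = 1, b = 5, c = 10)

res <- dd2 %>%
  mutate(m = rowSums(across(b:c, ~ .x * weight[[cur_column()]])))

print(res$m)  # values: 90, 105, 120
```

Since the weights are looked up by column name, the selection inside across() can change (for example to c(a, c)) without re-indexing the weight vector by hand.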

Or, if we want to use rowwise (not recommended, as it is slower):

    dd %>% 
      rowwise %>% 
      mutate(m = sum(c_across(b:c) * weight[2:3])) %>% 
      ungroup

Or use crossprod

    dd %>%
       mutate(m = crossprod(t(pick(b:c)), weight[2:3])[,1])
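The crossprod form works because crossprod(x, y) computes t(x) %*% y, so passing t(X) gives back X %*% w. A small base R sketch, with a hypothetical matrix X and weights w chosen only for illustration:

```r
# Hypothetical values standing in for columns b and c.
X <- cbind(b = c(4, 5, 6), c = c(7, 8, 9))
w <- c(5, 10)

# crossprod(t(X), w) is t(t(X)) %*% w, which is X %*% w.
via_crossprod <- crossprod(t(X), w)[, 1]
via_matmul    <- (X %*% w)[, 1]

print(via_crossprod)  # values: 90, 105, 120
```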

Or with base R

    dd$m <- rowSums(dd[2:3] * weight[2:3][col(dd[2:3])])
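If this calculation is needed in several places, the approaches above could be wrapped in a small helper. The function weighted_row_sum() below is a hypothetical name introduced here for illustration, not part of base R or dplyr; it is a sketch that assumes the weights are a named numeric vector whose names are column names:

```r
# Hypothetical helper (not from any package): weighted row sum over the
# columns named in `weights`.
weighted_row_sum <- function(df, weights) {
  cols <- names(weights)
  # Fail early if the weights are unnamed or refer to missing columns.
  stopifnot(!is.null(cols), all(cols %in% names(df)))
  drop(as.matrix(df[cols]) %*% weights)
}

# Hypothetical example data (not from the question).
dd2 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
dd2$m <- weighted_row_sum(dd2, c(b = 5, c = 10))

print(dd2$m)  # values: 90, 105, 120
```

Selecting the columns by the names of the weight vector keeps the column subset and the weight subset from drifting apart, which is the mistake that positional indexing makes easy.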

huangapple
  • Published 2023-04-04 10:13:53
  • Link to this post: https://go.coder-hub.com/75925015.html