June 1, 2023 23:37:47


Perform a specific Mathematical Function on each column dynamically in R

Question

I want to apply mathematical functions dynamically to every column of a data frame.

Normally, to apply a mathematical function, we use mutate() to create a new column, writing one mutate() statement after another by hand.

That is feasible for a few columns. But what if I have 100 columns and need to apply 2-5 mathematical functions to each of them? For example, one function would increase the original value by 20%, another would divide the original value by 2, and the original column should be kept as is.

Is this possible in R without writing a separate mutate() statement for each column?

The data frame I am working with is:

structure(list(`Row Labels` = c("2023-03-01", "2023-04-01", "2023-05-01", 
"2023-06-01", "2023-07-01", "2023-08-01", "2023-09-01", "2023-10-01"
), X6 = c(14, 16, 14, 11, 9, 9, 11, 11), X7 = c(50, 50, 50, 50, 
50, 50, 50, 50), X8 = c(75, 75, 75, 75, 75, 75, 75, 75), X9 = c(100, 
100, 100, 100, 100, 100, 100, 100), X11 = c(25, 25, 50, 75, 125, 
200, 325, 525), X12 = c(50, 50, 100, 150, 250, 400, 650, 1050
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L))

For a single column, this code is enough:

library(readxl)
library(dplyr)
Book1 <- read_excel("C:/X/X/X- X/X/Book1.xlsx", sheet = "Sheet6")
dput(Book1)
Book1 <- Book1 %>%
  mutate(`X6 20%` = X6 * 1.20) %>%
  mutate(`X6 by 2` = X6 / 2)

I thought about running this in a loop, but then selecting the columns to multiply becomes a problem, because the column name has to be written out in the mutate() statement, which I believe is not possible here. Is that right?

Is there a simple way to achieve this?

The expected output is given below:

Answer 1

Score: 2

We could use across().

Update (shorter version):

library(dplyr)
df %>% 
  mutate(across(2:7, list("20" = ~. * 1.20, 
                          "By_2" = ~. / 2), .names = "{.col}_{.fn}"))
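For the 100-column case in the question, two details of this call could be adjusted: `2:7` selects columns by position, and the functions are written inline. The following is a minimal sketch, assuming dplyr 1.0 or later (for `across()` and `where()`), that selects numeric columns by type and keeps the transformations in one named list. The names `fns` and `result` are illustrative, and the sample data is rebuilt from the question's `dput()` output.

```r
library(dplyr)

# Sample data rebuilt from the dput() output in the question
df <- tibble(
  `Row Labels` = c("2023-03-01", "2023-04-01", "2023-05-01", "2023-06-01",
                   "2023-07-01", "2023-08-01", "2023-09-01", "2023-10-01"),
  X6  = c(14, 16, 14, 11, 9, 9, 11, 11),
  X7  = rep(50, 8),
  X8  = rep(75, 8),
  X9  = rep(100, 8),
  X11 = c(25, 25, 50, 75, 125, 200, 325, 525),
  X12 = c(50, 50, 100, 150, 250, 400, 650, 1050)
)

# Hypothetical list of transformations, defined once;
# the list names become the suffixes of the new column names.
fns <- list(
  "20%"  = function(x) x * 1.20,
  "By 2" = function(x) x / 2
)

# where(is.numeric) selects columns by type rather than position, so the
# same call works for 6 or 100 columns; `Row Labels` (character) is skipped
# and the original columns are kept unchanged.
result <- df %>%
  mutate(across(where(is.numeric), fns, .names = "{.col}_{.fn}"))
```

Adding a third transformation means adding one entry to `fns`. If some numeric columns should be left out, a name-based selector such as `starts_with("X")` can replace `where(is.numeric)`.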

First answer:

library(dplyr)
df %>% 
  mutate(across(2:7, ~. * 1.20, .names = "{.col}_20%"),
         across(2:7, ~. /2, .names = "{.col}_By 2"))
  `Row Labels`    X6    X7    X8    X9   X11   X12 `X6_20%` `X7_20%` `X8_20%` `X9_20%` `X11_20%` `X12_20%` `X6_By 2` `X7_By 2` `X8_By 2` `X9_By 2` `X11_By 2` `X12_By 2`
  <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>      <dbl>      <dbl>
1 2023-03-01      14    50    75   100    25    50     16.8       60       90      120        30        60       7          25      37.5        50       12.5         25
2 2023-04-01      16    50    75   100    25    50     19.2       60       90      120        30        60       8          25      37.5        50       12.5         25
3 2023-05-01      14    50    75   100    50   100     16.8       60       90      120        60       120       7          25      37.5        50       25           50
4 2023-06-01      11    50    75   100    75   150     13.2       60       90      120        90       180       5.5        25      37.5        50       37.5         75
5 2023-07-01       9    50    75   100   125   250     10.8       60       90      120       150       300       4.5        25      37.5        50       62.5        125
6 2023-08-01       9    50    75   100   200   400     10.8       60       90      120       240       480       4.5        25      37.5        50      100          200
7 2023-09-01      11    50    75   100   325   650     13.2       60       90      120       390       780       5.5        25      37.5        50      162.         325
8 2023-10-01      11    50    75   100   525  1050     13.2       60       90      120       630      1260       5.5        25      37.5        50      262.         525
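The loop considered in the question also works without mutate(), because `[[` accepts a column name stored in a string. This is a minimal base-R sketch, assuming the sample data is rebuilt as a plain `data.frame` named `df`. The `paste()`-generated names ("X6 20%", "X6 by 2") follow the question's naming but are otherwise an illustrative choice.

```r
# Sample data rebuilt from the dput() output in the question;
# check.names = FALSE keeps the space in `Row Labels`.
df <- data.frame(
  `Row Labels` = c("2023-03-01", "2023-04-01", "2023-05-01", "2023-06-01",
                   "2023-07-01", "2023-08-01", "2023-09-01", "2023-10-01"),
  X6  = c(14, 16, 14, 11, 9, 9, 11, 11),
  X7  = rep(50, 8),
  X8  = rep(75, 8),
  X9  = rep(100, 8),
  X11 = c(25, 25, 50, 75, 125, 200, 325, 525),
  X12 = c(50, 50, 100, 150, 250, 400, 650, 1050),
  check.names = FALSE
)

# Capture the value columns before the loop, so newly added columns
# are not themselves transformed.
value_cols <- names(df)[-1]  # every column except `Row Labels`

for (col in value_cols) {
  # df[[name]] reads and writes a column whose name is held in a string
  df[[paste(col, "20%")]]  <- df[[col]] * 1.20
  df[[paste(col, "by 2")]] <- df[[col]] / 2
}
```

Unlike the across() versions, this places each new pair of columns in loop order (X6 20%, X6 by 2, X7 20%, ...). The same loop works on the tibble `Book1` read with read_excel().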
