Mutate on selected columns based on name (regular expression)

Question

Here's a sample dataset:

    ff <- data.frame(
      id = c(1:4),
      w1...33112..Value = c(10, 20, 30, 40),
      w1...33112..Time = c(4, 3, 2, 1),
      w2...33113..Value = c(1, .9, .75, .7),
      w2...33113..Time = c(10, 50, 30, 20),
      w3...33552..Value = c(1, 2, 3, 4),
      w3...33552..Time = c(.5, .5, .9, .9),
      w4...33442..Value = c(100, 50, 40, 30),
      w4...33442..Time = c(2, 1, 4, 3),
      w5...35692..Value = c(.5, .6, .7, .8)
    )

I want to apply a simple operation (usually diff()) to columns selected by name: the column name must contain the string "Value".

The example below handles two variables by hand; the real data has dozens of such columns.

    library(dplyr)

    ff.2 <- ff %>% mutate(
      w1.used = c(0, diff(w1...33112..Value)),
      w2.used = c(0, diff(w2...33113..Value))
    )

Each new column name should consist of the original name up to the first dot, followed by a dot and a chosen suffix (for example "used"), giving w1.used, w2.used, and so on.

Answer 1

Score: 4

You could simply do:

    library(dplyr)
    library(stringr)

    ff %>%
      mutate(across(ends_with("Value"), ~ c(0, diff(.)),
                    .names = "{str_extract(.col, 'w[0-9]+')}.used"))
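
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same across() call. It assumes dplyr 1.0.0 or later (for across() and its .names argument) and that stringr is attached so str_extract() is visible inside the glue string. The data frame is a trimmed two-variable copy of the question's sample, shortened only to keep the example small.

```r
library(dplyr)
library(stringr)

# Trimmed copy of the question's sample data (assumption: two variables are enough to illustrate)
ff <- data.frame(
  id = 1:4,
  w1...33112..Value = c(10, 20, 30, 40),
  w1...33112..Time  = c(4, 3, 2, 1),
  w2...33113..Value = c(1, .9, .75, .7),
  w2...33113..Time  = c(10, 50, 30, 20)
)

ff.2 <- ff %>%
  mutate(across(
    ends_with("Value"),              # select columns by name
    ~ c(0, diff(.x)),                # lagged difference, padded with a leading 0
    .names = "{str_extract(.col, 'w[0-9]+')}.used"  # e.g. "w1...33112..Value" becomes "w1.used"
  ))

names(ff.2)
```

The original Value columns are kept, and the new columns are appended after them, so the last two names should be w1.used and w2.used.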

Answer 2

Score: 1

mutate_at() lets you create new variables from existing ones that meet a condition, such as having a name that contains a given string. str_extract() then pulls out everything up to the first dot, which is used to build the new column names.

Here is sample code that does this:

    library(tidyverse)

    # Select only columns with "Value" in their names
    value_cols <- grep("Value", names(ff), value = TRUE)

    # Create new columns
    ff <- ff %>%
      mutate_at(
        .vars = value_cols,
        .funs = list(
          used = ~ c(0, diff(.))
        )
      )

    # Rename new columns
    new_col_names <- str_extract(value_cols, "^[^.]*") %>% paste0(".used")
    names(ff)[grep("used", names(ff))] <- new_col_names
    ff
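
Since dplyr 1.0.0 the scoped verbs such as mutate_at() are superseded by across(), so the create-then-rename steps can probably be folded into one call. Below is a rough sketch of that idea wrapped in a small helper. add_diff_cols() and its pattern and suffix arguments are hypothetical names made up for this example, not part of dplyr, and the data is a trimmed copy of the question's sample.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (not a dplyr function): add one "<prefix>.<suffix>" column
# for every column whose name matches `pattern`.
add_diff_cols <- function(df, pattern = "Value", suffix = "used") {
  df %>%
    mutate(across(
      matches(pattern),                # regex match on column names (case-insensitive by default)
      ~ c(0, diff(.x)),                # lagged difference, padded with a leading 0
      # "^[^.]*" keeps everything before the first dot; the suffix is pasted on afterwards
      .names = paste0("{stringr::str_extract(.col, '^[^.]*')}.", suffix)
    ))
}

# Trimmed copy of the question's sample data (assumption: two variables are enough here)
ff <- data.frame(
  id = 1:4,
  w1...33112..Value = c(10, 20, 30, 40),
  w1...33112..Time  = c(4, 3, 2, 1),
  w2...33113..Value = c(1, .9, .75, .7),
  w2...33113..Time  = c(10, 50, 30, 20)
)

res <- add_diff_cols(ff)
names(res)
```

This avoids the grep("used", names(ff)) rename step, which would also pick up any pre-existing column whose name happens to contain "used".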

Answer 3

Score: 1

A base R option

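    # Select the columns whose names end with "Value"
    d <- ff[endsWith(names(ff), "Value")]
    # Subtract the previous row from each row (the first row is subtracted from itself, giving 0)
    u <- d - rbind(d[1, ], d[-nrow(d), ])
    # Append the difference columns, renamed to "<prefix>.used"
    ff.2 <- cbind(ff, setNames(u, sub("\\..*", ".used", names(u))))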

gives

    > ff.2
      id w1...33112..Value w1...33112..Time w2...33113..Value w2...33113..Time
    1  1                10                4              1.00               10
    2  2                20                3              0.90               50
    3  3                30                2              0.75               30
    4  4                40                1              0.70               20
      w3...33552..Value w3...33552..Time w4...33442..Value w4...33442..Time
    1                 1              0.5               100                2
    2                 2              0.5                50                1
    3                 3              0.9                40                4
    4                 4              0.9                30                3
      w5...35692..Value w1.used w2.used w3.used w4.used w5.used
    1               0.5       0    0.00       0       0     0.0
    2               0.6      10   -0.10       1     -50     0.1
    3               0.7      10   -0.15       1     -10     0.1
    4               0.8      10   -0.05       1     -10     0.1
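
If the operation is something other than a lagged difference, the same base R idea can be written with lapply(), so that any per-column function can be dropped in. This is only a sketch under that assumption. value_cols and used are names chosen here for illustration, and the data is a trimmed copy of the question's sample.

```r
# Trimmed copy of the question's sample data (assumption: two variables are enough here)
ff <- data.frame(
  id = 1:4,
  w1...33112..Value = c(10, 20, 30, 40),
  w1...33112..Time  = c(4, 3, 2, 1),
  w2...33113..Value = c(1, .9, .75, .7),
  w2...33113..Time  = c(10, 50, 30, 20)
)

# Names of the columns that contain "Value"
value_cols <- grep("Value", names(ff), value = TRUE)

# Apply the operation to each selected column; swap in any function of a vector
used <- lapply(ff[value_cols], function(x) c(0, diff(x)))

# Replace everything from the first dot onward with ".used"
names(used) <- sub("\\..*", ".used", value_cols)

# Add the new columns to a copy of the data frame
ff.2 <- ff
ff.2[names(used)] <- used

names(ff.2)
```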

huangapple
  • Posted on June 29, 2023, 19:30:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76580615.html