Looping over data frame to "clean" data

Question

This is the kind of data I have:

Date        Station  Param1  Param2
2020-01-01  A        <5      45
2020-02-01  B        <5      47

To be able to plot this data, mark the values below the LOQ (limit of quantification, e.g. <5) and compute some basic statistics, I need to split each value into two new columns: one with the LOQ flag (<) and one with the numeric value.

I don't know the exact parameter names in advance (they are actually "Fe", "Cu", "N-tot" and so on), so I would like to loop over the parameter columns (all columns except Date and Station) and create two new columns for each parameter: one with the numeric value and one with the LOQ flag. Like this:

Date        Station  Param1_org  Param1_new  Param1_loq  Param2_org  Param2_new  Param2_loq
2020-01-01  A        <5          5           <           45          45          =
2020-02-01  B        <5          5           <           47          47          =
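The target layout above can also be produced without dplyr. Below is a minimal base R sketch using the example data from the question; split_param() is a hypothetical helper name, and the regular expressions assume the flag is always a single leading "<" or ">".

```r
# Example data from the question
data <- data.frame(
  Date    = as.Date(c("2020-01-01", "2020-02-01")),
  Station = c("A", "B"),
  Param1  = c("<5", "<5"),
  Param2  = c("45", "47"),
  stringsAsFactors = FALSE
)

# Treat every column except Date and Station as a parameter column
param_cols <- setdiff(names(data), c("Date", "Station"))

# Hypothetical helper: build the *_org, *_new and *_loq columns for one parameter
split_param <- function(col, df) {
  x <- df[[col]]
  out <- data.frame(
    org = x,
    new = as.numeric(sub("^[<>]\\s*", "", x)),               # "<5" -> 5
    loq = ifelse(grepl("^[<>]", x), substr(x, 1, 1), "="),  # "<", ">" or "="
    stringsAsFactors = FALSE
  )
  names(out) <- paste(col, c("org", "new", "loq"), sep = "_")
  out
}

# lapply() over the parameter names, then bind the pieces side by side
result <- cbind(
  data[c("Date", "Station")],
  do.call(cbind, lapply(param_cols, split_param, df = data))
)
print(result)
```

Running lapply() over a character vector of column names is one way to realize the "apply with a list of Params" idea, and setdiff() means the parameter names never have to be known in advance.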

I have tried mutate (dplyr), but I am struggling with how to combine the conditions with gsub inside mutate and across. I also considered using apply with a list of the parameter names, but got lost in the code.
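For reference, here is a minimal sketch of how gsub() and a condition can sit inside mutate(across()). It assumes dplyr 1.0 or later (for across() and its .names argument) plus stringr, uses the example data from the question, and treats every column except Date and Station as a parameter. The gsub()/if_else() split is one possible choice, not the only one.

```r
library(dplyr)
library(stringr)

data <- tibble(
  Date    = as.Date(c("2020-01-01", "2020-02-01")),
  Station = c("A", "B"),
  Param1  = c("<5", "<5"),
  Param2  = c("45", "47")
)

# All columns except Date and Station are parameters (names unknown in advance)
param_cols <- setdiff(names(data), c("Date", "Station"))

result <- data %>%
  mutate(across(
    all_of(param_cols),
    list(
      org = ~ .x,                                     # original text value
      new = ~ as.numeric(gsub("^[<>]\\s*", "", .x)),  # strip "<"/">", convert
      loq = ~ if_else(str_detect(.x, "^[<>]"),        # flag from first character
                      str_sub(.x, 1, 1), "=")
    ),
    .names = "{.col}_{.fn}"
  )) %>%
  select(-all_of(param_cols))  # originals survive as the *_org copies

print(result)
```

across() applies every function in the list to every selected column and names each result through the .names template, so no explicit loop or !! unquoting is needed.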

I need some advice on which approach to choose, and a simple example of how to achieve this. I appreciate all help given!

Answer 1

Score: 0

Here's an answer to your question:

library(tidyverse)

data <- tibble(Date = as.Date(c("2020-01-01", "2020-02-01")),
               Station = c("A", "B"),
               Param1 = c("<5", "<5"),
               Param2 = c("45", "47"))

# Every column except Date and Station is a parameter column,
# since the real names ("Fe", "Cu", "N-tot", ...) are not known in advance
param_cols <- setdiff(colnames(data), c("Date", "Station"))

for (col in param_cols) {
  col_org <- paste(col, "org", sep = "_")
  col_new <- paste(col, "new", sep = "_")
  col_loq <- paste(col, "loq", sep = "_")
  data <- data %>%
    mutate(
      # keep the original text value
      !!col_org := .data[[col]],
      # numeric part, e.g. "<5" -> 5 (decimals such as "<0.5" also work)
      !!col_new := as.numeric(str_extract(.data[[col]], "\\d+(\\.\\d+)?")),
      # LOQ flag: "=" for plain numbers, "<" or ">" for censored values
      !!col_loq := case_when(
        str_detect(.data[[col]], "^\\d") ~ "=",
        str_detect(.data[[col]], "^<") ~ "<",
        str_detect(.data[[col]], "^>") ~ ">"
      ),
      # drop the original column (its copy is kept in *_org)
      !!col := NULL
    )
}

print(data)

What I did is simply loop through all the parameter columns and, for each one, use mutate() with a few regular expressions to build the three new columns. The !! unquotes a string variable so that its value can be used as a column name on the left-hand side of := inside a dplyr verb (note: this needs dplyr version 1.0 or higher).
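To see what the !! and := combination does in isolation, here is a minimal sketch assuming dplyr 1.0 or later; the one-column tibble is made up for illustration. The glue-style name in the second mutate() is an alternative that recent rlang/dplyr versions also accept.

```r
library(dplyr)

df <- tibble(Param1 = c("<5", "45"))   # illustrative data

col <- "Param1"                         # column name held in a string
new_name <- paste(col, "org", sep = "_")

# !! unquotes `new_name`, so the new column is called "Param1_org";
# .data[[col]] reads the existing column whose name is stored in `col`
df2 <- df %>% mutate(!!new_name := .data[[col]])

# Same result with a glue-style name on the left of :=
df3 <- df %>% mutate("{col}_org" := .data[[col]])

print(df2)
```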

huangapple
  • Published on 2023-06-12 04:41:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76452443.html