June 12, 2023 04:41:01


Looping over data frame to "clean" data

Question


This is the kind of data I have:

Date	Station	Param1	Param2
2020-01-01	A	<5	45
2020-02-01	B	<5	47

To plot this data, mark the values below the LOQ (limit of quantification, e.g. <5), and compute some basic statistics, I need new columns that hold the LOQ flag (<) and the numeric value separately.
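A minimal base R sketch of that split for a single vector. `split_loq()` is a hypothetical helper name, and it assumes a censored value is written as a single leading "<" or ">" followed by the number.

```r
# Hypothetical helper: split raw lab values such as "<5" or "45"
# into a numeric value and an LOQ flag ("<", ">" or "=").
# Assumes any flag is a single leading "<" or ">".
split_loq <- function(x) {
  x <- trimws(x)
  # Flag from the first character; plain numbers get "="
  flag <- ifelse(grepl("^<", x), "<", ifelse(grepl("^>", x), ">", "="))
  flag[is.na(x)] <- NA  # keep missing values missing
  # Remove the flag (and any space after it), then convert to numeric
  value <- as.numeric(sub("^[<>]\\s*", "", x))
  data.frame(value = value, loq = flag, stringsAsFactors = FALSE)
}

split_loq(c("<5", "45", "<0.5", "> 100"))
```

Keeping the value and the flag in separate columns means the numeric column can be plotted and summarised directly, while the flag column can drive point shapes or colours.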

I don't know the exact parameter names in advance (they are actually "Fe", "Cu", "N-tot" and so on), so I would like to loop over the parameter columns (everything except Date and Station) and, for each parameter, keep the original column (suffix _org) and add two new ones: one with the numeric value (_new) and one with the LOQ flag (_loq). Like this:
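Since the parameter names are not known in advance, one option is to select them by exclusion rather than by name. A minimal sketch, assuming Date and Station are the only ID columns; the sample values are made up for illustration.

```r
# Hypothetical sample using the real-world names mentioned in the question;
# check.names = FALSE keeps "N-tot" instead of turning it into "N.tot"
data <- data.frame(
  Date    = as.Date(c("2020-01-01", "2020-02-01")),
  Station = c("A", "B"),
  Fe      = c("<5", "12"),
  Cu      = c("45", "<2"),
  `N-tot` = c("<0.5", "3.1"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column that is not an ID column is treated as a parameter column,
# so the parameter names never have to be listed by hand
id_cols    <- c("Date", "Station")
param_cols <- setdiff(names(data), id_cols)
print(param_cols)
```

In dplyr the same selection can be written as `-c(Date, Station)` inside `select()` or `across()`; a name such as N-tot then needs backticks when written out by hand.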

Date	Station	Param1_org	Param1_new	Param1_loq	Param2_org	Param2_new	Param2_loq
2020-01-01	A	<5	5	<	45	45	=
2020-02-01	B	<5	5	<	47	47	=

I have tried mutate() from dplyr, but I am struggling with how to combine the conditions with gsub() inside mutate() and across(). I also considered using apply() with a list of parameters, but got lost in the code.
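For the mutate() + across() route, a sketch (not the only way) is to pass across() a named list of functions: the conditions and the gsub() call go inside those functions, and `.names` builds the `_org`/`_new`/`_loq` suffixes. It assumes dplyr 1.0 or later and that flags are a leading "<" or ">".

```r
library(dplyr)

data <- tibble(
  Date    = as.Date(c("2020-01-01", "2020-02-01")),
  Station = c("A", "B"),
  Param1  = c("<5", "<5"),
  Param2  = c("45", "47")
)

# Parameter columns = everything except the ID columns
param_cols <- setdiff(names(data), c("Date", "Station"))

result <- data %>%
  mutate(across(
    all_of(param_cols),
    list(
      org = identity,                            # raw text, unchanged
      new = ~ as.numeric(gsub("[<>]", "", .x)),  # strip the flag, keep the number
      loq = ~ case_when(                         # the conditions live in the lambda
        is.na(.x)           ~ NA_character_,
        startsWith(.x, "<") ~ "<",
        startsWith(.x, ">") ~ ">",
        TRUE                ~ "="
      )
    ),
    .names = "{.col}_{.fn}"                      # e.g. Param1_org, Param1_new, ...
  )) %>%
  select(-all_of(param_cols))                    # drop the original columns

print(result)
```

across() applies every function to the first column before moving to the next, so the new columns already come out grouped per parameter in the order asked for, with no explicit loop.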

I need some advice on which approach to choose, and a simple example of how to achieve this. I appreciate all help given!
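For comparison, a base R sketch of the other route the question mentions, processing a list of parameters one at a time. It uses lapply() rather than apply(), because apply() would first turn the whole data frame into a character matrix. The sample data and the Date/Station ID columns are assumptions.

```r
data <- data.frame(
  Date    = as.Date(c("2020-01-01", "2020-02-01")),
  Station = c("A", "B"),
  Fe      = c("<5", "12"),
  Cu      = c("45", "<2"),
  stringsAsFactors = FALSE
)

id_cols    <- c("Date", "Station")
param_cols <- setdiff(names(data), id_cols)

# One small data frame (org/new/loq) per parameter
pieces <- lapply(param_cols, function(p) {
  raw <- data[[p]]
  out <- data.frame(
    org = raw,
    new = as.numeric(gsub("[<>]", "", raw)),
    loq = ifelse(grepl("^<", raw), "<", ifelse(grepl("^>", raw), ">", "=")),
    stringsAsFactors = FALSE
  )
  names(out) <- paste(p, names(out), sep = "_")  # e.g. Fe_org, Fe_new, Fe_loq
  out
})

# Put the ID columns back in front of all parameter blocks
result <- cbind(data[id_cols], do.call(cbind, pieces))
print(result)
```

Both this and the dplyr version give the same layout; the dplyr one is shorter if the tidyverse is already in use, while this one has no package dependencies.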

Answer 1

Score: 0


Here's an answer to your question:

library(tidyverse)
data <- tibble(Date = as.Date(c("2020-01-01", "2020-02-01")),
               Station = c("A", "B"),
               Param1 = c("<5", "<5"),
               Param2 = c("45", "47"))
cols <- colnames(data)
# With the real names (Fe, Cu, N-tot, ...) use setdiff(cols, c("Date", "Station"))
param_cols <- cols[str_detect(cols, "^Param")]
for (col in param_cols) {
  col_org <- paste(col, "org", sep = "_")
  col_new <- paste(col, "new", sep = "_")
  col_loq <- paste(col, "loq", sep = "_")
  data <- data %>%
    mutate(!!col_org := .data[[col]],    # copy of the raw value
           # numeric part as a number (str_extract alone returns text)
           !!col_new := as.numeric(str_extract(.data[[col]], "[0-9.]+")),
           # "=" for plain numbers, "<" for values below the LOQ, ">" otherwise
           !!col_loq := ifelse(str_detect(.data[[col]], "^\\d"),
                               "=",
                               ifelse(str_detect(.data[[col]], "^<"), "<", ">")),
           !!col := NULL)                # drop the original column
}
print(data)

I simply loop through all the columns whose names contain Param and call mutate() on each one, again using regex detection to build the flag. The !! operator unquotes a variable, so that its value (for example "Param1_new") rather than its name is used as the column name on the left-hand side of := (note: dplyr version 1.0 or higher).
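To see what !! does in isolation, here is a minimal sketch with a one-column tibble (the names are illustrative): the column names live in ordinary character variables, `!!new_name :=` puts the value of `new_name` on the left-hand side, and `.data[[col]]` reads the column whose name is stored in `col`.

```r
library(dplyr)

df <- tibble(Param1 = c("<5", "45"))
col      <- "Param1"               # column name stored in a variable
new_name <- paste0(col, "_new")    # "Param1_new"

# Without !!, mutate(new_name := ...) would create a column literally
# called "new_name"; with !! the value of new_name is used instead.
df <- df %>%
  mutate(!!new_name := as.numeric(sub("^<", "", .data[[col]])))

print(names(df))
```

`.data[[col]]` is generally preferred over `get(col)` inside dplyr verbs because it only looks in the data, never in the surrounding environment.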
