February 24, 2023, 02:35:53

R dataframe/ lapply(): get rid of rows with particular values in columns containing particular strings, while keeping everything else?

Question

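The following code sketches the filtering described in the question:

# Use lapply to drop rows whose flags columns contain a 4
filtered_dataframes <- lapply(df_list, function(df) {
  # Get the names of the flags columns
  flags_columns <- grep("flags", names(df), value = TRUE)
  
  # Loop over each flags column and drop the rows that contain a 4
  for (col in flags_columns) {
    df <- df[!grepl("4", df[[col]]), ]
  }
  
  return(df)
})

Note that this code looks, in each data frame, for rows whose flags columns contain the digit 4 and removes those rows. The result, filtered_dataframes, is the list of filtered data frames.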

I have 16 dataframes I am trying to quality check and delete poor quality rows in R. I already know of lapply() and have used it for simpler wrangling problems to apply the same thing to all my dataframes at once, but for whatever reason I'm having a mental block currently.

The format of each individual dataframe is like so, where every other column is a "flags" column. Each flags column contains strings of values. If any of the values in a string is a 4, I want to filter that row out of the dataframe.

head(df)
timestamp    wind_speed_max wind_speed_max_flags   wind_speed_mean
1            UTC meters per second                  NAN meters per second
2    data logger   Airmar WS-200WX                  NAN   Airmar WS-200WX
3 6/2/2015 15:46               7.6              1 1 4 1              5.12
4 6/2/2015 16:01               7.2              1 1 1 1              5.16
5 6/2/2015 16:16               8.1              1 1 1 1              5.97
6 6/2/2015 16:31               8.5              1 1 1 1             5.909
  wind_speed_mean_flags wind_direction_mean wind_direction_mean_flags
1                   NAN             degrees                       NAN
2                   NAN     Airmar WS-200WX                       NAN
3               1 1 1 1               57.14                   1 2 1 2
4               1 1 1 1               61.64                   1 2 1 4
5               1 1 1 1                  68                   1 2 1 2
6               4 1 1 1               73.14                   1 2 1 2

I know I can use grep("flags") on the column names, and I think a similar grep could pick out the strings containing a 4, perhaps combined with some Boolean operators. But I am struggling to piece all of this together while retaining the rest of the data, and ideally to do it for all 16 dataframes at once, for example lapply(df_list, function(x) <insert code that can filter out flags with 4s for each x dataframe>)

Answer 1

Score: 1

Let's start by writing code to filter one data frame - we'll look at the columns that include "flags" in the name and grep for "4". Then we'll use rowSums to count the number of 4s in each row, keeping only rows with 4 count == 0.

# count the number of 4s in each row of "flag" cols of `df`
count_4 = df[grepl("flags", names(df))] |>
  sapply(grepl, pattern = "4") |>
  rowSums(na.rm = TRUE)
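
To see what `count_4` holds, here is a small sketch on a toy data frame modeled on the `head(df)` output in the question. It assumes every column is stored as character (which the units and instrument rows suggest), and it writes the steps as separate calls instead of using the `|>` pipe, which needs R 4.1 or later. The intermediate names `flag_cols`, `has_4`, and `filtered` are introduced here for illustration only.

```r
# Toy data copied from the question's head(df); all columns are character.
df <- data.frame(
  timestamp = c("UTC", "data logger", "6/2/2015 15:46", "6/2/2015 16:01",
                "6/2/2015 16:16", "6/2/2015 16:31"),
  wind_speed_max = c("meters per second", "Airmar WS-200WX",
                     "7.6", "7.2", "8.1", "8.5"),
  wind_speed_max_flags = c("NAN", "NAN", "1 1 4 1", "1 1 1 1",
                           "1 1 1 1", "1 1 1 1"),
  wind_speed_mean = c("meters per second", "Airmar WS-200WX",
                      "5.12", "5.16", "5.97", "5.909"),
  wind_speed_mean_flags = c("NAN", "NAN", "1 1 1 1", "1 1 1 1",
                            "1 1 1 1", "4 1 1 1"),
  wind_direction_mean = c("degrees", "Airmar WS-200WX",
                          "57.14", "61.64", "68", "73.14"),
  wind_direction_mean_flags = c("NAN", "NAN", "1 2 1 2", "1 2 1 4",
                                "1 2 1 2", "1 2 1 2"),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for columns whose name contains "flags"
flag_cols <- grepl("flags", names(df))

# Logical matrix (rows x flags columns): TRUE where the string contains "4"
has_4 <- sapply(df[flag_cols], grepl, pattern = "4")

# Number of flags columns containing a 4, per row
count_4 <- rowSums(has_4, na.rm = TRUE)
print(count_4)  # 0 0 1 1 0 1

# Keep only rows with no 4 in any flags column
filtered <- df[count_4 == 0, ]
print(filtered$timestamp)
```

Only the flags columns are searched, so a 4 inside a measurement such as 61.64 does not by itself remove a row. The units and instrument rows survive because their flags are "NAN".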

Putting it in lapply:

modified_data_list = lapply(data_list, function(df) {
  count_4 = df[grepl("flags", names(df))] |>
    sapply(grepl, pattern = "4") |>
    rowSums(na.rm = TRUE)
  df[count_4 == 0, ]
})
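
One possible variation, offered as a sketch and not as part of the original answer: `grepl("4", ...)` matches any string that contains the character 4, so a flag value such as 14 would also count. Whether such values can occur in this data is an assumption; if they can, comparing whole whitespace-separated tokens avoids the false match. The helper names `has_flag` and `drop_flagged_rows`, and the toy `data_list` below, are hypothetical. Using `Reduce()` with a starting value also keeps the function working when a data frame has only one row or no flags columns, cases where `sapply()` would not return a matrix for `rowSums()`.

```r
# Hypothetical helper: TRUE for each element of `x` that contains `flag`
# as a standalone whitespace-separated token.
has_flag <- function(x, flag = "4") {
  tokens <- strsplit(trimws(as.character(x)), "\\s+")
  vapply(tokens, function(tok) flag %in% tok, logical(1))
}

# Hypothetical helper: drop rows where any column whose name matches
# `col_pattern` carries the given flag.
drop_flagged_rows <- function(df, flag = "4", col_pattern = "flags") {
  flag_cols <- grep(col_pattern, names(df), value = TRUE)
  # Start from all FALSE so zero flags columns means "keep every row"
  bad <- Reduce(`|`,
                lapply(df[flag_cols], has_flag, flag = flag),
                rep(FALSE, nrow(df)))
  df[!bad, , drop = FALSE]
}

# Toy list of data frames (made-up values, for illustration only)
data_list <- list(
  a = data.frame(x = c("1", "2", "3"),
                 x_flags = c("1 1 1 1", "1 14 1 1", "1 4 1 1"),
                 stringsAsFactors = FALSE),
  b = data.frame(y = c("5", "6"), stringsAsFactors = FALSE)
)

cleaned <- lapply(data_list, drop_flagged_rows)
print(cleaned$a)  # the "1 14 1 1" row is kept, the "1 4 1 1" row is dropped
print(cleaned$b)  # unchanged: no flags columns
```

If the flags are always single digits, the simpler substring match in the answer gives the same result.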
