R dataframe/ lapply(): get rid of rows with particular values in columns containing particular strings, while keeping everything else?
Question
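The following code uses lapply() to drop every row whose flags columns contain a 4:
# Use lapply to filter each data frame on its flags columns
filtered_dataframes <- lapply(df_list, function(df) {
  # Names of the columns whose name contains "flags"
  flags_columns <- grep("flags", names(df), value = TRUE)
  # For each flags column, keep only the rows whose flag string has no 4
  for (col in flags_columns) {
    df <- df[!grepl("4", df[[col]]), ]
  }
  return(df)
})
Note that this code looks through the flags columns of each data frame for rows containing the digit 4 and removes those rows from the data frame. The result, filtered_dataframes, is the list of filtered data frames.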
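One caveat worth checking: grepl("4", ...) matches the character 4 anywhere in the string, so a flag such as 14 or 40 would also cause the row to be dropped. The sample data only shows single-digit flags, so this may never matter. If multi-digit flags are possible (an assumption, not something the question states), a word-boundary pattern is stricter. A minimal sketch with invented flag strings:

```r
# Hypothetical flag strings; the multi-digit value is invented for illustration
flags <- c("1 1 4 1", "1 14 1 1", "1 1 1 1", NA)

# Plain substring match: also hits the 4 inside "14"
loose <- grepl("4", flags, fixed = TRUE)

# Word-boundary match: only a standalone 4 counts
strict <- grepl("\\b4\\b", flags)

print(loose)   # TRUE TRUE FALSE FALSE
print(strict)  # TRUE FALSE FALSE FALSE
```

In both cases grepl() returns FALSE for NA, so rows with a missing flag string are kept.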
I have 16 dataframes I am trying to quality check and delete poor quality rows in R. I already know of lapply() and have used it for simpler wrangling problems to apply the same thing to all my dataframes at once, but for whatever reason I'm having a mental block currently.
The format of each individual dataframe is like so, where every other column is a "flags" column. Each flags column contains strings of space-separated values. If any of the values in a string is a 4, I want to filter that row out of the dataframe.
head(df)
       timestamp    wind_speed_max wind_speed_max_flags   wind_speed_mean
1            UTC meters per second                  NAN meters per second
2    data logger   Airmar WS-200WX                  NAN   Airmar WS-200WX
3 6/2/2015 15:46               7.6              1 1 4 1              5.12
4 6/2/2015 16:01               7.2              1 1 1 1              5.16
5 6/2/2015 16:16               8.1              1 1 1 1              5.97
6 6/2/2015 16:31               8.5              1 1 1 1             5.909
  wind_speed_mean_flags wind_direction_mean wind_direction_mean_flags
1                   NAN             degrees                       NAN
2                   NAN     Airmar WS-200WX                       NAN
3               1 1 1 1               57.14                   1 2 1 2
4               1 1 1 1               61.64                   1 2 1 4
5               1 1 1 1                  68                   1 2 1 2
6               4 1 1 1               73.14                   1 2 1 2
I know I can use grep("flags") on the column names, and I think a similar grep approach, perhaps with Boolean operators, could pick out the strings containing a 4. But I am struggling to piece this together so that the rest of the data is retained, and ideally to do it for all 16 dataframes at once, for example lapply(df_list, function(x) <insert code that can filter out flags with 4s for each x dataframe>)
Answer 1
Score: 1
Let's start by writing code to filter one data frame: we'll look at the columns that include "flags" in the name and grep each one for "4". Then we'll use rowSums to count, for each row, how many flags columns contain a 4, keeping only the rows where that count is 0.
# for each row of `df`, count the "flags" columns that contain a 4
count_4 = df[grepl("flags", names(df))] |>
  sapply(grepl, pattern = "4") |>
  rowSums(na.rm = TRUE)
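To see what those steps produce, here is a small worked example that splits them into named intermediates. The column names follow the question, but the values are invented, and has_4 and clean are illustrative names only:

```r
# Toy data frame; column names follow the question, values are made up
df <- data.frame(
  wind_speed_max            = c("7.6", "7.2", "8.1"),
  wind_speed_max_flags      = c("1 1 4 1", "1 1 1 1", "1 1 1 1"),
  wind_direction_mean       = c("57.14", "61.64", "68"),
  wind_direction_mean_flags = c("1 2 1 2", "1 2 1 4", "1 2 1 2"),
  stringsAsFactors = FALSE
)

# Logical matrix: one row per data row, one column per flags column
has_4 <- sapply(df[grepl("flags", names(df))], grepl, pattern = "4")

# Number of flags columns containing a 4, per row: 1, 1, 0
count_4 <- rowSums(has_4, na.rm = TRUE)

# Keep only the rows with no flagged column (here, the third row)
clean <- df[count_4 == 0, ]
print(clean)
```

Rows 1 and 2 each have a 4 in one flags column, so only row 3 survives.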
Putting it in lapply:
modified_data_list = lapply(data_list, function(df) {
  count_4 = df[grepl("flags", names(df))] |>
    sapply(grepl, pattern = "4") |>
    rowSums(na.rm = TRUE)
  df[count_4 == 0, ]
})
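One edge case to be aware of: when a data frame has only one row, sapply() simplifies to a plain vector rather than a matrix, and rowSums() then fails. The sketch below is a possible variant, not part of the answer above, that combines the per-column matches with Reduce() instead. The helper name drop_flagged and the toy list are hypothetical:

```r
# Hypothetical helper: drop rows where any "flags" column matches `pattern`
drop_flagged <- function(df, pattern = "4") {
  flag_cols <- grep("flags", names(df), value = TRUE)
  if (length(flag_cols) == 0) return(df)  # nothing to check
  # One logical vector per flags column
  hits <- lapply(df[flag_cols], grepl, pattern = pattern)
  # TRUE where at least one flags column contains the pattern
  bad <- Reduce(`|`, hits)
  df[!bad, , drop = FALSE]
}

# Toy list of data frames (invented values); site_b has a single row
data_list <- list(
  site_a = data.frame(speed = c(7.6, 7.2), speed_flags = c("1 1 4 1", "1 1 1 1")),
  site_b = data.frame(speed = 8.5, speed_flags = "1 1 1 1")
)

modified_data_list <- lapply(data_list, drop_flagged)

# Sanity check: how many rows were removed from each data frame
rows_removed <- sapply(data_list, nrow) - sapply(modified_data_list, nrow)
print(rows_removed)
```

Comparing row counts before and after is a quick way to confirm that the filter removed what was expected from each of the 16 data frames.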