R dplyr: how to filter a column within a grepl() when its name is stored in a vector?
Question
Filtering a column of a data frame called data with dplyr, when the column's name is stored in an "external" vector col_name, can be achieved using !!. However, this no longer appears to work when the same logic is applied inside filter() with grepl().
How should one do that with dplyr in a single line?
library(dplyr)

data <- read.table(text="Sol_name    geo_pos     loc_pos     dol_pos    pol_pos   kol_pos
A            1            1          0          0         1
B            0            1          1          0         0
C            1            0          1          1         1
D            0            1          0          0         1", header = TRUE)
kw <- "B|D"
data %>%
  filter(grepl(toupper(kw), toupper(Sol_name))) # works
#   Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
# 1        B       0       1       1       0       0
# 2        D       0       1       0       0       1
col_name <- "Sol_name"
data %>%
  filter(grepl(toupper(kw), toupper(!!col_name))) # does not
# [1] Sol_name geo_pos  loc_pos  dol_pos  pol_pos  kol_pos
# <0 rows> (or 0-length row.names)
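To see why the second call returns zero rows, note that !!col_name injects the string "Sol_name" itself, not the column it names. The sketch below, assuming dplyr 1.0 or later (which re-exports rlang's sym()), checks this directly and shows that turning the string into a symbol with sym() first lets the !! version select the column.

```r
library(dplyr)

data <- read.table(text = "Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
A 1 1 0 0 1
B 0 1 1 0 0
C 1 0 1 1 1
D 0 1 0 0 1", header = TRUE)
kw <- "B|D"
col_name <- "Sol_name"

# !!col_name injects the *string* "Sol_name", so grepl() tests the literal
# text "SOL_NAME" once instead of testing every value in the column.
literal_match <- grepl(toupper(kw), toupper(col_name))
print(literal_match)  # a single FALSE: "SOL_NAME" contains neither B nor D

# filter() recycles that single FALSE to every row, hence 0 rows.
n_rows <- nrow(filter(data, grepl(toupper(kw), toupper(!!col_name))))
print(n_rows)

# sym() converts the string to a symbol, so !! now refers to the column.
fixed <- filter(data, grepl(toupper(kw), toupper(!!sym(col_name))))
print(fixed)
```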
Answer 1
Score: 4
You can use [[ with either the magrittr pronoun . or the .data pronoun:
col_name <- "Sol_name"
kw <- "B|D"
data %>% 
  filter(grepl(kw, .data[[col_name]]))
#   Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
# 1        B       0       1       1       0       0
# 2        D       0       1       0       0       1
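The answer mentions the magrittr pronoun . but only shows .data. A minimal sketch of both variants side by side, assuming the magrittr pipe %>% (the native |> pipe does not bind .); on an ungrouped data frame they select the same rows.

```r
library(dplyr)

data <- read.table(text = "Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
A 1 1 0 0 1
B 0 1 1 0 0
C 1 0 1 1 1
D 0 1 0 0 1", header = TRUE)
kw <- "B|D"
col_name <- "Sol_name"

# Variant 1: the .data pronoun, indexed by the column name.
via_data <- data %>% filter(grepl(kw, .data[[col_name]]))

# Variant 2: the magrittr pronoun `.`, which %>% binds to the whole
# left-hand-side data frame; [[ then extracts the column by name.
via_dot <- data %>% filter(grepl(kw, .[[col_name]]))

print(via_dot)
```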
From ?.data:
> .data versus the magrittr pronoun .
>
> In a magrittr pipeline, .data is not necessarily interchangeable with the magrittr pronoun .. With grouped data frames in particular, .data represents the current group slice whereas the pronoun . represents the whole data frame. Always prefer using .data in data-masked context.
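A small sketch of the grouped-data difference described in the quote, using a hypothetical toy data frame df with a grouping column g and a value column x (not taken from the question). Inside a grouped filter(), .data[["x"]] sees only the current group's slice, while .[["x"]] sees the whole column.

```r
library(dplyr)

# Hypothetical toy data: two groups of two rows each.
df <- data.frame(g = c("a", "a", "b", "b"), x = c(1, 4, 10, 20))

# .data[["x"]] is the current group's slice, so each row is compared
# with its own group's mean (group a: 2.5, group b: 15).
by_group <- df %>%
  group_by(g) %>%
  filter(.data[["x"]] > mean(.data[["x"]]))

# `.` is the whole data frame, so mean(.[["x"]]) is the overall mean
# (8.75) no matter which group is being filtered.
whole <- df %>%
  group_by(g) %>%
  filter(.data[["x"]] > mean(.[["x"]]))

print(by_group)
print(whole)
```

In the first filter each group keeps only its rows above its own mean (x = 4 and x = 20); in the second, every row is compared with the overall mean, so only group b's rows (x = 10 and x = 20) survive.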