R dplyr: how to filter on a column inside grepl() when its name is stored in a vector?

Question

Filtering a column of a data frame `data` with dplyr, when the column's name is stored in an "external" vector `col_name`, can be achieved using `!!`. However, this no longer appears to work when the same logic is applied inside `filter()` and `grepl()`.

How should one do that in dplyr, in a single line?

```r
library(dplyr)

data <- read.table(text = "Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
A 1 1 0 0 1
B 0 1 1 0 0
C 1 0 1 1 1
D 0 1 0 0 1", header = TRUE)

kw <- "B|D"
data %>%
  filter(grepl(toupper(kw), toupper(Sol_name))) # works
```

```
  Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
1        B       0       1       1       0       0
2        D       0       1       0       0       1
```

```r
col_name <- "Sol_name"
data %>%
  filter(grepl(toupper(kw), toupper(!!col_name))) # does not
```

```
[1] Sol_name geo_pos  loc_pos  dol_pos  pol_pos  kol_pos 
<0 rows> (or 0-length row.names)
```
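The second call returns zero rows because `!!col_name` injects the *string* `"Sol_name"`, not the column: the condition becomes `grepl("B|D", toupper("Sol_name"))`, a single `FALSE` that is recycled to every row. A minimal sketch, assuming a cut-down copy of the question's `data`, showing that converting the string to a symbol with `rlang::sym()` (rlang is a dplyr dependency) makes `!!` inject the column itself:

```r
library(dplyr)

# Cut-down copy of the question's data frame (only two columns kept).
data <- data.frame(
  Sol_name = c("A", "B", "C", "D"),
  geo_pos  = c(1, 0, 1, 0)
)
kw <- "B|D"
col_name <- "Sol_name"

# !!col_name injects the literal string, so grepl() tests "B|D" against
# "SOL_NAME" (no B or D): one FALSE, recycled to all rows.
no_rows <- data %>%
  filter(grepl(toupper(kw), toupper(!!col_name)))

# Converting the string to a symbol first makes !! inject the column.
two_rows <- data %>%
  filter(grepl(toupper(kw), toupper(!!rlang::sym(col_name))))

nrow(no_rows)      # 0
two_rows$Sol_name  # "B" "D"
```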

Answer 1

Score: 4

You can use `[[` with `.` or `.data`:

```r
col_name <- "Sol_name"
kw <- "B|D"
data %>%
  filter(grepl(kw, .data[[col_name]]))

#   Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
# 1        B       0       1       1       0       0
# 2        D       0       1       0       0       1
```
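The code above shows only the `.data` form and drops the question's `toupper()` calls. A sketch of both pronouns, assuming hypothetical lower-case entries so the case-insensitive match is visible; `ignore.case = TRUE` is a standard `grepl()` argument, used here in place of `toupper()`:

```r
library(dplyr)

# Hypothetical data: "b" and "d" are lower case on purpose.
data <- data.frame(
  Sol_name = c("A", "b", "C", "d"),
  geo_pos  = c(1, 0, 1, 0)
)
kw <- "B|D"
col_name <- "Sol_name"

# .data[[col_name]] looks the column up in the data mask;
# ignore.case = TRUE replaces the toupper() calls from the question.
via_data <- data %>%
  filter(grepl(kw, .data[[col_name]], ignore.case = TRUE))

# The magrittr pronoun . is the whole piped data frame; it gives the
# same result here only because the data frame is not grouped.
via_dot <- data %>%
  filter(grepl(kw, .[[col_name]], ignore.case = TRUE))

via_data$Sol_name  # "b" "d"
via_dot$Sol_name   # "b" "d"
```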

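From `?.data`:

> `.data` versus the magrittr pronoun `.`
>
> In a magrittr pipeline, `.data` is not necessarily interchangeable with the magrittr pronoun `.`. With grouped data frames in particular, `.data` represents the current group slice whereas the pronoun `.` represents the whole data frame. Always prefer using `.data` in data-masked context.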
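A minimal sketch of that difference, assuming a hypothetical grouped data frame `df`: inside `mutate()`, `.data[["x"]]` sees only the rows of the current group, while `.[["x"]]` (available only with the magrittr `%>%` pipe) sees the whole column.

```r
library(dplyr)

# Hypothetical data: group "a" has x = 1, 2; group "b" has x = 10.
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 10))

res <- df %>%
  group_by(g) %>%
  mutate(
    sum_data = sum(.data[["x"]]),  # current group slice: 3, 3, 10
    sum_dot  = sum(.[["x"]])       # whole data frame:    13, 13, 13
  )

res$sum_data
res$sum_dot
```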


Posted by huangapple on 2023-05-17 15:57:41.
Original link: https://go.coder-hub.com/76269765.html