R dplyr: how to filter a column within a grep() when its name is stored in a vector?

Question

Filtering a column of a data frame `data` with dplyr, when the column's name is stored in an "external" vector `col_name`, can be achieved using `!!`. However, the same approach no longer seems to work when it is applied inside `filter()` combined with `grepl()`.

How should one do that in dplyr, in a single line?

```r
library(dplyr)

data <- read.table(text = "Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
A 1 1 0 0 1
B 0 1 1 0 0
C 1 0 1 1 1
D 0 1 0 0 1", header = TRUE)

kw <- "B|D"

# Works: the column is referenced by its bare name
data %>%
  filter(grepl(toupper(kw), toupper(Sol_name)))
#   Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
# 1        B       0       1       1       0       0
# 2        D       0       1       0       0       1

col_name <- "Sol_name"

# Does not work: no rows are returned
data %>%
  filter(grepl(toupper(kw), toupper(!!col_name)))
# [1] Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
# <0 rows> (or 'row.names' of length 0)
```
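A likely explanation, offered as a hedged sketch rather than as the accepted answer: `!!col_name` injects the *string* `"Sol_name"` as a constant, so `toupper(!!col_name)` is simply `"SOL_NAME"`, which never matches `"B|D"`. Turning the string into a symbol first with `rlang::sym()` (rlang is installed as a dplyr dependency) should make `!!` inject a column reference instead:

```r
library(dplyr)

data <- read.table(text = "Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
A 1 1 0 0 1
B 0 1 1 0 0
C 1 0 1 1 1
D 0 1 0 0 1", header = TRUE)

kw <- "B|D"
col_name <- "Sol_name"

# What the failing call actually compares against: one constant string
toupper(col_name)                      # "SOL_NAME"
grepl(toupper(kw), toupper(col_name))  # FALSE, so filter() keeps no rows

# Injecting a symbol instead of a string refers to the column itself
res_sym <- data %>%
  filter(grepl(toupper(kw), toupper(!!rlang::sym(col_name))))
res_sym
```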

Answer 1

Score: 4

You can use `[[` with `.` or `.data`:

```r
col_name <- "Sol_name"
kw <- "B|D"
data %>%
  filter(grepl(kw, .data[[col_name]]))
#   Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
# 1        B       0       1       1       0       0
# 2        D       0       1       0       0       1
```
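For a copy-paste check, here is a self-contained sketch of the answer's approach. Two choices are assumptions, not part of the answer: `ignore.case = TRUE` (a standard `grepl()` argument) stands in for the question's `toupper()` calls, and `filter_matching()` is a hypothetical helper name. It also tries the `.[[col_name]]` form the answer mentions, which should give the same rows on an ungrouped data frame:

```r
library(dplyr)

data <- read.table(text = "Sol_name geo_pos loc_pos dol_pos pol_pos kol_pos
A 1 1 0 0 1
B 0 1 1 0 0
C 1 0 1 1 1
D 0 1 0 0 1", header = TRUE)

kw <- "b|d"            # lower case on purpose
col_name <- "Sol_name"

# .data[[ ]] looks the column up by name inside the data mask;
# ignore.case = TRUE replaces the toupper() calls from the question
res_data <- data %>%
  filter(grepl(kw, .data[[col_name]], ignore.case = TRUE))

# The magrittr pronoun also works here because the data frame is ungrouped
res_dot <- data %>%
  filter(grepl(kw, .[[col_name]], ignore.case = TRUE))

# Hypothetical helper wrapping the same idea for reuse
filter_matching <- function(df, col, pattern) {
  filter(df, grepl(pattern, .data[[col]], ignore.case = TRUE))
}

res_fun <- filter_matching(data, col_name, kw)
res_fun
```

Wrapping the lookup in a function is where `.data[[ ]]` pays off: the column name arrives as an ordinary string argument, with no quoting or injection needed.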

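From `?.data`:

> `.data` versus the magrittr pronoun `.`
> In a magrittr pipeline, `.data` is not necessarily interchangeable with the magrittr pronoun `.`. With grouped data frames in particular, `.data` represents the current group slice whereas the pronoun `.` represents the whole data frame. Always prefer using `.data` in data-masked context.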
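To make the quoted distinction concrete, here is a small sketch on a hypothetical grouped data frame `df` (not from the question): inside a grouped `mutate()`, `.data$x` should see only the current group's rows, while `.` is the whole data frame handed over by the pipe.

```r
library(dplyr)

# Hypothetical toy data: two rows in group "a", one in group "b"
df <- data.frame(g = c("a", "a", "b"), x = 1:3)

res <- df %>%
  group_by(g) %>%
  mutate(
    n_slice = length(.data$x),  # .data: rows of the current group only
    n_whole = nrow(.)           # .: the whole (grouped) data frame piped in
  ) %>%
  ungroup()

res
```

Here `n_slice` comes out as 2, 2, 1 and `n_whole` as 3 on every row, which is why `.data` is the safer choice inside data-masked verbs such as `filter()` when the data may be grouped.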

huangapple
  • Published on 17 May 2023 at 15:57:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76269765.html