March 7, 2023, 22:27:47

How to select the columns of a dataframe based on a vector of strings, using exact matches?

Question

I have a dataframe with the following column names:

NewYork_10
NewYork_20
NewYork3_10
NewYork3_20
NewYork4_10
NewYork4_20
HongKong_10
HongKong_20
SanFrancisco_10
SanFrancisco_20

And I have a vector:

list_cities <- c("NewYork", "SanFrancisco")

I want a script that creates a new dataframe, selecting those columns that have the exact same string before the underscore.
In the example given above, you would get a new dataframe with the following columns.
NewYork_10
NewYork_20
SanFrancisco_10
SanFrancisco_20

I made several attempts with regular-expression matching:

dplyr::select(matches(list_cities))

dplyr::select(matches(paste0(list_cities), "_"))

I even tried anchoring each element of the vector, which I'm not sure is possible:

dplyr::select(matches(paste0("^", list_cities, "_.*")))

But in every case it captures every column whose name starts with the given substring.

Answer 1

Score: 1

You can try:

df[grep("^(NewYork|SanFrancisco)_", names(df))]
#df[grep(paste0("^(", paste0(name_list, collapse="|"), ")_"), names(df))] #Alternative using the name_list
#  NewYork_10 NewYork_20 SanFrancisco_10 SanFrancisco_20
#1          1          1               1               1

Or using dplyr::select:

library(tidyverse)
df %>% select(matches("^(NewYork|SanFrancisco)_"))
#  NewYork_10 NewYork_20 SanFrancisco_10 SanFrancisco_20
#1          1          1               1               1

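Here ^ anchors the match at the start of the column name, (NewYork|SanFrancisco) matches either NewYork or SanFrancisco, and the trailing _ requires an underscore immediately after the name, so NewYork3_10 is not selected.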
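
To make the role of the trailing underscore concrete, here is a minimal base-R sketch that builds the pattern from name_list and compares it with and without the underscore. The vector nms is a hypothetical stand-in for names(df), and the sketch assumes the names contain no regex metacharacters.

```r
# Hypothetical stand-in for names(df)
nms <- c("NewYork_10", "NewYork3_10", "HongKong_10", "SanFrancisco_20")
name_list <- c("NewYork", "SanFrancisco")

# Join the names into one alternation: "NewYork|SanFrancisco"
alternation <- paste(name_list, collapse = "|")

loose  <- paste0("^(", alternation, ")")   # name at the start, anything after
strict <- paste0("^(", alternation, ")_")  # name at the start, then "_"

loose_hits  <- nms[grepl(loose, nms)]
strict_hits <- nms[grepl(strict, nms)]

print(loose_hits)   # includes NewYork3_10
print(strict_hits)  # NewYork_10 and SanFrancisco_20 only
```

Only the strict pattern excludes NewYork3_10, because the character after NewYork there is 3 rather than an underscore.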

Or using startsWith:

df[Reduce(`|`, lapply(paste0(name_list, "_"), startsWith, x = names(df)))]
#  NewYork_10 NewYork_20 SanFrancisco_10 SanFrancisco_20
#1          1          1               1               1
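
Another possibility, not part of the answer above, is to avoid pattern alternation altogether: strip everything from the first underscore onward and compare what remains with %in%, which is an exact comparison. This is a minimal sketch on a reduced, hypothetical version of the sample data, and it assumes every column name has the form prefix_suffix.

```r
# Reduced, hypothetical version of the sample data
df <- data.frame(NewYork_10 = 1, NewYork3_10 = 2, HongKong_10 = 3,
                 SanFrancisco_20 = 4)
name_list <- c("NewYork", "SanFrancisco")

# Drop everything from the first underscore onward, leaving the prefix
prefixes <- sub("_.*$", "", names(df))

# Keep the columns whose prefix is exactly one of the requested names
selected <- df[prefixes %in% name_list]

print(names(selected))
```

Because %in% compares whole strings, NewYork3 never equals NewYork, and no escaping is needed if a name happens to contain regex metacharacters.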

Data (taken from @benson23):

df <- data.frame(NewYork_10 = 1,
                 NewYork_20 = 1,
                 NewYork3_10 = 1,
                 NewYork3_20 = 1,
                 NewYork4_10 = 1,
                 NewYork4_20 = 1,
                 HongKong_10 = 1,
                 HongKong_20 = 1,
                 SanFrancisco_10 = 1,
                 SanFrancisco_20 = 1)
name_list <- c("NewYork", "SanFrancisco")

Answer 2

Score: 1

We can also use matches:

# Group the alternation and anchor it so the underscore applies to both names
df %>%
    select(matches("^(NewYork|SanFrancisco)_"))
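
Grouping matters whenever an alternation is followed by a shared suffix. The | operator has the lowest precedence in a regular expression, so a pattern such as (NewYork)|(SanFrancisco)_.* attaches the underscore only to the second alternative. A minimal base-R sketch, using a hypothetical vector nms in place of names(df), shows the difference:

```r
# Hypothetical stand-in for names(df)
nms <- c("NewYork_10", "NewYork3_10", "HongKong_10", "SanFrancisco_20")

# Ungrouped: "NewYork" anywhere, OR "SanFrancisco_" followed by anything
ungrouped <- grepl("(NewYork)|(SanFrancisco)_.*", nms)

# Grouped and anchored: either name at the start, then an underscore
grouped <- grepl("^(NewYork|SanFrancisco)_", nms)

print(nms[ungrouped])  # still contains NewYork3_10
print(nms[grouped])    # NewYork_10 and SanFrancisco_20 only
```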