How to select the columns of a dataframe based on a vector of strings, matching for exact coincidence?
Question
I have a dataframe with the following column names:
NewYork_10
NewYork_20
NewYork3_10
NewYork3_20
NewYork4_10
NewYork4_20
HongKong_10
HongKong_20
SanFrancisco_10
SanFrancisco_20
And I have a vector:
list_cities <- c("NewYork", "SanFrancisco")
I want a script that creates a new dataframe, selecting those columns that have the exact same string before the underscore.
In the example given above, you would get a new dataframe with the following columns.
NewYork_10
NewYork_20
SanFrancisco_10
SanFrancisco_20
I made several attempts with pattern matching in dplyr:
dplyr::select(matches(list_cities))
dplyr::select(matches(paste0(list_cities), "_"))
I even tried adding anchors to each element of the vector, which I'm not sure is possible:
dplyr::select(matches(paste0("^",list_cities, "_.*")))
But in every case it captures all the columns whose names start with the given substring, so NewYork3_10 and NewYork4_10 are selected along with NewYork_10.
Answer 1
Score: 1
You can try:
df[grep("^(NewYork|SanFrancisco)_", names(df))]
#df[grep(paste0("^(", paste0(name_list, collapse="|"), ")_"), names(df))] #Alternative using the name_list
# NewYork_10 NewYork_20 SanFrancisco_10 SanFrancisco_20
#1 1 1 1 1
Or using dplyr::select:
library(tidyverse)
df %>% select(matches("^(NewYork|SanFrancisco)_"))
# NewYork_10 NewYork_20 SanFrancisco_10 SanFrancisco_20
#1 1 1 1 1
Where ^ is the start of the string and (NewYork|SanFrancisco) matches NewYork or SanFrancisco, which must then be followed by _.
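The pattern does not have to be typed by hand. The following is a minimal base-R sketch, assuming the names in name_list contain no regex metacharacters; it re-creates the sample df so that it runs on its own, and the names pattern and selected are introduced here only for illustration.

```r
# Sample data with the same column names as in the question
df <- data.frame(NewYork_10 = 1, NewYork_20 = 1,
                 NewYork3_10 = 1, NewYork3_20 = 1,
                 NewYork4_10 = 1, NewYork4_20 = 1,
                 HongKong_10 = 1, HongKong_20 = 1,
                 SanFrancisco_10 = 1, SanFrancisco_20 = 1)
name_list <- c("NewYork", "SanFrancisco")

# Build "^(NewYork|SanFrancisco)_" from the vector:
# "^" anchors at the start, the group holds the alternatives,
# and the trailing "_" must follow whichever name matched
pattern <- paste0("^(", paste(name_list, collapse = "|"), ")_")

# Keep only the matching columns; drop = FALSE keeps a data frame
selected <- df[, grepl(pattern, names(df)), drop = FALSE]
names(selected)
```

The same pattern string can also be handed to dplyr, for example df %>% select(matches(pattern)).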
Or using startsWith:
df[Reduce(`|`, lapply(paste0(name_list, "_"), startsWith, x=names(df)))]
# NewYork_10 NewYork_20 SanFrancisco_10 SanFrancisco_20
#1 1 1 1 1
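Another way to express "the exact same string before the underscore" is to extract that prefix and compare it literally, which avoids building a regex from the names at all. This is an alternative sketch rather than part of the original answer; prefix and selected are illustrative names, and it assumes the part to compare is everything before the first underscore.

```r
# Sample data with the same column names as in the question
df <- data.frame(NewYork_10 = 1, NewYork_20 = 1,
                 NewYork3_10 = 1, NewYork3_20 = 1,
                 NewYork4_10 = 1, NewYork4_20 = 1,
                 HongKong_10 = 1, HongKong_20 = 1,
                 SanFrancisco_10 = 1, SanFrancisco_20 = 1)
name_list <- c("NewYork", "SanFrancisco")

# Strip the first underscore and everything after it
prefix <- sub("_.*$", "", names(df))

# Exact comparison: "NewYork3" is not in name_list, so those columns are dropped
selected <- df[, prefix %in% name_list, drop = FALSE]
names(selected)
```

Because %in% compares whole strings, names containing characters such as "." or "(" need no escaping here.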
Data (taken from @benson23)
df <- data.frame(NewYork_10 = 1,
                 NewYork_20 = 1,
                 NewYork3_10 = 1,
                 NewYork3_20 = 1,
                 NewYork4_10 = 1,
                 NewYork4_20 = 1,
                 HongKong_10 = 1,
                 HongKong_20 = 1,
                 SanFrancisco_10 = 1,
                 SanFrancisco_20 = 1)
name_list <- c("NewYork", "SanFrancisco")
Answer 2
Score: 1
We can also use matches, with the alternatives grouped and anchored so that the underscore applies to both names:
df %>%
  select(matches("^(NewYork|SanFrancisco)_.*"))
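In a regex, | has the lowest precedence, so in a pattern such as (NewYork)|(SanFrancisco)_.* the underscore attaches only to the last alternative and a bare NewYork anywhere in the name is enough to match. A small base-R check of that behaviour, using a hypothetical cols vector holding a subset of the question's column names:

```r
# A few of the column names from the question
cols <- c("NewYork_10", "NewYork3_10", "HongKong_10", "SanFrancisco_10")

# Ungrouped, unanchored: reads as "NewYork" OR "SanFrancisco_.*"
loose <- grepl("(NewYork)|(SanFrancisco)_.*", cols)

# Grouped and anchored: one of the names, from the start, then "_"
strict <- grepl("^(NewYork|SanFrancisco)_", cols)

cols[loose]   # still contains NewYork3_10
cols[strict]  # only NewYork_10 and SanFrancisco_10
```

Grouping the alternatives inside one pair of parentheses and anchoring with ^ is what excludes NewYork3_10.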