Create a new column based on a subset of strings in another column in R?
Question
I am trying to create a new column in my dataframe, based on a subset of strings in another column.
This is my dataframe:
df =structure(list(Combination = c("BRUV_Acoustic_Satellite", "BRUV_Acoustic_Satellite",
"BRUV_Acoustic_Satellite", "BRUV_Acoustic_Satellite", "BRUV_Acoustic_Satellite",
"BRUV_Acoustic_Satellite", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Controlled_Acoustic", "Controlled_Acoustic",
"Controlled_Acoustic", "Controlled_Acoustic", "Controlled_Acoustic",
"Controlled_Acoustic", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Stationary_Radio", "Stationary_Radio",
"Stationary_Radio", "Animalborne_Satellite_Archival", "Animalborne_Satellite_Archival",
"Animalborne_Satellite_Archival", "Animalborne_Satellite_Archival",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"BRUV_Acoustic_Satellite", "BRUV_Acoustic_Satellite", "BRUV_Acoustic_Satellite",
"Stationary_Archival", "Stationary_Archival", "Stationary_Archival",
"Stationary_Archival", "Stationary_Acoustic_Radio_PIT", "Stationary_Acoustic_Radio_PIT",
"Stationary_Acoustic_Radio_PIT", "Controlled_Acoustic", "Controlled_Acoustic",
"Stationary_PIT", "Stationary_PIT", "Stationary_Acousitc_PIT",
"Stationary_Acousitc_PIT", "Stationary_Acousitc_PIT", "BRUV_Acoustic",
"BRUV_Acoustic", "BRUV_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Archival",
"Stationary_Archival", "Stationary_Archival", "Stationary_Archival",
"Stationary_Satellite", "Controlled_Acoustic", "Controlled_Acoustic",
"Controlled_Acoustic", "Controlled_Acoustic", "BRUV_Acoustic",
"BRUV_Acoustic", "BRUV_Acoustic", "Animalborne_Satellite", "Animalborne_Satellite",
"Stationary_Archival", "Stationary_Archival", "Stationary_Archival",
"Stationary_Radio_PIT", "Stationary_Radio_PIT", "Controlled_Acoustic",
"Controlled_Acoustic", "Controlled_Acoustic", "Controlled_Acoustic",
"Controlled_Satellite", "Controlled_Satellite", "Controlled_Satellite",
"Controlled_Satellite", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival_PIT", "Animalborne_Archival_PIT", "Animalborne_Archival_PIT",
"Animalborne_Acoustic_Archival", "Animalborne_Acoustic_Archival",
"Animalborne_Acoustic_Archival", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Stationary_Acoustic_Archival",
"Stationary_Acoustic_Archival", "Stationary_Acoustic_Archival",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Stationary_Acoustic", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Stationary_Acoustic_Archival", "Stationary_Acoustic_Archival",
"Stationary_Acoustic_Archival", "Stationary_Acoustic_Archival",
"Animalborne_Acoustic", "Animalborne_Acoustic", "Animalborne_Acoustic",
"Animalborne_Archival", "Animalborne_Archival", "Stationary_Acoustic_PIT",
"Stationary_Acoustic_PIT", "Stationary_Acoustic_PIT", "BRUV_Acoustic",
"BRUV_Acoustic", "BRUV_Acoustic", "BRUV_Acoustic", "BRUV_Acoustic",
"BRUV_Acoustic", "Controlled_Archival", "Controlled_Archival",
"Controlled_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Stationary_Radio",
"Stationary_Acoustic_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Stationary_Acoustic_Archival", "Stationary_Acoustic_Archival",
"Stationary_Acoustic_Archival", "Controlled_Acoustic", "Controlled_Acoustic",
"Animalborne_Archival", "Animalborne_Archival", "Stationary_Acoustic",
"Stationary_Acoustic", "Animalborne_Satellite_Archival", "Animalborne_Satellite_Archival",
"Animalborne_Satellite_Archival", "Animalborne_Satellite_Archival",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Stationary_Satellite", "Stationary_Satellite",
"Stationary_Satellite", "Stationary_Satellite", "Stationary_Satellite",
"Animalborne_Archival", "Animalborne_Archival", "Stationary_Acoustic_Radio",
"Stationary_Acoustic_Radio", "Stationary_Acoustic_Radio", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Stationary_Acoustic", "Stationary_Acoustic", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "Animalborne_Archival", "Animalborne_Archival",
"Animalborne_Archival", "BRUV_Acoustic", "BRUV_Acoustic", "BRUV_Acoustic",
"BRUV_Acoustic", "BRUV_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Stationary_Acoustic", "Stationary_Acoustic",
"Stationary_Acoustic", "Animalborne_Satellite_Archival", "Animalborne_Satellite_Archival",
"Animalborne_Satellite_Archival", "Controlled_Acoustic", "Controlled_Acoustic",
"Controlled_Acoustic")), class = "data.frame", row.names = c(NA,
-245L))
I want a new column whose values depend on certain strings: for every value that contains "Acoustic", "radio", or "PIT", the value in the new column should be "receiver based", and for everything else it should be "non receiver based". But for rows whose value contains both "acoustic" and "satellite", I need the value in the new column to say "Both".
I have tried the ifelse command using this code:
df$Type = ifelse(df$Combination == "Acoustic", 'Non Receiver Based', 'Receiver Based')
But it labels all of them "Receiver Based", and I don't know how to incorporate all the conditions I describe above.
Answer 1
Score: 1
Use grepl to check whether one of the words appears in the string. The pattern consists of the words separated by |, which means: check whether any of these words appears in the string:
min_string = c("BRUV_Acoustic_Satellite", "Stationary_Radio_PIT", "Animalborne_Satellite_Archival")
pattern = paste0(c("Acoustic", "Radio", "PIT"), collapse = "|")  # "Radio" as spelled in the data; grepl() is case-sensitive by default
ifelse(!grepl(pattern, min_string), 'Non Receiver Based',
ifelse(grepl("Acoustic", min_string) & grepl("Satellite", min_string), "Both",
"Receiver Based"))
#[1] "Both" "Receiver Based" "Non Receiver Based"
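For comparison, the original attempt with == labelled every row "Receiver Based" because == tests whole-string equality. No value in Combination is exactly "Acoustic", so the test is FALSE everywhere and ifelse() always returns its third argument. A minimal sketch of the difference, using a hypothetical three-element vector x rather than the full data frame:

```r
# Hypothetical sample values in the style of the Combination column
x <- c("BRUV_Acoustic", "Stationary_PIT", "Animalborne_Archival")

# == compares whole strings, so no element equals "Acoustic"
exact <- x == "Acoustic"

# grepl() looks for the pattern anywhere inside each string
partial <- grepl("Acoustic", x)

# | inside the pattern means "either of these words"
either <- grepl("Acoustic|PIT", x)

print(exact)    # FALSE FALSE FALSE
print(partial)  # TRUE FALSE FALSE
print(either)   # TRUE TRUE FALSE
```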
A dplyr alternative is to use case_when, which might be a bit easier to read:
library(dplyr)
case_when(!grepl(pattern, min_string) ~ "Non Receiver",
grepl("Acoustic", min_string) & grepl("Satellite", min_string) ~ "Both",
grepl(pattern, min_string) ~ "Receiver")
Answer 2
Score: 1
You can use grepl() to check whether a certain pattern occurs in a string, and then use if-else statements to decide between your cases. Since if() is not vectorized, you need to wrap the function in Vectorize() to use it in mutate().
library(tidyverse)
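# Named classify_type rather than match, so that base::match() is not masked
classify_type <- Vectorize(function(string) {
  # "Radio" is capitalised as in the data; grepl() is case-sensitive by default
  if (!grepl("Acoustic|Radio|PIT", string)) {
    "non receiver based"
  } else if (grepl("Acoustic", string) && grepl("Satellite", string)) {
    # && is the scalar logical operator, which is what if() expects
    "Both"
  } else {
    "receiver based"
  }
})
df %>% mutate(new_var = classify_type(Combination))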
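An alternative to Vectorize() that stays in base R is to keep the scalar function as it is and call it through vapply(), which also checks that each result is a single character string. This is only a minimal sketch: the names classify_one and combos are hypothetical, and the pattern uses "Radio" as it is spelled in the sample data.

```r
# Scalar classifier: expects exactly one string
classify_one <- function(string) {
  if (!grepl("Acoustic|Radio|PIT", string)) {
    "non receiver based"
  } else if (grepl("Acoustic", string) && grepl("Satellite", string)) {
    "Both"
  } else {
    "receiver based"
  }
}

# Hypothetical sample values taken from the Combination column
combos <- c("BRUV_Acoustic_Satellite", "Stationary_Radio", "Animalborne_Archival")

# vapply() calls classify_one once per element and enforces a length-1 character result
types <- vapply(combos, classify_one, character(1), USE.NAMES = FALSE)
print(types)
```

Inside mutate() the same call should work with Combination in place of combos.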
Answer 3
Score: 0
You can use some functions from tidyverse. Below, I create a new column called new_col, which is your desired output:
library(tidyverse)
df %>%
  # str_detect() already returns TRUE/FALSE, so no comparison with T is needed
  mutate(new_col = if_else(str_detect(Combination, "Acoustic|Radio|PIT"),
                           "receiver based", "non receiver based")) %>%
  # Rows containing both words override the first classification
  mutate(new_col = if_else(str_detect(Combination, "Acoustic") & str_detect(Combination, "Satellite"),
                           "Both", new_col))
Answer 4
Score: 0
Would something like this be helpful to you?
df1 <- df %>% mutate(new_col = case_when(
str_detect(Combination, "Acoustic_Satellite") ~ "both",
str_detect(Combination, "Acoustic") ~ "Receiver Based",
str_detect(Combination, "Radio") ~ "Receiver Based",
str_detect(Combination, "PIT") ~ "Receiver Based"))
df2 <- replace(df1, is.na(df1), "non receiver based")
It is a rather dirty solution, and I am sure someone will find a better one. It only works if Acoustic_Satellite appears as a single contiguous string in your dataset, but it does the job.
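The limitation mentioned above can be seen on a small example. The sketch below uses a hypothetical value, "Animalborne_Satellite_Acoustic", which does not occur in the posted data, to show that matching the literal "Acoustic_Satellite" depends on word order, whereas combining two separate grepl() calls does not:

```r
# Second value is hypothetical: same two words, reversed order
x <- c("BRUV_Acoustic_Satellite", "Animalborne_Satellite_Acoustic")

# Literal match: only succeeds when the words are adjacent and in this order
literal <- grepl("Acoustic_Satellite", x, fixed = TRUE)

# Two independent checks: succeeds whenever both words occur anywhere
both_words <- grepl("Acoustic", x, fixed = TRUE) & grepl("Satellite", x, fixed = TRUE)

print(literal)     # TRUE FALSE
print(both_words)  # TRUE TRUE
```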
Answer 5
Score: 0
A simple step by step way could be:
df$Type <- "non receiver based"
i <- grepl("Acoustic", df$Combination)
df$Type[i | grepl("Radio|PIT", df$Combination)] <- "receiver based"  # "Radio" as spelled in the data
df$Type[i & grepl("Satellite", df$Combination)] <- "Both"
rm(i)
df
# Combination Type
#1 BRUV_Acoustic_Satellite Both
#2 BRUV_Acoustic_Satellite Both
#3 BRUV_Acoustic_Satellite Both
#4 BRUV_Acoustic_Satellite Both
#5 BRUV_Acoustic_Satellite Both
#6 BRUV_Acoustic_Satellite Both
#7 Animalborne_Archival non receiver based
#8 Animalborne_Archival non receiver based
#9 Animalborne_Archival non receiver based
#10 Controlled_Acoustic receiver based
#...
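Because only the first rows are printed, one way to sanity-check the result is to list each distinct Combination/Type pair once. The sketch below does this on a small hypothetical sample (df_small); it also passes ignore.case = TRUE, which is an assumption on my part so that "Radio" and "radio" are treated alike, and which would need care if other values happened to contain "pit" or similar substrings.

```r
# Small hypothetical sample in the style of the Combination column
df_small <- data.frame(
  Combination = c("BRUV_Acoustic_Satellite", "Stationary_Radio",
                  "Stationary_Radio", "Animalborne_Archival", "Stationary_PIT"),
  stringsAsFactors = FALSE
)

# Same three steps as above: default label, then two overrides
df_small$Type <- "non receiver based"
i <- grepl("Acoustic", df_small$Combination, ignore.case = TRUE)
df_small$Type[i | grepl("radio|PIT", df_small$Combination, ignore.case = TRUE)] <- "receiver based"
df_small$Type[i & grepl("Satellite", df_small$Combination, ignore.case = TRUE)] <- "Both"

# One row per distinct Combination/Type pair, for eyeballing the mapping
check <- unique(df_small[c("Combination", "Type")])
print(check)
```

On the full data frame the same unique() call should give a short table that is easy to compare against the intended rules.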
Just for fun, a benchmark:
library(tidyverse)
match <- Vectorize(function(string) { #Maybe another name would be better
if (!grepl("Acoustic|radio|PIT", string)) {
"non receiver based"
} else if ((grepl("Acoustic", string)) &
(grepl("Satellite", string))) {
"Both"
} else "receiver based"
})
bench::mark(check = FALSE,
Maël1 = local({pattern = paste0(c("Acoustic", "radio", "PIT"), collapse = "|")
cbind(df, Type=ifelse(!grepl(pattern, df$Combination), 'Non Receiver Based',
ifelse(grepl("Acoustic", df$Combination) & grepl("Satellite", df$Combination), "Both",
"Receiver Based"))) }),
Maël2 = local({pattern = paste0(c("Acoustic", "radio", "PIT"), collapse = "|")
cbind(df, Type=case_when(!grepl(pattern, df$Combination) ~ "Non Receiver",
grepl("Acoustic", df$Combination) & grepl("Satellite", df$Combination) ~ "Both",
grepl(pattern, df$Combination) ~ "Receiver")) }),
"Lukas Unterschuetz" = local({df %>% mutate(new_var = match(Combination))}),
Leonardo19 = local({df %>%
mutate(new_col = if_else(str_detect(Combination, "Acoustic")==T | str_detect(Combination, "Radio")==T | str_detect(Combination, "PIT"), "receiver based", "non receiver based")) %>%
mutate(new_col = if_else(str_detect(Combination, "Acoustic")==T & str_detect(Combination, "Satellite")==T, "Both", new_col)) }),
procerus = local({df1 <- df %>% mutate(new_col = case_when(
str_detect(Combination, "Acoustic_Satellite") ~ "both",
str_detect(Combination, "Acoustic") ~ "Receiver Based",
str_detect(Combination, "Radio") ~ "Receiver Based",
str_detect(Combination, "PIT") ~ "Receiver Based"))
replace(df1, is.na(df1), "non receiver based")}),
GKi = local({df$Type <- "non receiver based"
i <- grepl("Acoustic", df$Combination, fixed=TRUE)
df$Type[i | grepl("radio|PIT", df$Combination)] <- "receiver based"
df$Type[i & grepl("Satellite", df$Combination, fixed=TRUE)] <- "Both"
rm(i)
df})
)
Result:
  expression              min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
  <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
1 Maël1              531.63µs 562.63µs     1751.    31.1KB     2.02   865     1
2 Maël2              776.25µs 820.19µs     1204.    44.4KB     2.02   596     1
3 Lukas Unterschuetz   5.52ms   5.76ms      172.    11.8KB     4.13    83     2
4 Leonardo19           2.91ms   3.07ms      307.    50.6KB     6.26   147     3
5 procerus             1.89ms   2.02ms      457.    66.9KB     6.24   220     3
6 GKi                 214.9µs 231.51µs     4241.    11.8KB     2.02  2099     1