Attempting to change URL suffixes
Question
I want to change a URL suffix to .com unless the existing suffix is .net, .co, .org, .news, or .tv. The function that I'm using changes all URL suffixes to .com. How should I modify it?
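library(stringr)  # provides str_extract() and str_replace()

# Replace the final suffix with ".com" unless it is on the keep list.
modify_suffix <- function(url) {
  # Last dot-delimited piece of the string, e.g. ".net"
  suffix <- str_extract(url, "\\.\\w+$")
  if (!suffix %in% c(".net", ".co", ".org", ".news", ".tv")) {
    url <- str_replace(url, "\\.\\w+$", ".com")
  }
  return(url)
}

# Apply the function to every element of the column.
domo_data$domain <- sapply(domo_data$domain, modify_suffix)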
Answer 1
Score: 3
There are a few subtleties and gotchas in replacing the suffix using regex (some common suffixes can themselves contain periods, for example).
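To make the gotcha concrete, here is a minimal base R sketch that applies the pattern from the question to a few inputs. The example URLs are made up for illustration, and regmatches() stands in for stringr::str_extract() only so that the snippet runs without extra packages.

```r
# The question's pattern: a dot followed by word characters at the end of the string.
pattern <- "\\.\\w+$"

# Hypothetical inputs chosen to show where the pattern goes wrong.
urls <- c("https://www.example.co.uk",         # two-part suffix
          "https://www.example.net/page.html", # path after the host
          "https://www.example.org")           # simple case

# Base R stand-in for str_extract(); every input here has a match.
last_piece <- regmatches(urls, regexpr(pattern, urls))
print(last_piece)
#> [1] ".uk"   ".html" ".org"
```

Only the last input yields the suffix you would want: the first returns just part of co.uk, and the second returns a file extension rather than a domain suffix.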
There are tools already available that can do the job reliably and quickly. For example, if you install the urltools package, you can do:
urls <- c('https://www.example.com',
          'https://www.example.co.uk',
          'https://www.example.net')
urls <- urltools::suffix_extract(urls)
urls
#>                        host   subdomain  domain suffix
#> 1   https://www.example.com https://www example    com
#> 2 https://www.example.co.uk https://www example  co.uk
#> 3   https://www.example.net https://www example    net
This then makes changing suffixes easy and reliable:
urls$suffix[!urls$suffix %in% c("net", "co", "org", "news", "tv")] <- 'com'
urls <- with(urls, paste(subdomain, domain, suffix, sep = '.'))
urls
#> [1] "https://www.example.com" "https://www.example.com"
#> [3] "https://www.example.net"
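One thing to watch when pasting the pieces back together is a row with no subdomain. The sketch below is a hypothetical helper (rebuild_host is not part of urltools) that assumes a parts table with the subdomain, domain, and suffix columns shown above, and assumes a missing subdomain appears as NA or an empty string; it is worth checking that against your own data. The table here is built by hand so the snippet runs without urltools installed.

```r
# Hypothetical helper: rebuild hosts from a parts table with the columns
# shown above (subdomain, domain, suffix), skipping a missing subdomain.
rebuild_host <- function(parts, keep = c("net", "co", "org", "news", "tv")) {
  # Replace every suffix that is not on the keep list with "com".
  suffix <- ifelse(parts$suffix %in% keep, parts$suffix, "com")
  # Prepend the subdomain only where one is present (assumed NA or "" otherwise).
  has_sub <- !is.na(parts$subdomain) & nzchar(parts$subdomain)
  prefix <- ifelse(has_sub, paste0(parts$subdomain, "."), "")
  paste0(prefix, parts$domain, ".", suffix)
}

# Hand-built stand-in for the suffix_extract() output.
parts <- data.frame(
  subdomain = c("www", NA, "www"),
  domain    = c("example", "example", "example"),
  suffix    = c("co.uk", "io", "net"),
  stringsAsFactors = FALSE
)
result <- rebuild_host(parts)
print(result)
#> [1] "www.example.com" "example.com"     "www.example.net"
```

With plain paste(..., sep = '.'), the second row would come out as "NA.example.com", which is why the prefix is built conditionally here.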