Attempting to change URL suffixes

Question

I want to change a URL suffix to .com unless the existing suffix is .net, .co, .org, .news, or .tv. The function that I'm using changes all URL suffixes to .com. How should I modify it?

library(stringr)  # provides str_extract() and str_replace()

modify_suffix <- function(url) {
  # Grab the last ".something" at the end of the string
  suffix <- str_extract(url, "\\.\\w+$")
  # Replace it with ".com" unless it is one of the suffixes to keep
  if (!suffix %in% c(".net", ".co", ".org", ".news", ".tv")) {
    url <- str_replace(url, "\\.\\w+$", ".com")
  }
  return(url)
}
domo_data$domain <- sapply(domo_data$domain, modify_suffix)

Answer 1

Score: 3

There are a few subtleties and gotchas in replacing the suffix using regex (some common suffixes can themselves contain periods, for example).
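
To make those gotchas concrete, here is a minimal base-R sketch that applies the same pattern as in the question to a few made-up URLs. The example URLs are assumptions chosen for illustration; they are not from the asker's data.

```r
# Same pattern as in the question: a period followed by word characters
# at the very end of the string.
pattern <- "\\.\\w+$"

# Hypothetical inputs, chosen to show where the pattern goes wrong.
examples <- c("https://www.example.co.uk",          # multi-part suffix
              "https://www.example.org/index.html", # URL with a path
              "https://www.example.io/")            # trailing slash

replaced <- sub(pattern, ".com", examples)
print(replaced)
# "https://www.example.co.com"        only ".uk" was replaced
# "https://www.example.org/index.com" the file extension was replaced
# "https://www.example.io/"           nothing matched, so nothing changed
```

In each case the pattern only sees the last period-delimited chunk of the string, which is not necessarily the domain suffix.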

There are tools already available that can do the job reliably and quickly. For example, if you install the urltools package, you can do the following:

urls <- c('https://www.example.com',
          'https://www.example.co.uk',
          'https://www.example.net')

urls <- urltools::suffix_extract(urls)

urls
#>                        host   subdomain  domain suffix
#> 1   https://www.example.com https://www example    com
#> 2 https://www.example.co.uk https://www example  co.uk
#> 3   https://www.example.net https://www example    net

This then makes changing suffixes easy and reliable:

urls$suffix[!urls$suffix %in% c("net", "co", "org", "news", "tv")] <- 'com'
urls <- with(urls, paste(subdomain, domain, suffix, sep = '.'))

urls
#> [1] "https://www.example.com" "https://www.example.com"
#> [3] "https://www.example.net"
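
If some entries have no subdomain, the paste step above may need a guard. The following is a rough sketch only: the data frame is built by hand to mimic the columns shown in the output above, `rebuild_urls` is a hypothetical helper name, and representing a missing subdomain as NA is an assumption rather than documented urltools behaviour.

```r
# Hand-built stand-in for the extracted parts (not produced by urltools here).
parts <- data.frame(
  subdomain = c("https://www", NA, "https://www"),
  domain    = c("example", "example", "example"),
  suffix    = c("co.uk", "io", "net"),
  stringsAsFactors = FALSE
)

# Suffixes that should be left alone, as listed in the question.
keep <- c("net", "co", "org", "news", "tv")

# Hypothetical helper: swap every suffix not in `keep` for "com", then
# reassemble, skipping the subdomain when it is missing (assumed to be NA).
rebuild_urls <- function(parts, keep) {
  new_suffix <- ifelse(parts$suffix %in% keep, parts$suffix, "com")
  base <- paste(parts$domain, new_suffix, sep = ".")
  ifelse(is.na(parts$subdomain), base, paste(parts$subdomain, base, sep = "."))
}

rebuilt <- rebuild_urls(parts, keep)
print(rebuilt)
# "https://www.example.com" "example.com" "https://www.example.net"
```

Without the `is.na()` check, `paste()` would turn a missing subdomain into the literal text "NA" at the start of the rebuilt string.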

huangapple
  • Posted on March 7, 2023 at 22:50:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75663511.html