Regex: Allowlisting URL Top-Level-Domains

Question

I'm looking for a regex that allowlists specific TLDs in a URL. Most of my tests are passing except for ones that repeat the TLD:

Regex

^https:\/\/[^\/]+\.my-site\.com|cloud\/?.*?$

False Positive

https://qa.my-site.cloud.cloud/check-this-out?check-it-out=true

This shows up as a valid match according to the regex, but I want to exclude URLs that repeat the TLD (e.g. .cloud.cloud in the example above). How do I stop the regex from matching URLs with repeated TLDs? Adding a group and {1} did not solve the problem:

^https:\/\/[^\/]+\.my-site\.(?:com|cloud){1}\/?.*?$

The language being used is JavaScript.
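Both patterns can be checked directly in JavaScript. Below is a minimal sketch, runnable in Node.js or a browser console, that reproduces the false positive with each pattern; the variable names are only illustrative:

```javascript
// The two patterns from the question as JavaScript regex literals.
const original = /^https:\/\/[^\/]+\.my-site\.com|cloud\/?.*?$/;
const withGroup = /^https:\/\/[^\/]+\.my-site\.(?:com|cloud){1}\/?.*?$/;

const falsePositive = "https://qa.my-site.cloud.cloud/check-this-out?check-it-out=true";

// Without a group, | splits the WHOLE pattern into two alternatives:
//   ^https:\/\/[^\/]+\.my-site\.com   OR   cloud\/?.*?$
// The second one is not anchored with ^, so any string containing "cloud" matches.
const m = original.exec(falsePositive);
console.log(m.index, m[0]); // 19 cloud.cloud/check-this-out?check-it-out=true

// With the group, {1} changes nothing (a group already matches once).
// The remaining problem is that \/? is optional: after the first "cloud",
// .*? is free to consume ".cloud/check-this-out?check-it-out=true".
console.log(withGroup.test(falsePositive)); // true
```

The first pattern's match starts at index 19, the first "cloud", rather than at "https", which shows that ^ only anchors the first alternative.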

Answer 1

Score: 1

There are two issues:

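  1. You need to wrap the alternation (|) in parentheses: (com|cloud) instead of com|cloud. Without the group, | splits the whole pattern in two, so any string containing "cloud" matches.
  2. The / should be required if the URL has a path. Otherwise .*? can consume the ".cloud" that follows the first TLD.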

Here's a working regex:

^https:\/\/[^\/]+\.my-site\.(com|cloud)(\/.*?)?$

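To check the regex above, here is a small sketch that runs it over a few sample URLs. The URLs are hypothetical, chosen to cover a bare host, a trailing slash, a path with a query string, the repeated TLD, a TLD that is only a prefix of a longer label, and one edge case:

```javascript
// Regex from the answer: the TLD alternation is grouped, and any path
// must begin with "/".
const allowlist = /^https:\/\/[^\/]+\.my-site\.(com|cloud)(\/.*?)?$/;

// Hypothetical sample URLs paired with the result the regex gives.
const samples = [
  ["https://qa.my-site.com", true],
  ["https://qa.my-site.cloud/", true],
  ["https://qa.my-site.cloud/check-this-out?check-it-out=true", true],
  ["https://qa.my-site.cloud.cloud/check-this-out?check-it-out=true", false], // repeated TLD
  ["https://qa.my-site.company.example/", false], // "com" is only a prefix here
  // Caveat: [^\/]+ also accepts "?" and "#", so this URL passes even though
  // its real host is attacker.example.
  ["https://attacker.example?.my-site.com", true],
];

for (const [url, expected] of samples) {
  const actual = allowlist.test(url);
  console.log(actual === expected ? "ok  " : "FAIL", actual, url);
}
```

The last sample shows that the regex only checks text, not the actual host. If that matters for an allowlist, parsing the string with the standard `URL` constructor and checking its `protocol` and `hostname` avoids this class of problem, because the parser decides where the host ends.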

Answer 2

Score: 1

You can use a positive lookahead assertion to ensure that the TLD appears only once in the domain, i.e. that it is followed by the end of the host (a "/" or the end of the string):

^https:\/\/[^\/]+\.my-site\.(?=(?:com|cloud)(?:\/|$))[^\/]*\/?.*?$

Explanation:

  • ^https:\/\/[^\/]+\.my-site\. matches "https://" at the start of the string and everything up to and including ".my-site."
  • (?=(?:com|cloud)(?:\/|$)) is a positive lookahead assertion that ensures the TLD is either "com" or "cloud" and is immediately followed by a "/" or the end of the string, so ".cloud.cloud" is rejected
  • [^\/]*\/? matches the TLD itself and the slash that follows it, if there is one
  • .*?$ matches the rest of the URL, including any query parameters

Note that the TLDs are listed inside a non-capturing group (?:com|cloud), which scopes the alternation to the TLD without creating a capture group.
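A minimal sketch contrasting a lookahead that only checks how the TLD begins, `(?=com|cloud)`, with one that also requires the TLD to be followed by "/" or the end of the string. The variable names are illustrative:

```javascript
// A lookahead that only checks what the TLD starts with: [^\/]* afterwards
// is free to consume "cloud.cloud", so the repeated TLD still matches.
const prefixOnly = /^https:\/\/[^\/]+\.my-site\.(?=com|cloud)[^\/]*\/?.*?$/;

// A lookahead that also requires the TLD to end the host: it must be
// followed by "/" or by the end of the string.
const wholeTld = /^https:\/\/[^\/]+\.my-site\.(?=(?:com|cloud)(?:\/|$))[^\/]*\/?.*?$/;

const repeatedTld = "https://qa.my-site.cloud.cloud/check-this-out?check-it-out=true";
const validUrl = "https://qa.my-site.cloud/check-this-out?check-it-out=true";

console.log(prefixOnly.test(repeatedTld)); // true (still a false positive)
console.log(wholeTld.test(repeatedTld));   // false
console.log(wholeTld.test(validUrl));      // true
```

Because a lookahead consumes no characters, it is the `(?:\/|$)` inside it that rejects ".cloud.cloud"; `[^\/]*` then consumes exactly the allowlisted TLD.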

huangapple
  • Posted on February 24, 2023 03:05:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75549281.html