How do I retrieve the domain from a URL?


Question

In Go, you can use a regular expression to extract the domain from a URL string. Here is an example:

    package main

    import (
        "fmt"
        "regexp"
    )

    // domainRegex skips an optional scheme, optional user info and an optional
    // leading "www.", then captures everything up to the first ':' or '/'.
    var domainRegex = regexp.MustCompile(`^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)`)

    func extractDomain(url string) (string, error) {
        // Use the regular expression to extract the domain.
        matches := domainRegex.FindStringSubmatch(url)
        if len(matches) < 2 {
            return "", fmt.Errorf("failed to extract domain from URL %q", url)
        }
        return matches[1], nil
    }

    func main() {
        urls := []string{
            "https://www.example.com/some-random-url",
            "www.example.com/some-random-url",
            "example.com/some-random-url",
            "www.example.com",
            "subdomain.example.com",
        }
        for _, url := range urls {
            domain, err := extractDomain(url)
            if err != nil {
                fmt.Printf("Error extracting domain from URL: %v\n", err)
                continue
            }
            fmt.Println(domain)
        }
    }

This code uses a regular expression to match the domain part of the URL string. It first defines the pattern, then uses FindStringSubmatch to extract the captured domain. Finally, it loops over the list of URLs, calls extractDomain on each one, and prints the result. The pattern only strips a leading "www.", so other subdomains are kept: subdomain.example.com is returned unchanged.

Note that this code uses the regexp package from Go's standard library.


In Go, how can I extract only the domain name from a URL string?

Before:

    https://www.example.com/some-random-url
    www.example.com/some-random-url
    example.com/some-random-url
    www.example.com
    subdomain.example.com

After:

    example.com

Also, I'm limited to using the Golang standard library.

Answer 1

Score: 1


I've finally figured it out.

    package main

    import (
        "fmt"
        "log"
        "net/url"
        "strings"
    )

    func main() {
        u, err := url.Parse("https://www.example.com")
        if err != nil {
            log.Fatal(err)
        }
        // Keep the last two dot-separated labels of the host name.
        parts := strings.Split(u.Hostname(), ".")
        domain := parts[len(parts)-2] + "." + parts[len(parts)-1]
        fmt.Println(domain)
    }

example.com

If the input has no scheme, as with subdomain.example.com, url.Parse leaves Hostname() empty, so parts has a single element and the index expression panics (index out of range).

https://play.golang.org/p/Li0PviAr2jU
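One way to avoid that panic while staying within the standard library is to add a scheme when the input has none and to check the label count before indexing. This is a sketch under two assumptions: inputs without "://" are treated as https URLs, and the "domain" is simply the last two labels of the host. The function name registrableDomain is made up for this example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// registrableDomain (hypothetical helper) returns the last two labels of the host.
func registrableDomain(rawURL string) (string, error) {
	// Assumption: no "://" means no scheme, so prepend one to make url.Parse
	// fill in Host instead of Path.
	if !strings.Contains(rawURL, "://") {
		rawURL = "https://" + rawURL
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	parts := strings.Split(u.Hostname(), ".")
	if len(parts) < 2 {
		// Guard against the index-out-of-range panic.
		return "", fmt.Errorf("host %q has fewer than two labels", u.Hostname())
	}
	return parts[len(parts)-2] + "." + parts[len(parts)-1], nil
}

func main() {
	for _, in := range []string{
		"https://www.example.com/some-random-url",
		"www.example.com/some-random-url",
		"example.com/some-random-url",
		"www.example.com",
		"subdomain.example.com",
	} {
		d, err := registrableDomain(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(d)
	}
}
```

The last-two-labels rule is only a heuristic: for www.google.co.in it returns co.in. Handling that correctly needs the Public Suffix List, which golang.org/x/net/publicsuffix (EffectiveTLDPlusOne) provides, but that package is not part of the standard library.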

Answer 2

Score: 1


Since some of your examples are not well-formed URLs, I think you need to use a regular expression to extract the domain. Please find sample code below that gets the domain for the examples you shared:

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        // Capture "<label>.com": the dot before "com" is escaped so it only matches
        // a literal '.', and '/' is excluded so a scheme like "https://" is never captured.
        m := regexp.MustCompile(`\.?([^./]*\.com)`)
        inputs := []string{
            "https://www.example.com/some-random-url",
            "www.example.com/some-random-url",
            "example.com/some-random-url",
            "www.example.com",
            "subdomain.example.com",
        }
        for _, s := range inputs {
            // FindStringSubmatch returns nil when there is no match.
            if match := m.FindStringSubmatch(s); match != nil {
                fmt.Println(match[1])
            }
        }
    }

This covers the example inputs in the question, including the ones that are not well-formed URLs, but it only recognizes .com domains. You can easily update the regex if any URL doesn't get parsed correctly.

Go Playground link for the above: here.

Answer 3

Score: 0


This solution

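    import (
        "net/url"
        "regexp"
        "strings"
    )

    // domainRe matches a dot-separated run of lowercase labels, e.g. "google.com".
    var domainRe = regexp.MustCompile(`([a-z0-9\-]+\.)+[a-z0-9\-]+`)

    func extractDomain(urlLikeString string) string {
        urlLikeString = strings.TrimSpace(urlLikeString)
        // Only strings starting with "http" (http:// or https://) go through url.Parse.
        if strings.HasPrefix(urlLikeString, "http") {
            // Use the parsed host only if parsing succeeds; ignoring the error
            // would dereference a nil *url.URL.
            if u, err := url.Parse(urlLikeString); err == nil {
                urlLikeString = u.Host
            }
        }
        urlLikeString = strings.TrimPrefix(urlLikeString, "www.")
        // Return the first domain-like substring, or "" if there is none.
        return domainRe.FindString(urlLikeString)
    }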

will turn this

    " ",
    "aaa",
    "not domain",
    "ca.mail.google.com",
    "google.com",
    " google.com ",
    " www.google.com/a/example.com",
    "www.google.com/f/example.com",
    "google.com/f/example.com",
    "http://google.com/f/abc.com",
    "http://google.com/f/?wow=xyz.com",
    "http://google.com/f/?wow=www.xyz.com",
    "http://www.google.com/f/abc.com",
    "https://www.google.com/f/abc.com",
    "https://mail.google.com/f/abc.com",
    "https://123.google.com/f/abc.com",
    "https://xn-ddf3.google.com/f/abc.com",

into this

    [empty string]
    [empty string]
    [empty string]
    ca.mail.google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    mail.google.com
    123.google.com
    xn-ddf3.google.com

The "net/url" function url.Parse can't handle domain-like strings such as "bla bla google.com": it finds no host in them.

Answer 4

Score: -1


I think this can be helpful

    package main

    import (
        "fmt"
        "log"
        "net/url"
        "strings"
    )

    func main() {
        strArray := []string{
            "www.google.co.in",
            "https://google.in",
            "instagram.com",
            "nymag.com",
            "http://www.example.com/?airport=approval&box=brother",
            "https://www.example.com/babies.php#birds",
            "http://example.org/bear",
            "www.google.co.in",
            "google.com",
            "https://www.bbb.org/search/business-review-form/",
            "https://www.localvisibilitysystem.com/2015/08/19/how-to-use-meetup-sponsorships-for-local-marketing-and-seo-dave-oremlands-tips/",
            "http://www.example.com/boat/advertisement?actor=bat#boundary",
            "https://www.example.com/",
            "https://www.google.com",
            "https://www.example.com/army/approval.htm?basket=bottle",
            "http://example.com/board.aspx?afternoon=appliance&angle=ball",
            "http://www.example.com/",
            "http://example.com/",
            "http://www.example.com/",
            "livejournal.com",
            "delicious.com",
            "illinois.edu",
            "instagram.com",
            "nymag.com",
            "altervista.org",
            "t.co",
            "reddit.com",
            "tinyurl.com",
        }
        for _, s := range strArray {
            u, err := url.Parse(s)
            if err != nil {
                log.Fatal(err)
            }
            urlstr := u.String()

            // Here the scheme prefix and "www." are filtered out of the host name.
            var hostname string
            if strings.HasPrefix(urlstr, "https") {
                hostname = strings.TrimPrefix(urlstr, "https://")
            } else if strings.HasPrefix(urlstr, "http") {
                hostname = strings.TrimPrefix(urlstr, "http://")
            } else {
                hostname = urlstr
            }
            hostname = strings.TrimPrefix(hostname, "www.")

            // Keep only the part before the first "/", dropping the path and anything after it.
            fmt.Println(strings.SplitN(hostname, "/", 2)[0])
        }
    }

Output:

    google.co.in
    google.in
    instagram.com
    nymag.com
    example.com
    example.com
    example.org
    google.co.in
    google.com
    bbb.org
    localvisibilitysystem.com
    example.com
    example.com
    google.com
    example.com
    example.com
    example.com
    example.com
    example.com
    livejournal.com
    delicious.com
    illinois.edu
    instagram.com
    nymag.com
    altervista.org
    t.co
    reddit.com
    tinyurl.com

This gives the host name with any leading "www." removed. Other subdomains are kept, so https://mail.google.com would give mail.google.com.
Go Playground link for the above:
https://go.dev/play/p/vfCOAnTNqh8
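Because this answer keeps every label except a leading "www.", a host such as mail.google.com comes out unchanged, while naively keeping the last two labels would turn google.co.in into co.in. A standard-library-only middle ground is to trim the host against a list of known multi-label suffixes. The sketch below hard-codes a tiny, hypothetical list (multiLabelSuffixes) and a made-up helper name (trimToRegistrable); a real program would need the full Public Suffix List, for example through golang.org/x/net/publicsuffix, which is outside the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// multiLabelSuffixes is a hypothetical, hand-maintained list of public suffixes
// that span two labels. It is nowhere near complete.
var multiLabelSuffixes = map[string]bool{
	"co.in": true,
	"co.uk": true,
}

// trimToRegistrable keeps one label to the left of the public suffix.
func trimToRegistrable(host string) string {
	labels := strings.Split(host, ".")
	n := 2 // default: assume a one-label suffix such as "com"
	if len(labels) >= 3 && multiLabelSuffixes[strings.Join(labels[len(labels)-2:], ".")] {
		n = 3
	}
	if len(labels) <= n {
		return host
	}
	return strings.Join(labels[len(labels)-n:], ".")
}

func main() {
	for _, h := range []string{"google.co.in", "www.google.co.in", "mail.google.com", "t.co", "example.org"} {
		fmt.Println(trimToRegistrable(h))
	}
}
```

The input here is a bare host name, so it would be applied to what the code above prints rather than to the raw URL.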

huangapple
  • This article was published on May 22, 2021 22:35:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/67650694.html