

How do I retrieve the domain from a URL?



    package main

    import (
        "fmt"
        "regexp"
    )

    func extractDomain(rawURL string) (string, error) {
        // Skip an optional scheme, user info, and "www." prefix, then capture
        // everything up to the first ':' or '/'. Other subdomains are kept.
        regex := regexp.MustCompile(`^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)`)
        // Extract the domain using the regular expression.
        matches := regex.FindStringSubmatch(rawURL)
        if len(matches) < 2 {
            return "", fmt.Errorf("failed to extract domain from URL")
        }
        return matches[1], nil
    }

    func main() {
        urls := []string{
            "https://www.example.com/some-random-url",
            "www.example.com/some-random-url",
            "example.com/some-random-url",
            "www.example.com",
            "subdomain.example.com",
        }
        for _, u := range urls {
            domain, err := extractDomain(u)
            if err != nil {
                fmt.Printf("Error extracting domain from URL: %v\n", err)
                continue
            }
            fmt.Println(domain)
        }
    }




In Go, how can I extract only the domain name from a URL string?


Example inputs:

    https://www.example.com/some-random-url
    www.example.com/some-random-url
    example.com/some-random-url
    www.example.com
    subdomain.example.com

Expected output:

    example.com

Also, I'm limited to using the Golang standard library.


Score: 1





I've finally figured it out.

    package main

    import (
        "fmt"
        "log"
        "net/url"
        "strings"
    )

    func main() {
        u, err := url.Parse("https://www.example.com")
        if err != nil {
            log.Fatal(err)
        }
        // Keep only the last two dot-separated labels of the host name.
        parts := strings.Split(u.Hostname(), ".")
        domain := parts[len(parts)-2] + "." + parts[len(parts)-1]
        fmt.Println(domain)
    }


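If the input is something like subdomain.example.com, with no scheme, then this will panic: url.Parse puts the whole string in the path, Hostname() returns an empty string, and parts ends up with a single element.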
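One way to avoid that panic is to check the host and the number of labels before indexing. The following is a minimal sketch, not the code from the answer: `lastTwoLabels` is a hypothetical helper name, and taking the last two labels is an assumption that gives the wrong result for suffixes such as co.in.

```go
package main

import (
    "fmt"
    "net/url"
    "strings"
)

// lastTwoLabels is a hypothetical helper, not part of the answer above.
// It returns the last two dot-separated labels of the URL's host name and
// reports an error instead of panicking when no host can be found.
func lastTwoLabels(rawURL string) (string, error) {
    u, err := url.Parse(rawURL)
    if err != nil {
        return "", err
    }
    host := u.Hostname()
    if host == "" {
        // Typical for scheme-less input: url.Parse treats it as a path.
        return "", fmt.Errorf("no host found in %q", rawURL)
    }
    parts := strings.Split(host, ".")
    if len(parts) < 2 {
        return host, nil // single-label host such as "localhost"
    }
    return strings.Join(parts[len(parts)-2:], "."), nil
}

func main() {
    for _, s := range []string{
        "https://www.example.com",
        "https://subdomain.example.com/some-random-url",
        "subdomain.example.com",
    } {
        domain, err := lastTwoLabels(s)
        if err != nil {
            fmt.Println("error:", err)
            continue
        }
        fmt.Println(domain)
    }
}
```

With a scheme, both www.example.com and subdomain.example.com are reduced to example.com; without one, the helper returns an error rather than panicking.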



Score: 1




I think that since your examples include strings that are not well-formed URLs, you need to use a regular expression to extract the domain. Here is sample code that gets the domain for the examples you shared:

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        // Capture the label immediately before ".com" together with ".com",
        // using FindStringSubmatch and taking the first capture group.
        m := regexp.MustCompile(`\.?([^.]*\.com)`)
        fmt.Println(m.FindStringSubmatch("https://www.example.com/some-random-url")[1])
        fmt.Println(m.FindStringSubmatch("www.example.com/some-random-url")[1])
        fmt.Println(m.FindStringSubmatch("example.com/some-random-url")[1])
        fmt.Println(m.FindStringSubmatch("www.example.com")[1])
        fmt.Println(m.FindStringSubmatch("subdomain.example.com")[1])
    }

This covers all the cases in the question (including the strings that are not well-formed URLs), but it only matches .com domains. You can update the regular expression if there is a URL that doesn't get parsed correctly.

Go Playground link for the above: here.
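As an illustration of updating the regular expression, here is a sketch that is not tied to .com. The pattern and the names `registrable` and `lastTwo` are assumptions made for this example, not part of the answer. It captures the last two labels of the host, so it still returns co.in for www.google.co.in; the standard library does not ship a public suffix list to do better.

```go
package main

import (
    "fmt"
    "regexp"
)

// registrable is a hypothetical pattern, not the one from the answer above.
// It skips an optional lowercase scheme and any leading labels, then captures
// the last two dot-separated labels before the first '/', '?', '#', ':' or
// the end of the input.
var registrable = regexp.MustCompile(
    `^(?:[a-z]+://)?(?:[^/?#:]*\.)?([^./?#:]+\.[^./?#:]+)(?:[/?#:]|$)`)

// lastTwo returns the captured domain, or "" when the input does not match.
func lastTwo(s string) string {
    m := registrable.FindStringSubmatch(s)
    if m == nil {
        return ""
    }
    return m[1]
}

func main() {
    for _, s := range []string{
        "https://www.example.com/some-random-url",
        "www.example.com/some-random-url",
        "example.com/some-random-url",
        "www.example.com",
        "subdomain.example.com",
    } {
        fmt.Println(lastTwo(s))
    }
}
```

For the five inputs from the question this prints example.com each time. Input with user info (user@host) or an uppercase scheme is not handled by this sketch.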


Score: 0




This solution

    import (
        "net/url"
        "regexp"
        "strings"
    )

    var domainPattern = regexp.MustCompile(`([a-z0-9\-]+\.)+[a-z0-9\-]+`)

    func extractDomain(urlLikeString string) string {
        urlLikeString = strings.TrimSpace(urlLikeString)
        if strings.HasPrefix(urlLikeString, "http") {
            // Let net/url pick out the host; keep the raw string if parsing fails.
            if read, err := url.Parse(urlLikeString); err == nil {
                urlLikeString = read.Host
            }
        }
        urlLikeString = strings.TrimPrefix(urlLikeString, "www.")
        // Return the first run of dot-separated labels, or "" if there is none.
        return domainPattern.FindString(urlLikeString)
    }

will turn this

    " ",
    "aaa",
    "not domain",
    "ca.mail.google.com",
    "google.com",
    " google.com ",
    " www.google.com/a/example.com",
    "www.google.com/f/example.com",
    "google.com/f/example.com",
    "http://google.com/f/abc.com",
    "http://google.com/f/?wow=xyz.com",
    "http://google.com/f/?wow=www.xyz.com",
    "http://www.google.com/f/abc.com",
    "https://www.google.com/f/abc.com",
    "https://mail.google.com/f/abc.com",
    "https://123.google.com/f/abc.com",
    "https://xn-ddf3.google.com/f/abc.com",

into this

    [empty string]
    [empty string]
    [empty string]
    ca.mail.google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    google.com
    mail.google.com
    123.google.com
    xn-ddf3.google.com
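The regular-expression fallback matters because url.Parse only fills in Host when the input carries a scheme (or starts with "//"). The sketch below is an illustration rather than part of the solution; `hostAndPath` is a hypothetical helper that reports where url.Parse puts the pieces of its input.

```go
package main

import (
    "fmt"
    "net/url"
)

// hostAndPath is a hypothetical helper that returns the Host and Path
// fields url.Parse produces for s.
func hostAndPath(s string) (host, path string, err error) {
    u, err := url.Parse(s)
    if err != nil {
        return "", "", err
    }
    return u.Host, u.Path, nil
}

func main() {
    for _, s := range []string{
        "https://google.com/f/abc.com",
        "google.com/f/abc.com",
        "bla bla google.com",
    } {
        host, path, err := hostAndPath(s)
        fmt.Printf("%q: host=%q path=%q err=%v\n", s, host, path, err)
    }
}
```

Only the first input yields a non-empty host; for scheme-less input the text ends up in the path instead.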

The url.Parse function from "net/url" won't extract a host from a domain-like string without a scheme, such as: bla bla google.com.


Score: -1



I think this can be helpful

    package main

    import (
        "fmt"
        "log"
        "net/url"
        "strings"
    )

    func main() {
        strArray := []string{
            "www.google.co.in",
            "https://google.in",
            "instagram.com",
            "nymag.com",
            "http://www.example.com/?airport=approval&box=brother",
            "https://www.example.com/babies.php#birds",
            "http://example.org/bear",
            "www.google.co.in",
            "google.com",
            "https://www.bbb.org/search/business-review-form/",
            "https://www.localvisibilitysystem.com/2015/08/19/how-to-use-meetup-sponsorships-for-local-marketing-and-seo-dave-oremlands-tips/",
            "http://www.example.com/boat/advertisement?actor=bat#boundary",
            "https://www.example.com/",
            "https://www.google.com",
            "https://www.example.com/army/approval.htm?basket=bottle",
            "http://example.com/board.aspx?afternoon=appliance&angle=ball",
            "http://www.example.com/",
            "http://example.com/",
            "http://www.example.com/",
            "livejournal.com",
            "delicious.com",
            "illinois.edu",
            "instagram.com",
            "nymag.com",
            "altervista.org",
            "t.co",
            "reddit.com",
            "tinyurl.com",
        }
        for _, raw := range strArray {
            u, err := url.Parse(raw)
            if err != nil {
                log.Fatal(err)
            }
            hostname := u.String()
            // Strip the scheme prefix and a leading "www." from the URL string.
            hostname = strings.TrimPrefix(hostname, "https://")
            hostname = strings.TrimPrefix(hostname, "http://")
            hostname = strings.TrimPrefix(hostname, "www.")
            // Keep only the part before the first "/".
            if i := strings.Index(hostname, "/"); i >= 0 {
                hostname = hostname[:i]
            }
            fmt.Println(hostname)
        }
    }


    google.co.in
    google.in
    instagram.com
    nymag.com
    example.com
    example.com
    example.org
    google.co.in
    google.com
    bbb.org
    localvisibilitysystem.com
    example.com
    example.com
    google.com
    example.com
    example.com
    example.com
    example.com
    example.com
    livejournal.com
    delicious.com
    illinois.edu
    instagram.com
    nymag.com
    altervista.org
    t.co
    reddit.com
    tinyurl.com

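This gives the required domain for each URL in the list. It only strips the scheme, a leading www., and everything from the first slash, so ports and other subdomains are kept.
Go Playground link for the above: https://go.dev/play/p/vfCOAnTNqh8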
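The same list can also be handled by letting net/url do the splitting, which takes care of ports and query strings as well. This is a sketch under assumptions, not the code from the answer: `hostOf` is a hypothetical helper, and retrying with a "//" prefix relies on url.Parse treating "//host/path" as a scheme-relative reference.

```go
package main

import (
    "fmt"
    "net/url"
    "strings"
)

// hostOf is a hypothetical helper, not taken from the answer above.
// It returns the host name without port and without a leading "www.".
func hostOf(s string) (string, error) {
    s = strings.TrimSpace(s)
    u, err := url.Parse(s)
    if err != nil || u.Host == "" {
        // No authority was recognized, which is what happens for scheme-less
        // input. Retry as a scheme-relative reference ("//host/path").
        u, err = url.Parse("//" + s)
    }
    if err != nil {
        return "", err
    }
    host := strings.TrimPrefix(u.Hostname(), "www.")
    if host == "" {
        return "", fmt.Errorf("no host found in %q", s)
    }
    return host, nil
}

func main() {
    for _, s := range []string{
        "www.google.co.in",
        "https://www.example.com/babies.php#birds",
        "http://example.com:8080/board.aspx?afternoon=appliance",
        "t.co",
    } {
        host, err := hostOf(s)
        if err != nil {
            fmt.Println("error:", err)
            continue
        }
        fmt.Println(host)
    }
}
```

Like the answer's code, this keeps subdomains other than www, so mail.example.com stays mail.example.com.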

  • Published on May 22, 2021, 22:35:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/67650694.html



