How do I retrieve the domain from a URL?

Question

In Go, you can use a regular expression to extract the host from a URL string. Here is a sample:

package main

import (
	"fmt"
	"regexp"
)

func extractDomain(url string) (string, error) {
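	// Define the regular expression: optional scheme, optional user info,
	// optional "www." prefix, then capture everything up to ':' or '/'.
	regex := regexp.MustCompile(`^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)`)

	// Extract the host using the capture group.
	matches := regex.FindStringSubmatch(url)
	if len(matches) < 2 {
		return "", fmt.Errorf("failed to extract domain from URL")
	}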

	return matches[1], nil
}

func main() {
	urls := []string{
		"https://www.example.com/some-random-url",
		"www.example.com/some-random-url",
		"example.com/some-random-url",
		"www.example.com",
		"subdomain.example.com",
	}

	for _, url := range urls {
		domain, err := extractDomain(url)
		if err != nil {
			fmt.Printf("Error extracting domain from URL: %v\n", err)
			continue
		}
		fmt.Println(domain)
	}
}

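This code matches the host part of the URL string with a regular expression. It defines the pattern, extracts the captured group with FindStringSubmatch, and then loops over the list of URLs, calling extractDomain on each and printing the result. The pattern strips only the scheme, user info, and a leading www., so subdomain.example.com is printed unchanged rather than reduced to example.com.

Note that the code relies only on the regexp package from the Go standard library.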

In Go, how can I extract only the domain name from a URL string?

Before:

https://www.example.com/some-random-url
www.example.com/some-random-url
example.com/some-random-url
www.example.com
subdomain.example.com

After:

example.com

Also, I'm limited to using the Golang standard library.

Answer 1

Score: 1

I've finally figured it out.

package main

import (
	"fmt"
	"log"
	"net/url"
	"strings"
)

func main() {
	url, err := url.Parse("https://www.example.com")
	if err != nil {
		log.Fatal(err)
	}
	parts := strings.Split(url.Hostname(), ".")
	domain := parts[len(parts)-2] + "." + parts[len(parts)-1]
	fmt.Println(domain)
}

example.com

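If the input has no scheme, such as subdomain.example.com, url.Parse puts the whole string in Path, Hostname() returns an empty string, and the index expression panics.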

https://play.golang.org/p/Li0PviAr2jU
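
One way to avoid that panic, sketched under assumptions: the helper name lastTwoLabels is hypothetical, and it assumes any input without "://" is a bare host or host/path, so it prepends a scheme before parsing. It still keeps only the last two labels, so a host like www.google.co.in would come back as co.in.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// lastTwoLabels is a hypothetical helper: it returns the last two
// dot-separated labels of the host in raw, or an error if there are fewer.
func lastTwoLabels(raw string) (string, error) {
	// Assumption: input without "://" is a bare host or host/path, so add a
	// scheme to make url.Parse fill in Host instead of Path.
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	parts := strings.Split(u.Hostname(), ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("no domain found in %q", raw)
	}
	return strings.Join(parts[len(parts)-2:], "."), nil
}

func main() {
	for _, in := range []string{
		"https://www.example.com/some-random-url",
		"www.example.com/some-random-url",
		"example.com/some-random-url",
		"www.example.com",
		"subdomain.example.com",
		"localhost",
	} {
		d, err := lastTwoLabels(in)
		fmt.Printf("%q -> %q (err: %v)\n", in, d, err)
	}
}
```

The five inputs from the question all yield example.com here, and localhost returns an error instead of panicking.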

Answer 2

Score: 1

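Since your examples also include malformed URLs (ones without a scheme), I think you need a regular expression to extract the domain. The sample code below gets the domain for the examples you shared: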

package main

import (
	"fmt"
	"regexp"
)

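// main prints the captured domain for each sample input.
func main() {

	// Match an optional leading dot, then capture the label before ".com"
	// together with ".com" itself, using FindStringSubmatch.
	m := regexp.MustCompile(`\.?([^.]*\.com)`)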

	fmt.Println(m.FindStringSubmatch("https://www.example.com/some-random-url")[1])
	fmt.Println(m.FindStringSubmatch("www.example.com/some-random-url")[1])
	fmt.Println(m.FindStringSubmatch("example.com/some-random-url")[1])
	fmt.Println(m.FindStringSubmatch("www.example.com")[1])
	fmt.Println(m.FindStringSubmatch("subdomain.example.com")[1])

}

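This covers the sample inputs, including the ones without a scheme, but only for .com domains. If a URL doesn't get parsed correctly, you can adjust the regular expression.

Go Playground link for the above: here.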
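
A small sketch of why the dot before com matters in a pattern like this. The input www.telecom.org and the names loose, strict, and firstGroup are made up for illustration. With an unescaped dot the pattern can match inside a label that merely ends in "com"; with an escaped dot it only matches a literal ".com".

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// loose: the "." before "com" is unescaped, so it matches any character.
	loose = regexp.MustCompile(`\.?([^.]*.com)`)
	// strict: "\." only matches a literal dot.
	strict = regexp.MustCompile(`\.?([^.]*\.com)`)
)

// firstGroup returns the first capture group, or "" when re does not match.
func firstGroup(re *regexp.Regexp, s string) string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	in := "www.telecom.org" // hypothetical input, not from the question
	fmt.Printf("loose:  %q\n", firstGroup(loose, in))                       // "telecom"
	fmt.Printf("strict: %q\n", firstGroup(strict, in))                      // ""
	fmt.Printf("strict: %q\n", firstGroup(strict, "subdomain.example.com")) // "example.com"
}
```

Either way the pattern is tied to .com, so other top-level domains need a different approach.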

Answer 3

Score: 0

This solution

// Requires the imports "net/url", "regexp", and "strings".
func extractDomain(urlLikeString string) string {

	urlLikeString = strings.TrimSpace(urlLikeString)

	// For inputs with an http(s) scheme, let net/url isolate the host.
	if regexp.MustCompile(`^https?://`).MatchString(urlLikeString) {
		read, err := url.Parse(urlLikeString)
		if err != nil {
			return "" // avoid dereferencing a nil *url.URL
		}
		urlLikeString = read.Host
	}

	// Drop a leading "www." if present.
	urlLikeString = strings.TrimPrefix(urlLikeString, "www.")

	// Return the first run of dot-separated labels.
	return regexp.MustCompile(`([a-z0-9\-]+\.)+[a-z0-9\-]+`).FindString(urlLikeString)
}

will turn this

"   ",
"aaa",
"not domain",
"ca.mail.google.com",
"google.com",
" google.com ",
" www.google.com/a/example.com",
"www.google.com/f/example.com",
"google.com/f/example.com",
"http://google.com/f/abc.com",
"http://google.com/f/?wow=xyz.com",
"http://google.com/f/?wow=www.xyz.com",
"http://www.google.com/f/abc.com",
"https://www.google.com/f/abc.com",
"https://mail.google.com/f/abc.com",
"https://123.google.com/f/abc.com",
"https://xn-ddf3.google.com/f/abc.com",

into this

[empty string]
[empty string]
[empty string]
ca.mail.google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
mail.google.com
123.google.com
xn-ddf3.google.com

On its own, url.Parse from "net/url" won't handle a domain-like string such as: bla bla google.com.
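
To see what url.Parse does with such inputs, a short sketch may help. The helper hostOf is hypothetical, and the inputs are taken from the list above. Without a scheme, url.Parse treats the string as a relative reference, so Host stays empty.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostOf is a hypothetical helper: it returns the Host field that url.Parse
// produces, or "" if parsing fails.
func hostOf(s string) string {
	u, err := url.Parse(s)
	if err != nil {
		return ""
	}
	return u.Host
}

func main() {
	for _, s := range []string{
		"https://www.google.com/f/abc.com",
		"www.google.com/f/example.com",
		"bla bla google.com",
	} {
		fmt.Printf("%q -> Host=%q\n", s, hostOf(s))
	}
}
```

Only the first input yields a non-empty Host, which is why the function above falls back to a regular expression for everything that does not start with a scheme.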

Answer 4

Score: -1

I think this may be helpful.

package main

import (
	"fmt"
	"log"
	"net/url"
	"strings"
)

func main() {
	strArray := []string{
		"www.google.co.in",
		"https://google.in",
		"instagram.com",
		"nymag.com",
		"http://www.example.com/?airport=approval&box=brother",
		"https://www.example.com/babies.php#birds",
		"http://example.org/bear",
		"www.google.co.in",
		"google.com",
		"https://www.bbb.org/search/business-review-form/",
		"https://www.localvisibilitysystem.com/2015/08/19/how-to-use-meetup-sponsorships-for-local-marketing-and-seo-dave-oremlands-tips/",
		"http://www.example.com/boat/advertisement?actor=bat#boundary",
		"https://www.example.com/",
		"https://www.google.com",
		"https://www.example.com/army/approval.htm?basket=bottle",
		"http://example.com/board.aspx?afternoon=appliance&angle=ball",
		"http://www.example.com/",
		"http://example.com/",
		"http://www.example.com/",
		"livejournal.com",
		"delicious.com",
		"illinois.edu",
		"instagram.com",
		"nymag.com",
		"altervista.org",
		"t.co",
		"reddit.com",
		"tinyurl.com",
	}
	var hostname string
	var temp []string
	for i := 0; i < len(strArray); i++ {
		url, err := url.Parse(strArray[i])
		if err != nil {
			log.Fatal(err)
		}
		var urlstr string = url.String()

		// Strip the scheme prefix and a leading "www." from the URL string
		if strings.HasPrefix(urlstr, "https") {
			hostname = strings.TrimPrefix(urlstr, "https://")
		} else if strings.HasPrefix(urlstr, "http") {
			hostname = strings.TrimPrefix(urlstr, "http://")
		} else {
			hostname = urlstr
		}

		if strings.HasPrefix(hostname, "www") {
			hostname = strings.TrimPrefix(hostname, "www.")
		}
		if strings.Contains(hostname, "/") {
			temp = strings.Split(hostname, "/")
			fmt.Println(temp[0])
		} else {
			fmt.Println(hostname)
		}

	}
}

Output:

google.co.in
google.in
instagram.com
nymag.com
example.com
example.com
example.org
google.co.in
google.com
bbb.org
localvisibilitysystem.com
example.com
example.com
google.com
example.com
example.com
example.com
example.com
example.com
livejournal.com
delicious.com
illinois.edu
instagram.com
nymag.com
altervista.org
t.co
reddit.com
tinyurl.com
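
This prints the host of each URL above with the scheme and a leading www. removed; any other subdomain is kept as it is.
Go Playground link for the above:
https://go.dev/play/p/vfCOAnTNqh8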
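
Hosts such as www.google.co.in show why "last two labels" is not always the registrable domain. The standard library has no public suffix list, so the sketch below uses a tiny hand-picked suffix set purely for illustration. Both multiLabelSuffixes and registrableDomain are hypothetical names, and the set is far from complete.

```go
package main

import (
	"fmt"
	"strings"
)

// multiLabelSuffixes is a tiny, hand-picked sample for illustration only;
// it is NOT a complete list of public suffixes.
var multiLabelSuffixes = map[string]bool{
	"co.in":  true,
	"co.uk":  true,
	"com.au": true,
}

// registrableDomain is a hypothetical helper: it keeps the last two labels
// of host, or the last three when the last two form a known suffix.
func registrableDomain(host string) string {
	labels := strings.Split(strings.ToLower(host), ".")
	n := len(labels)
	if n < 2 {
		return ""
	}
	keep := 2
	if n >= 3 && multiLabelSuffixes[strings.Join(labels[n-2:], ".")] {
		keep = 3
	}
	return strings.Join(labels[n-keep:], ".")
}

func main() {
	for _, h := range []string{"www.google.co.in", "mail.google.com", "t.co", "localhost"} {
		fmt.Printf("%-18s -> %q\n", h, registrableDomain(h))
	}
}
```

If the standard-library restriction is ever lifted, the golang.org/x/net/publicsuffix package provides EffectiveTLDPlusOne, which is backed by the full list.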

huangapple
  • Posted on May 22, 2021, 22:35:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/67650694.html