2021-05-22 22:35:45 go

How do I retrieve the domain from a URL?

Question

In Go, you can use a regular expression to extract the domain name from a URL string. Here is an example:

package main

import (
	"fmt"
	"regexp"
)

func extractDomain(url string) (string, error) {
	// Optional scheme, optional userinfo, optional "www.", then capture the host up to ':' or '/'.
	regex := regexp.MustCompile(`^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)`)

	// Extract the host from the capture group.
	matches := regex.FindStringSubmatch(url)
	if len(matches) < 2 {
		return "", fmt.Errorf("failed to extract domain from URL %q", url)
	}

	return matches[1], nil
}

func main() {
	urls := []string{
		"https://www.example.com/some-random-url",
		"www.example.com/some-random-url",
		"example.com/some-random-url",
		"www.example.com",
		"subdomain.example.com",
	}

	for _, url := range urls {
		domain, err := extractDomain(url)
		if err != nil {
			fmt.Printf("Error extracting domain from URL: %v\n", err)
			continue
		}
		fmt.Println(domain)
	}
}

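This code matches the domain part of a URL string with a regular expression. It defines the pattern, uses FindStringSubmatch to pull out the captured host, then loops over the list of URLs, calling extractDomain on each one and printing the result. Only a leading www. is stripped, so subdomain.example.com is printed unchanged.

Note that the code relies only on the regexp package from the Go standard library.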

Original question:

In Go, how can I extract only the domain name from a URL string?

Before:

https://www.example.com/some-random-url
www.example.com/some-random-url
example.com/some-random-url
www.example.com
subdomain.example.com

After:

example.com

Also, I'm limited to using the Golang standard library.

Answer 1

Score: 1

I've finally figured it out.

package main

import (
	"fmt"
	"log"
	"net/url"
	"strings"
)

func main() {
	// Name the result u so it does not shadow the net/url package.
	u, err := url.Parse("https://www.example.com")
	if err != nil {
		log.Fatal(err)
	}
	// Keep only the last two dot-separated labels of the host name.
	parts := strings.Split(u.Hostname(), ".")
	domain := parts[len(parts)-2] + "." + parts[len(parts)-1]
	fmt.Println(domain)
}

example.com

If the input has no scheme, for example subdomain.example.com, url.Parse leaves the host empty, and the parts[len(parts)-2] index then panics.

https://play.golang.org/p/Li0PviAr2jU
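
One way to avoid the panic described above is to check the number of labels before indexing. The following is only a sketch: lastTwoLabels is a hypothetical helper name, not something from the original answer, and it assumes that returning an error is acceptable for inputs without a usable host.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// lastTwoLabels is a hypothetical helper. It returns the last two
// dot-separated labels of the URL's host, or an error when the host has
// fewer than two labels (for example when the input has no scheme).
func lastTwoLabels(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	parts := strings.Split(u.Hostname(), ".")
	if len(parts) < 2 {
		return "", errors.New("host has fewer than two labels")
	}
	return strings.Join(parts[len(parts)-2:], "."), nil
}

func main() {
	for _, raw := range []string{"https://www.example.com", "subdomain.example.com"} {
		domain, err := lastTwoLabels(raw)
		fmt.Println(raw, "->", domain, err)
	}
}
```

Keeping the last two labels is still a heuristic: a host such as www.google.co.in would come back as co.in, so this only suits inputs whose registrable domain has exactly two labels.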

Answer 2

Score: 1

I think that since your examples include malformed URLs as well, you need a regular expression to extract the domain from the URL. The sample code below gets the domain for the examples you shared:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Capture the label immediately before ".com" together with the suffix.
	// FindStringSubmatch returns the whole match at index 0 and the capture group at index 1.
	m := regexp.MustCompile(`\.?([^.]*\.com)`)

	fmt.Println(m.FindStringSubmatch("https://www.example.com/some-random-url")[1])
	fmt.Println(m.FindStringSubmatch("www.example.com/some-random-url")[1])
	fmt.Println(m.FindStringSubmatch("example.com/some-random-url")[1])
	fmt.Println(m.FindStringSubmatch("www.example.com")[1])
	fmt.Println(m.FindStringSubmatch("subdomain.example.com")[1])
}

This covers all the cases in the question (including the malformed URLs), but only for .com domains, and indexing [1] panics when nothing matches. You can update the regular expression if there is a URL that does not get parsed correctly.

Go Playground link for the above: here.
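
Because FindStringSubmatch returns nil when the pattern does not match, it may be safer to wrap the lookup. This is a minimal sketch, assuming the dot before com is escaped as \.com; comDomain and comDomainRe are hypothetical names introduced here for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// comDomainRe is the answer's pattern with the dot before "com" escaped,
// compiled once at package level.
var comDomainRe = regexp.MustCompile(`\.?([^.]*\.com)`)

// comDomain is a hypothetical helper. It returns the captured ".com" domain,
// or false when the pattern does not match, instead of indexing a nil slice.
func comDomain(s string) (string, bool) {
	m := comDomainRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, s := range []string{
		"https://www.example.com/some-random-url",
		"subdomain.example.com",
		"www.telecom.org", // contains "com" but no ".com" suffix
	} {
		d, ok := comDomain(s)
		fmt.Printf("%q -> %q (matched: %v)\n", s, d, ok)
	}
}
```

With an unescaped dot, www.telecom.org would match on the "ecom" inside telecom, which is why the escape matters even though the question's five samples do not expose the difference.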

Answer 3

Score: 0

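This solution

import (
	"net/url"
	"regexp"
	"strings"
)

// Compile the patterns once instead of on every call.
var (
	schemeRe = regexp.MustCompile(`^https?`)
	wwwRe    = regexp.MustCompile(`^www\.`)
	domainRe = regexp.MustCompile(`([a-z0-9\-]+\.)+[a-z0-9\-]+`)
)

func extractDomain(urlLikeString string) string {
	urlLikeString = strings.TrimSpace(urlLikeString)

	// Only strings that start with http or https are handed to url.Parse.
	if schemeRe.MatchString(urlLikeString) {
		read, err := url.Parse(urlLikeString)
		if err != nil {
			return ""
		}
		urlLikeString = read.Host
	}

	// Drop a leading "www." and return the first domain-like run of labels.
	urlLikeString = wwwRe.ReplaceAllString(urlLikeString, "")

	return domainRe.FindString(urlLikeString)
}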

will turn this

"   ",
"aaa",
"not domain",
"ca.mail.google.com",
"google.com",
" google.com ",
" www.google.com/a/example.com",
"www.google.com/f/example.com",
"google.com/f/example.com",
"http://google.com/f/abc.com",
"http://google.com/f/?wow=xyz.com",
"http://google.com/f/?wow=www.xyz.com",
"http://www.google.com/f/abc.com",
"https://www.google.com/f/abc.com",
"https://mail.google.com/f/abc.com",
"https://123.google.com/f/abc.com",
"https://xn-ddf3.google.com/f/abc.com",

into this

[empty string]
[empty string]
[empty string]
ca.mail.google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
mail.google.com
123.google.com
xn-ddf3.google.com

The net/url function url.Parse on its own won't handle domain-like strings that are not full URLs, such as: bla bla google.com.

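To see why url.Parse alone is not enough for scheme-less input, it can help to print how it splits a string. The sketch below is only an illustration; hostAndPath is a hypothetical helper, and the two inputs are taken from the list above.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostAndPath is a hypothetical helper that reports how url.Parse splits raw
// into its Host and Path fields.
func hostAndPath(raw string) (host, path string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Host, u.Path, nil
}

func main() {
	for _, raw := range []string{
		"http://www.google.com/f/abc.com",
		"www.google.com/f/example.com",
	} {
		host, path, err := hostAndPath(raw)
		fmt.Printf("%q: host=%q path=%q err=%v\n", raw, host, path, err)
	}
}
```

For the second input there is no scheme, so the whole string ends up in the path and the host is empty. That is the reason extractDomain falls back to a regular expression for such strings.
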
Answer 4

Score: -1

I think this can be helpful.

package main

import (
	"fmt"
	"log"
	"net/url"
	"strings"
)

func main() {
	strArray := []string{
		"www.google.co.in",
		"https://google.in",
		"instagram.com",
		"nymag.com",
		"http://www.example.com/?airport=approval&box=brother",
		"https://www.example.com/babies.php#birds",
		"http://example.org/bear",
		"www.google.co.in",
		"google.com",
		"https://www.bbb.org/search/business-review-form/",
		"https://www.localvisibilitysystem.com/2015/08/19/how-to-use-meetup-sponsorships-for-local-marketing-and-seo-dave-oremlands-tips/",
		"http://www.example.com/boat/advertisement?actor=bat#boundary",
		"https://www.example.com/",
		"https://www.google.com",
		"https://www.example.com/army/approval.htm?basket=bottle",
		"http://example.com/board.aspx?afternoon=appliance&angle=ball",
		"http://www.example.com/",
		"http://example.com/",
		"http://www.example.com/",
		"livejournal.com",
		"delicious.com",
		"illinois.edu",
		"instagram.com",
		"nymag.com",
		"altervista.org",
		"t.co",
		"reddit.com",
		"tinyurl.com",
	}
	var hostname string
	var temp []string
	for i := 0; i < len(strArray); i++ {
		// Name the result u so it does not shadow the net/url package.
		u, err := url.Parse(strArray[i])
		if err != nil {
			log.Fatal(err)
		}
		urlstr := u.String()

		// Here the scheme prefix and a leading "www." are filtered out.
		if strings.HasPrefix(urlstr, "https") {
			hostname = strings.TrimPrefix(urlstr, "https://")
		} else if strings.HasPrefix(urlstr, "http") {
			hostname = strings.TrimPrefix(urlstr, "http://")
		} else {
			hostname = urlstr
		}
		if strings.HasPrefix(hostname, "www") {
			hostname = strings.TrimPrefix(hostname, "www.")
		}

		// Everything before the first "/" is the host name.
		if strings.Contains(hostname, "/") {
			temp = strings.Split(hostname, "/")
			fmt.Println(temp[0])
		} else {
			fmt.Println(hostname)
		}
	}
}

Output:

google.co.in
google.in
instagram.com
nymag.com
example.com
example.com
example.org
google.co.in
google.com
bbb.org
localvisibilitysystem.com
example.com
example.com
google.com
example.com
example.com
example.com
example.com
example.com
livejournal.com
delicious.com
illinois.edu
instagram.com
nymag.com
altervista.org
t.co
reddit.com
tinyurl.com

For each input, this prints the host with the scheme and a leading www. removed. Other subdomains are left in place.
Go Playground link for the above:
https://go.dev/play/p/vfCOAnTNqh8
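
The same result can probably be reached with less string trimming by letting url.Parse find the host. This is a sketch under one assumption: any input without "://" is a bare host or host plus path, so prefixing it with "//" makes url.Parse treat the first segment as the host. hostOf is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostOf is a hypothetical helper. It returns the host name of raw with a
// leading "www." removed. Inputs without "://" are assumed to start with the
// host, so "//" is prepended to make url.Parse read them as an authority.
func hostOf(raw string) (string, error) {
	raw = strings.TrimSpace(raw)
	if !strings.Contains(raw, "://") {
		raw = "//" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return strings.TrimPrefix(u.Hostname(), "www."), nil
}

func main() {
	for _, raw := range []string{
		"www.google.co.in",
		"https://www.example.com/babies.php#birds",
		"http://example.com/board.aspx?afternoon=appliance&angle=ball",
		"t.co",
	} {
		host, err := hostOf(raw)
		fmt.Println(host, err)
	}
}
```

Hostname() also drops any port, which the prefix-trimming version would leave in the output.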
