How do I retrieve the domain from a URL?
Question
In Go, you can use a regular expression to extract the domain from a URL string. Here is an example:
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// domainPattern skips an optional scheme, userinfo, and "www." prefix, then
// captures everything up to the first ':' or '/'.
var domainPattern = regexp.MustCompile(`^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?([^:/\n]+)`)

func extractDomain(rawURL string) (string, error) {
	// Index 0 is the whole match, index 1 is the captured domain.
	matches := domainPattern.FindStringSubmatch(rawURL)
	if len(matches) < 2 {
		return "", errors.New("failed to extract domain from URL")
	}
	return matches[1], nil
}

func main() {
	urls := []string{
		"https://www.example.com/some-random-url",
		"www.example.com/some-random-url",
		"example.com/some-random-url",
		"www.example.com",
		"subdomain.example.com",
	}
	for _, u := range urls {
		domain, err := extractDomain(u)
		if err != nil {
			fmt.Printf("Error extracting domain from URL: %v\n", err)
			continue
		}
		fmt.Println(domain)
	}
}
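This code uses a regular expression to match the domain part of the URL string. It first defines a pattern, then calls FindStringSubmatch to extract the captured domain. Finally, it loops over the list of URLs, calls extractDomain on each one, and prints the result. It strips only a leading www., so subdomain.example.com is returned unchanged.
Note that this code relies on the regexp package from the Go standard library to handle regular expressions.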
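To see what the capture group returns for inputs beyond the plain cases, a small table-driven check can help. The sketch below repeats the pattern from the example; the helper name capturedHost and the extra input with userinfo and a port are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the example; '/' is written unescaped here, which is
// equivalent in Go's regexp syntax.
var domainRe = regexp.MustCompile(`^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?([^:/\n]+)`)

// capturedHost is a hypothetical helper that returns capture group 1,
// or "" when the pattern does not match (for example on empty input).
func capturedHost(s string) string {
	m := domainRe.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	inputs := []string{
		"https://www.example.com/some-random-url",
		"subdomain.example.com",
		"https://user@www.example.com:8080/path", // illustrative input
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, capturedHost(in))
	}
}
```

The port and userinfo are dropped because the capture stops at the first ':' or '/', while any subdomain other than www stays in the result.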
Original question:
In Go, how can I extract only the domain name from a URL string?
Before:
https://www.example.com/some-random-url
www.example.com/some-random-url
example.com/some-random-url
www.example.com
subdomain.example.com
After:
example.com
Also, I'm limited to using the Golang standard library.
Answer 1
Score: 1
I've finally figured it out.
package main

import (
	"fmt"
	"log"
	"net/url"
	"strings"
)

func main() {
	u, err := url.Parse("https://www.example.com")
	if err != nil {
		log.Fatal(err)
	}
	// Keep only the last two labels of the host name.
	parts := strings.Split(u.Hostname(), ".")
	if len(parts) < 2 {
		log.Fatalf("cannot extract a domain from host %q", u.Hostname())
	}
	domain := parts[len(parts)-2] + "." + parts[len(parts)-1]
	fmt.Println(domain)
}
Output:
example.com
If the input has no scheme, for example subdomain.example.com, url.Parse treats the whole string as a path and Hostname() returns an empty string. Without the length check, indexing parts would then panic.
https://play.golang.org/p/Li0PviAr2jU
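The last-two-labels idea can also be wrapped in a function that checks the label count before indexing. This is only a sketch: the name lastTwoLabels is hypothetical, and treating the final two labels as the domain is an assumption that does not hold for suffixes such as co.in.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// lastTwoLabels is a hypothetical helper. It parses rawURL, takes the host
// name, and returns its final two dot-separated labels. The boolean is false
// when the URL cannot be parsed or the host has fewer than two labels.
func lastTwoLabels(rawURL string) (string, bool) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", false
	}
	parts := strings.Split(u.Hostname(), ".")
	if len(parts) < 2 {
		// Covers an empty host (no scheme given) and single-label hosts.
		return "", false
	}
	return strings.Join(parts[len(parts)-2:], "."), true
}

func main() {
	for _, in := range []string{
		"https://www.example.com",
		"https://subdomain.example.com/some-random-url",
		"subdomain.example.com", // no scheme: the host is empty
	} {
		domain, ok := lastTwoLabels(in)
		fmt.Printf("%q -> %q (ok=%v)\n", in, domain, ok)
	}
}
```

Returning a boolean instead of panicking lets the caller decide how to treat scheme-less input, for instance by falling back to a regular expression.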
Answer 2
Score: 1
I think that since your examples include malformed URLs as well, you need a regular expression to extract the domain from the URL. The sample code below returns the domain for the examples you shared:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Capture the label immediately before ".com" together with ".com".
	// The dot is escaped so that it matches only a literal '.'.
	m := regexp.MustCompile(`\.?([^.]*\.com)`)

	// FindStringSubmatch returns the whole match at index 0 and the first
	// capture group at index 1. Every input below matches; indexing the
	// result for a non-matching input would panic.
	fmt.Println(m.FindStringSubmatch("https://www.example.com/some-random-url")[1])
	fmt.Println(m.FindStringSubmatch("www.example.com/some-random-url")[1])
	fmt.Println(m.FindStringSubmatch("example.com/some-random-url")[1])
	fmt.Println(m.FindStringSubmatch("www.example.com")[1])
	fmt.Println(m.FindStringSubmatch("subdomain.example.com")[1])
}
This covers all the cases in the question, including the malformed URLs, but only for hosts ending in .com. You can update the regular expression if there is any URL that doesn't get parsed correctly.
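Because FindStringSubmatch returns nil when nothing matches, indexing its result directly panics on such input. A minimal sketch of a guarded lookup follows; firstGroup is a hypothetical helper name, and the pattern is the one from the answer with the dot before com escaped.

```go
package main

import (
	"fmt"
	"regexp"
)

// comRe captures the label before ".com" plus ".com" itself.
var comRe = regexp.MustCompile(`\.?([^.]*\.com)`)

// firstGroup returns capture group 1 of re in s. ok is false when re does
// not match s or has no capture group.
func firstGroup(re *regexp.Regexp, s string) (group string, ok bool) {
	m := re.FindStringSubmatch(s)
	if len(m) < 2 {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, in := range []string{"www.example.com/some-random-url", "example.org"} {
		if domain, ok := firstGroup(comRe, in); ok {
			fmt.Println(domain)
		} else {
			fmt.Printf("no .com domain found in %q\n", in)
		}
	}
}
```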
Answer 3
Score: 0
This solution
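import (
	"net/url"
	"regexp"
	"strings"
)

// Compile the patterns once instead of on every call.
var (
	schemeRe = regexp.MustCompile(`^https?://`)
	domainRe = regexp.MustCompile(`([a-z0-9\-]+\.)+[a-z0-9\-]+`)
)

func extractDomain(urlLikeString string) string {
	urlLikeString = strings.TrimSpace(urlLikeString)
	// Only inputs with an http(s) scheme go through url.Parse.
	if schemeRe.MatchString(urlLikeString) {
		parsed, err := url.Parse(urlLikeString)
		if err != nil {
			return "" // unparsable input yields no domain
		}
		urlLikeString = parsed.Host
	}
	urlLikeString = strings.TrimPrefix(urlLikeString, "www.")
	// The first dotted run of labels is taken as the domain.
	return domainRe.FindString(urlLikeString)
}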
will turn this
" ",
"aaa",
"not domain",
"ca.mail.google.com",
"google.com",
" google.com ",
" www.google.com/a/example.com",
"www.google.com/f/example.com",
"google.com/f/example.com",
"http://google.com/f/abc.com",
"http://google.com/f/?wow=xyz.com",
"http://google.com/f/?wow=www.xyz.com",
"http://www.google.com/f/abc.com",
"https://www.google.com/f/abc.com",
"https://mail.google.com/f/abc.com",
"https://123.google.com/f/abc.com",
"https://xn-ddf3.google.com/f/abc.com",
into this
[empty string]
[empty string]
[empty string]
ca.mail.google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
google.com
mail.google.com
123.google.com
xn-ddf3.google.com
The net/url function url.Parse won't handle a domain-like string such as: bla bla google.com
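That limitation is easy to check: url.Parse only fills in the host when the input has a scheme (or starts with //), so scheme-less or free-text input yields no host. A small sketch, in which parsedHost is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"net/url"
)

// parsedHost returns the host name that url.Parse finds in raw, or ""
// when parsing fails or no host is present.
func parsedHost(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

func main() {
	for _, in := range []string{
		"http://google.com/f/abc.com",
		"google.com/f/example.com",
		"bla bla google.com",
	} {
		fmt.Printf("%q -> host %q\n", in, parsedHost(in))
	}
}
```

Only the first input produces a host, which is why the solution falls back to a regular expression for everything that lacks an http or https scheme.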
Answer 4
Score: -1
I think this can be helpful.
package main

import (
	"fmt"
	"log"
	"net/url"
	"strings"
)

func main() {
	strArray := []string{
		"www.google.co.in",
		"https://google.in",
		"instagram.com",
		"nymag.com",
		"http://www.example.com/?airport=approval&box=brother",
		"https://www.example.com/babies.php#birds",
		"http://example.org/bear",
		"www.google.co.in",
		"google.com",
		"https://www.bbb.org/search/business-review-form/",
		"https://www.localvisibilitysystem.com/2015/08/19/how-to-use-meetup-sponsorships-for-local-marketing-and-seo-dave-oremlands-tips/",
		"http://www.example.com/boat/advertisement?actor=bat#boundary",
		"https://www.example.com/",
		"https://www.google.com",
		"https://www.example.com/army/approval.htm?basket=bottle",
		"http://example.com/board.aspx?afternoon=appliance&angle=ball",
		"http://www.example.com/",
		"http://example.com/",
		"http://www.example.com/",
		"livejournal.com",
		"delicious.com",
		"illinois.edu",
		"instagram.com",
		"nymag.com",
		"altervista.org",
		"t.co",
		"reddit.com",
		"tinyurl.com",
	}
	for _, raw := range strArray {
		u, err := url.Parse(raw)
		if err != nil {
			log.Fatal(err)
		}
		// Strip the scheme and a leading "www." from the URL string.
		hostname := u.String()
		hostname = strings.TrimPrefix(hostname, "https://")
		hostname = strings.TrimPrefix(hostname, "http://")
		hostname = strings.TrimPrefix(hostname, "www.")
		// Everything before the first "/" is the host.
		hostname, _, _ = strings.Cut(hostname, "/")
		fmt.Println(hostname)
	}
}
Output:
google.co.in
google.in
instagram.com
nymag.com
example.com
example.com
example.org
google.co.in
google.com
bbb.org
localvisibilitysystem.com
example.com
example.com
google.com
example.com
example.com
example.com
example.com
example.com
livejournal.com
delicious.com
illinois.edu
instagram.com
nymag.com
altervista.org
t.co
reddit.com
tinyurl.com
This gives you the required domain for URLs like the ones above.
Go Playground link for the above:
https://go.dev/play/p/vfCOAnTNqh8
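For comparison, here is a sketch that lets net/url do the splitting instead of trimming prefixes by hand. It assumes only http and https inputs, and it prepends // to scheme-less strings so that url.Parse reads the first segment as the host; hostFromURL is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostFromURL is a hypothetical helper that returns the host of raw without
// a port and without a leading "www.". Inputs are assumed to be http(s) URLs
// or scheme-less strings such as "example.com/path".
func hostFromURL(raw string) (string, error) {
	raw = strings.TrimSpace(raw)
	// Without a scheme, url.Parse would treat "example.com/path" as a path.
	// A leading "//" makes it parse the first segment as the authority.
	if !strings.HasPrefix(raw, "http://") && !strings.HasPrefix(raw, "https://") {
		raw = "//" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	host := u.Hostname() // drops any ":port" suffix
	if host == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	return strings.TrimPrefix(host, "www."), nil
}

func main() {
	for _, in := range []string{
		"www.google.co.in",
		"https://www.example.com/babies.php#birds",
		"example.com:8080/x", // illustrative input with a port
	} {
		host, err := hostFromURL(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(host)
	}
}
```

Using Hostname() means ports, queries, and fragments never reach the result, which the prefix-trimming loop handles only for the path part.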