How can I translate this IDNA URL to Unicode?

Question

I want to translate an IDNA ASCII URL to Unicode.

package main

import (
	"golang.org/x/net/idna"
	"log"
)

func main() {
	input := "https://xn---36-mddtcafmzdgfgpbxs0h7c.xn--p1ai"
	idnaProfile := idna.New()
	output, err := idnaProfile.ToUnicode(input)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("%s", output)
}

The output is: https://xn---36-mddtcafmzdgfgpbxs0h7c.рф

It seems the IDNA package only converts the TLD. Is there some option that can convert the full URL?

I need to get the same result as when I paste the ASCII URL into Chrome:
https://природный-источник36.рф

Answer 1

Score: 1

You simply need to parse the URL first:

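package main

import (
	"net/url"

	"golang.org/x/net/idna"
)

func main() {
	// Parse first so that only the host, not the "https://" scheme,
	// is handed to the IDNA decoder.
	p, e := url.Parse("https://xn---36-mddtcafmzdgfgpbxs0h7c.xn--p1ai")
	if e != nil {
		panic(e)
	}
	// Hostname() is Host without any ":port" suffix.
	s, e := idna.ToUnicode(p.Hostname())
	if e != nil {
		panic(e)
	}
	println(s == "природный-источник36.рф")
}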

https://golang.org/pkg/net/url#Parse
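
The snippet in this answer gives you the decoded host only. If you want the whole URL back, as Chrome shows it, the parts have to be reassembled by hand: as far as I can tell, `url.URL.String()` percent-encodes non-ASCII bytes in the host, so it will not print the Cyrillic form. Below is a minimal sketch of that reassembly. `rebuildWithHost` and `stubDecode` are hypothetical names, and `example.xn--p1ai` is a made-up host. The decoder is passed in as a parameter so the sketch runs with the standard library alone; in real code you would pass `idna.ToUnicode` from `golang.org/x/net/idna` instead of the stub.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rebuildWithHost parses raw, passes only the hostname through decode, and
// reassembles the URL by hand. decode is a placeholder parameter: in real
// use it would be idna.ToUnicode from golang.org/x/net/idna.
func rebuildWithHost(raw string, decode func(string) (string, error)) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	host, err := decode(u.Hostname())
	if err != nil {
		return "", err
	}

	var b strings.Builder
	b.WriteString(u.Scheme + "://")
	if u.User != nil {
		b.WriteString(u.User.String() + "@")
	}
	b.WriteString(host)
	if port := u.Port(); port != "" {
		b.WriteString(":" + port)
	}
	b.WriteString(u.EscapedPath())
	if u.RawQuery != "" {
		b.WriteString("?" + u.RawQuery)
	}
	if u.Fragment != "" {
		b.WriteString("#" + u.EscapedFragment())
	}
	return b.String(), nil
}

// stubDecode stands in for idna.ToUnicode so that this sketch has no
// external dependency. It only knows the one mapping that the question's
// own output confirms: xn--p1ai is рф.
func stubDecode(host string) (string, error) {
	return strings.ReplaceAll(host, "xn--p1ai", "рф"), nil
}

func main() {
	out, err := rebuildWithHost("https://example.xn--p1ai:8443/a/b?x=1#top", stubDecode)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // https://example.рф:8443/a/b?x=1#top
}
```

The sketch assumes an ordinary `scheme://host/path` URL. It does not handle opaque URLs or bracketed IPv6 hosts, which should not come up for IDNA names anyway.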

Answer 2

Score: -1

An IDNA string consists of "labels" separated by dots ("."). Each label is either encoded (it starts with "xn--") or not. Split on dots, your string has two labels: https://xn---36-mddtcafmzdgfgpbxs0h7c and xn--p1ai. Because the first one starts with "https://" rather than "xn--", only the second is treated as IDNA-encoded.

Just process those parts of the URL which are IDNA encoded (i.e. the hostname). Anything else is just nonsensical and cannot work.

Posted by huangapple on May 26, 2021 at 17:11:38
When reposting, please keep the link to this article: https://go.coder-hub.com/67701899.html