Golang Regex to replace domain to proxy URL
Question
I want to rewrite every link in a web page so that it points to a reverse proxy domain.
The rules are:
https://test.com/xxx --> https_test_com.proxy.com/xxx
http://sub.test.com/xxx --> http_sub_test_com.proxy.com/xxx
How can I achieve this with a regex in Go?
The response body is a []byte, and its character encoding is UTF-8.
I tried the following, but it does not replace all the dots in the original domain with underscores. The number of subdomain levels is variable, so the number of dots can vary.
respBytes := []byte(`_.Xc=function(a){var b=window.google&&window.google.logUrl?"":"https://www.google.com";b+="/gen_204?";b+=a.j(2040-b.length);
<cite class="iUh30 Zu0yb tjvcx">https://cloud.google.com</cite></div><div class="eFM0qc"><a class="fl" href="https://webcache.googleusercontent.com/search?q=cache:80SWJ_cSDhwJ:https://cloud.google.com/+&amp;cd=1&amp;hl=en&amp;ct=clnk&amp;gl=au" ping="/url?sa=t&amp;source=web&amp;rct=j&amp;url=https://webcache.googleusercontent.com/search%3Fq%3Dcache:80SWJ_cSDhwJ:https://cloud.google.com/%2B%26cd%3D1%26hl%3Den%26ct%3Dclnk%26gl%3Dau&amp;ved=2ahUKEwia5ovYsv3xAhXS4jgGHad0BJYQIDAAegQIBRAG"><span>Cached</span></a></li><li class="action-menu-item OhScic zsYMMe" role="menuitem"><a class="fl" href="/search?q=related:https://cloud.google.com/+google+cloud&amp;sa=X&amp;ved=2ahUKEwia5ovYsv3xAhXS4jgGHad0BJYQHzAAegQIBRAH">
`)
proxyURI := "proxy.com"
var re = regexp.MustCompile(`(http[s]*):\/\/([a-zA-Z0-9_\-.:]*)`)
content := re.ReplaceAll(respBytes, []byte("${1}_${2}."+proxyURI))
Origin | Actual result | Expected result |
---|---|---|
https://www.google.com | https_www.google.com.proxy.com | https_www_google_com.proxy.com |
https://cloud.google.com | https_cloud.google.com.proxy.com | https_cloud_google_com.proxy.com |
https://webcache.googleusercontent.com | https_webcache.googleusercontent.com.proxy.com | https_webcache_googleusercontent_com.proxy.com |
Answer 1
Score: 0
Here's how you can do this:
func replaceAndPrint() {
src := `
<a href="https://test.com/xxx">link 1</a>
<a href="https://test.com/yyy">link 2</a>
`
r := regexp.MustCompile("\"https://(test\\.com.*)\"")
result := r.ReplaceAllString(src, "http://sub.$1")
fmt.Println(result)
}
Output:
<a href=http://sub.test.com/xxx>link 1</a>
<a href=http://sub.test.com/yyy>link 2</a>
Explanation:
The argument to regexp.MustCompile defines a capturing group (inside a pair of parentheses). The value of that capturing group is referenced by $1 in the call to r.ReplaceAllString.
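One detail matters when a group reference is followed directly by an underscore, as the scheme_host rule in the question requires. In Go replacement templates, a name after $ is taken to be as long as possible, so $1_ is read as a group named "1_" rather than group 1 plus an underscore. The sketch below illustrates the difference; the helper name expand and the simplified host pattern are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"regexp"
)

// expand is a hypothetical helper: it rewrites the scheme://host part of a
// fixed sample URL using the given replacement template.
func expand(template string) string {
	re := regexp.MustCompile(`(https?)://([a-zA-Z0-9.-]+)`)
	return re.ReplaceAllString("https://test.com/xxx", template)
}

func main() {
	// ${1}_ is group 1 followed by a literal underscore.
	fmt.Println(expand("${1}_${2}")) // https_test.com/xxx

	// $1_ is parsed as a group named "1_", which does not exist,
	// so it expands to the empty string.
	fmt.Println(expand("$1_$2")) // test.com/xxx
}
```

With braces the scheme survives; without them it silently disappears, which is easy to mistake for a regex problem.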
UPDATE:
Sorry, I misread the example.
Here's an updated version:
func replaceAndPrint2() {
src := `
<a href="http://test.com/xxx">link 1</a>
<a href="https://sub1.sub2.test.com/yyy">link 2</a>
`
r := regexp.MustCompile("(\\.|://)([^./]*)")
replacer := strings.NewReplacer("://", "_", ".", "_")
res := r.ReplaceAllStringFunc(src, func(g string) string {
if g == ".com" {
return replacer.Replace(g) + ".proxy.com"
}
return replacer.Replace(g)
})
fmt.Println(res)
}
Output:
<a href="http_test_com.proxy.com/xxx">link 1</a>
<a href="https_sub1_sub2_test_com.proxy.com/yyy">link 2</a>
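The updated version above appends the proxy domain only after a literal .com, and it rewrites every dot it finds, including dots outside URLs. A possible alternative, sketched here under stated assumptions, works directly on the []byte body from the question: match the scheme and host once, then convert the dots inside that match only. The names urlRe and rewriteToProxy are hypothetical, and the host character class is an assumption that does not handle ports.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// urlRe matches a scheme and a host. The character class is an assumption
// about which characters may appear in a host name; ports are not handled.
var urlRe = regexp.MustCompile(`(https?)://([a-zA-Z0-9.-]+)`)

// rewriteToProxy replaces every scheme://host with scheme_host.proxyDomain,
// turning the dots of the original host into underscores.
func rewriteToProxy(body []byte, proxyDomain string) []byte {
	return urlRe.ReplaceAllFunc(body, func(m []byte) []byte {
		sub := urlRe.FindSubmatch(m) // sub[1] = scheme, sub[2] = host
		host := bytes.ReplaceAll(sub[2], []byte("."), []byte("_"))

		out := make([]byte, 0, len(m)+len(proxyDomain)+1)
		out = append(out, sub[1]...)
		out = append(out, '_')
		out = append(out, host...)
		out = append(out, '.')
		out = append(out, proxyDomain...)
		return out
	})
}

func main() {
	body := []byte(`<a href="https://sub1.sub2.test.com/yyy">link</a> b="https://www.google.com";`)
	fmt.Printf("%s\n", rewriteToProxy(body, "proxy.com"))
	// <a href="https_sub1_sub2_test_com.proxy.com/yyy">link</a> b="https_www_google_com.proxy.com";
}
```

Because the callback only sees the matched scheme and host, dots in paths, query strings, and surrounding text are left alone, and any top-level domain is handled the same way.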