Weird Go Regular Expression mismatch when routing http requests
Question
I'm serving web content via a Go web server and using regular expressions to match handlers to request paths. I've noticed some really strange behaviour, which I've distilled into the test code below.
Basically, any 8 letter/number combination is meant to be caught by one handler, while other specific request paths are meant to be caught by other handlers. This works great, except that when the 8 letter/number path ends in a lowercase 'c', the match gets picked up by the first handler instead. Any other letter at the end seems to work fine.
The code below can be pasted into a file and run. It will serve on localhost:8080. I've provided a few request links to demonstrate the problem.
package main

import (
	"fmt"
	"net/http"
	"regexp"
)

// This is the handler when passing a string of 8 characters ([])
func runTest(w http.ResponseWriter, r *http.Request) {
	path := r.URL.Path[1:]
	fmt.Fprintf(w, path)
}

func runTest2(w http.ResponseWriter, r *http.Request) {
	path := "Reg ex for: .[(css|jpg|png|js|ttf|ico)]$"
	fmt.Fprintf(w, path)
}

func runTest3(w http.ResponseWriter, r *http.Request) {
	path := "Reg ex for: /all$"
	fmt.Fprintf(w, path)
}

// Regular expression handler
type route struct {
	pattern *regexp.Regexp
	handler http.Handler
}

type RegexpHandler struct {
	routes []*route
}

func (h *RegexpHandler) Handler(pattern *regexp.Regexp, handler http.Handler) {
	h.routes = append(h.routes, &route{pattern, handler})
}

func (h *RegexpHandler) HandleFunc(pattern *regexp.Regexp, handler func(http.ResponseWriter, *http.Request)) {
	h.routes = append(h.routes, &route{pattern, http.HandlerFunc(handler)})
}

func (h *RegexpHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	for _, route := range h.routes {
		if route.pattern.MatchString(r.URL.Path) {
			route.handler.ServeHTTP(w, r)
			return
		}
	}
	http.NotFound(w, r)
}

func main() {
	handler := &RegexpHandler{}
	handler.HandleFunc(regexp.MustCompile(`.[(css|jpg|png|js|ttf|ico)]$`), runTest2)
	handler.HandleFunc(regexp.MustCompile("^/all$"), runTest3)
	handler.HandleFunc(regexp.MustCompile("^/[A-Z0-9a-z]{8}$"), runTest)
	http.ListenAndServe(":8080", handler)
}
This request gets picked up by the second handler (runTest3):
http://localhost:8080/all
This request gets picked up by the third handler (runTest), which prints out the path portion of the URL:
http://localhost:8080/yr22FBMD
This request, however, gets picked up by the first handler (note that it ends with a lowercase c):
http://localhost:8080/yr22FBMc
Any ideas? This is extremely weird!
Answer 1
Score: 10
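You have the extensions inside square brackets in the pattern registered for runTest2. That makes it a character class, so your regex is saying, "match any line with '(', 'c', 's', '|', 'j', 'p', 'g', 'n', 't', 'f', 'i', 'o', or ')' as the last character." That is why a path ending in a lowercase 'c' is caught by that handler.
You just need to remove the square brackets, and I think you meant to escape the period at the beginning. Write the pattern as a raw string (backticks), since "\." is not a valid escape sequence in a double-quoted Go string literal:
`\.(css|jpg|png|js|ttf|ico)$`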
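To see the difference between the two patterns without starting a server, here is a minimal sketch that runs both against a few sample paths. The variable names brokenPattern and fixedPattern are made up for this illustration, and /style.css is a hypothetical static-file request that does not appear in the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern from the question: the square brackets turn the alternation
// into a character class, so only the final character is checked.
var brokenPattern = regexp.MustCompile(`.[(css|jpg|png|js|ttf|ico)]$`)

// Pattern from the answer: a literal dot followed by one of the extensions.
var fixedPattern = regexp.MustCompile(`\.(css|jpg|png|js|ttf|ico)$`)

func main() {
	// The first three paths come from the question; /style.css is a
	// hypothetical static-file request added for comparison.
	paths := []string{"/yr22FBMc", "/yr22FBMD", "/all", "/style.css"}
	for _, p := range paths {
		fmt.Printf("%-12s broken=%-5t fixed=%t\n",
			p, brokenPattern.MatchString(p), fixedPattern.MatchString(p))
	}
}
```

With the bracketed pattern, /yr22FBMc matches because its last character is in the class; with the corrected pattern it no longer matches, so the request falls through to the 8-character route. /style.css matches both patterns.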