Weird Go Regular Expression mismatch when routing http requests

Question

I'm serving web content via a Go web server and using regular expressions to match handlers to request paths. I've noticed a really strange behaviour, which I've distilled into the test code below.
Basically, any 8 letter/number combination is meant to be caught by one handler, while other specific request paths are meant to be caught by other handlers. This works great, except that an 8 letter/number path gets picked up by the first handler if the sequence ends in a lower case 'c'. Any other letter at the end works fine.

The code below can be pasted into a file and run. It will serve on localhost:8080. I've provided a few request links to demonstrate the problem.

package main

import (
	"fmt"
	"net/http"
	"regexp"
)

// This is the handler when passing a string of 8 characters ([])
func runTest(w http.ResponseWriter, r *http.Request) {
	path := r.URL.Path[1:]
	fmt.Fprintf(w, path)
}

func runTest2(w http.ResponseWriter, r *http.Request) {
	path := "Reg ex for: .[(css|jpg|png|js|ttf|ico)]$"
	fmt.Fprintf(w, path)
}

func runTest3(w http.ResponseWriter, r *http.Request) {
	path := "Reg ex for: /all$"
	fmt.Fprintf(w, path)
}

// Regular expression handler
type route struct {
	pattern *regexp.Regexp
	handler http.Handler
}

type RegexpHandler struct {
	routes []*route
}

func (h *RegexpHandler) Handler(pattern *regexp.Regexp, handler http.Handler) {
	h.routes = append(h.routes, &route{pattern, handler})
}

func (h *RegexpHandler) HandleFunc(pattern *regexp.Regexp, handler func(http.ResponseWriter, *http.Request)) {
	h.routes = append(h.routes, &route{pattern, http.HandlerFunc(handler)})
}

func (h *RegexpHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	for _, route := range h.routes {
		if route.pattern.MatchString(r.URL.Path) {
			route.handler.ServeHTTP(w, r)
			return
		}
	}
	http.NotFound(w, r)
}

func main() {
	handler := &RegexpHandler{}
	handler.HandleFunc(regexp.MustCompile(`.[(css|jpg|png|js|ttf|ico)]$`), runTest2)
	handler.HandleFunc(regexp.MustCompile("^/all$"), runTest3)
	handler.HandleFunc(regexp.MustCompile("^/[A-Z0-9a-z]{8}$"), runTest)
	http.ListenAndServe(":8080", handler)
}

This request gets picked up by the second handler (runTest3):

http://localhost:8080/all

This request gets picked up by the third handler (runTest) which prints out the path portion of the url:

http://localhost:8080/yr22FBMD

This request, however, gets picked up by the first handler (note that it ends with a lower case c):

http://localhost:8080/yr22FBMc

Any ideas? This is extremely weird!

Answer 1

Score: 10

You have the extensions inside square brackets in the pattern for runTest2. This makes it a character class, so your regex is saying, "match any line with '(', 'c', 's', '|', 'j', 'p', 'g', 'n', 't', 'f', 'i', 'o', or ')' as the last character."
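To make the character-class behaviour concrete, here is a minimal sketch that runs the question's first pattern, unchanged, against a few paths. The helper name `matchesOriginal` and the `/style.css` path are made up for illustration; the other paths come from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// original is the first pattern from the question, copied verbatim.
// The square brackets turn everything between them into a set of single characters.
var original = regexp.MustCompile(`.[(css|jpg|png|js|ttf|ico)]$`)

// matchesOriginal (hypothetical helper) reports whether the question's
// first route would claim the given path.
func matchesOriginal(path string) bool {
	return original.MatchString(path)
}

func main() {
	for _, p := range []string{"/yr22FBMc", "/yr22FBMD", "/all", "/style.css"} {
		fmt.Printf("%-12s %v\n", p, matchesOriginal(p))
	}
}
```

Only the last character decides the outcome here: 'c' and 's' are in the class, so `/yr22FBMc` and `/style.css` match, while 'D' and 'l' are not, so `/yr22FBMD` and `/all` fall through to the later routes.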

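You just need to remove the square brackets, and I think you meant to escape the period at the beginning. In Go, write the pattern as a raw string literal (backticks): inside double quotes, `\.` is not a valid escape sequence and the program will not compile.

`\.(css|jpg|png|js|ttf|ico)$`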
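As a rough check of the fix, the sketch below replays the question's first-match-wins routing with the corrected pattern, without starting a server. The `namedRoute` type, the `resolve` function, and the `/style.css` path are assumptions made for this illustration; the handler names and the other two patterns are taken from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// namedRoute (hypothetical) pairs a pattern with the name of the handler
// it would dispatch to in the question's program.
type namedRoute struct {
	name    string
	pattern *regexp.Regexp
}

// Routes in the same order as in the question's main(), with the first
// pattern replaced by the corrected one.
var routes = []namedRoute{
	{"runTest2", regexp.MustCompile(`\.(css|jpg|png|js|ttf|ico)$`)},
	{"runTest3", regexp.MustCompile(`^/all$`)},
	{"runTest", regexp.MustCompile(`^/[A-Z0-9a-z]{8}$`)},
}

// resolve returns the name of the first route whose pattern matches path,
// mirroring the loop in RegexpHandler.ServeHTTP.
func resolve(path string) string {
	for _, r := range routes {
		if r.pattern.MatchString(path) {
			return r.name
		}
	}
	return "not found"
}

func main() {
	for _, p := range []string{"/yr22FBMc", "/yr22FBMD", "/all", "/style.css"} {
		fmt.Printf("%-12s -> %s\n", p, resolve(p))
	}
}
```

With the group in place of the character class, `/yr22FBMc` reaches runTest like any other 8-character path, and only paths that actually end in one of the listed extensions go to runTest2.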

huangapple
  • Published on May 18, 2013, 04:24:20