What is an efficient way of doing URL matching and tag extraction?

Question

Given the two strings a = "/some/{tag}/here" and b = "/some/text/here", I would like an efficient algorithm to verify whether b matches the pattern defined by a and, if it does, to extract the corresponding part of b into a variable (i.e. tag = "text").

Implementations in C or Go are welcome, but pseudocode will do just fine.

Answer 1

Score: 3

Read about the Knuth–Morris–Pratt string searching algorithm. It should give you all you need, including pseudocode.
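
For reference, here is a minimal Go sketch of a Knuth–Morris–Pratt search. Note that KMP locates a fixed substring, so on its own it only covers the literal parts of the pattern (for example, finding "/here" inside b). The helper name kmpIndex is made up for this illustration and is not from the answer.

```go
package main

import "fmt"

// kmpIndex returns the index of the first occurrence of pat in s, or -1.
// (Hypothetical helper name; strings.Index from the standard library
// answers the same question.)
func kmpIndex(s, pat string) int {
	if len(pat) == 0 {
		return 0
	}
	// fail[i] is the length of the longest proper prefix of pat[:i+1]
	// that is also a suffix of it.
	fail := make([]int, len(pat))
	for i, k := 1, 0; i < len(pat); i++ {
		for k > 0 && pat[i] != pat[k] {
			k = fail[k-1]
		}
		if pat[i] == pat[k] {
			k++
		}
		fail[i] = k
	}
	// Scan s once, falling back through the table instead of rewinding s.
	for i, k := 0, 0; i < len(s); i++ {
		for k > 0 && s[i] != pat[k] {
			k = fail[k-1]
		}
		if s[i] == pat[k] {
			k++
		}
		if k == len(pat) {
			return i - len(pat) + 1
		}
	}
	return -1
}

func main() {
	fmt.Println(kmpIndex("/some/text/here", "/here")) // 10
}
```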

Answer 2

Score: 2

Many good regex toolkits can do this, but you might have to change the syntax of your patterns. For example, here's a Python version:

>>> import re
>>> a = re.compile("/some/(?P<pattern>.+)/here")
>>> b = "/some/text/here"
>>> a.match(b).group("pattern")
'text'
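
Since the question asks for Go, the same idea might look roughly like the sketch below. Two things here are assumptions rather than part of the Python example: the tag is restricted to a single path segment with [^/]+, and the expression is anchored with ^ and $ so the whole path has to match. The names somePat and extractTag are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// somePat is a hand-written Go counterpart of the Python pattern.
// [^/]+ keeps the tag inside one path segment (an assumption);
// ^ and $ force the whole path to match.
var somePat = regexp.MustCompile(`^/some/(?P<tag>[^/]+)/here$`)

// extractTag returns the captured tag and whether path matched.
func extractTag(path string) (string, bool) {
	m := somePat.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	// SubexpIndex (Go 1.15+) maps a group name to its index in m.
	return m[somePat.SubexpIndex("tag")], true
}

func main() {
	tag, ok := extractTag("/some/text/here")
	fmt.Println(tag, ok) // text true
}
```

Compiling the expression once at package level avoids paying the compilation cost on every request.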

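Answer 3

Score: 2

Maybe you could split both strings on '/' and compare them segment by segment:

string[] array1 = a.Split('/');  // "", "some", "{tag}", "here"
string[] array2 = b.Split('/');  // "", "some", "text", "here"
// The literal segments must agree; the segment at index 2 is the tag.
bool isMatch = array1.Length == array2.Length
    && array1[1] == array2[1] && array1[3] == array2[3];
string tag = isMatch ? array2[2] : null;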
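
A Go sketch of the same split-and-compare idea, generalized to any number of {name} segments, could look like this. The function name matchSegments and the rule that a tag must match a non-empty segment are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// matchSegments compares pattern and path segment by segment. A pattern
// segment written as {name} matches any non-empty path segment and is
// recorded under that name. (Hypothetical helper for illustration.)
func matchSegments(pattern, path string) (map[string]string, bool) {
	ps := strings.Split(pattern, "/")
	bs := strings.Split(path, "/")
	if len(ps) != len(bs) {
		return nil, false
	}
	tags := make(map[string]string)
	for i, seg := range ps {
		if len(seg) > 2 && strings.HasPrefix(seg, "{") && strings.HasSuffix(seg, "}") {
			if bs[i] == "" {
				return nil, false
			}
			tags[seg[1:len(seg)-1]] = bs[i]
		} else if seg != bs[i] {
			// Literal segments must be identical.
			return nil, false
		}
	}
	return tags, true
}

func main() {
	tags, ok := matchSegments("/some/{tag}/here", "/some/text/here")
	fmt.Println(tags, ok) // map[tag:text] true
}
```

This runs in time linear in the length of the two strings and needs no regular expression engine, at the cost of allocating the two slices.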

Answer 4

Score: 1

Go answer: The Go standard library has a URL parser and a regular expression package to help you. Go does not let you name variables at runtime, so getting your answer as tag = "text" doesn't quite make sense. Instead you might want to return the result as a struct, or perhaps collect multiple results in a map. An outline might go something like this:

  1. Compile a regexp that matches your tag syntax with the braces. You do this once, when the program loads. Let's call this tagRE.
  2. Apply tagRE to pattern "a". The results of this match are the literal parts of the URL to match and the names of the tags. (If the match fails, pattern "a" is invalid.)
  3. Use the results to construct and compile another regexp that matches that pattern in a real URL. Let's call this aRE. Hold on to this regexp for as long as you think you might need to match this pattern in the future; there's no sense in repeating the work of compiling it.
  4. Repeat steps 2 and 3 for other patterns as needed, or as patterns become available to your program. You might collect the compiled regexps in a slice or map. I'm guessing you will also want to associate them with something else useful in your application, such as code to execute when a match is found.
  5. When you have a real URL you want to match, you probably want to parse it first with the URL package to separate out the URL path.
  6. Apply aRE (or all regexps in the slice) to the path and see if you have a match. If so, return a result containing the tag name from a and the part of the path that aRE matched. You do this by creating a result struct or adding to your result map.

Code showing construction of regular expressions:

package main

import (
	"fmt"
	"regexp"
)

var a = "/some/{tag}/here/{and}/there"
var aPath = `/some/bread/here/jam/there`

func main() {
	// Each match is a literal prefix (m[1]) followed by a {name} tag (m[2]).
	tagPat := regexp.MustCompile(`([^{]*)\{([^}]+)\}`)
	aMatch := tagPat.FindAllStringSubmatch(a, -1)
	if aMatch == nil {
		fmt.Println("bad pattern")
		return
	}
	// Build an anchored regexp: quoted literals plus one named group per tag.
	aRE := "^"
	matchLen := 0
	for _, m := range aMatch {
		aRE += regexp.QuoteMeta(m[1])
		// Each tag matches one non-empty path segment.
		aRE += "(?P<" + m[2] + ">[^/]+)"
		matchLen += len(m[0])
	}
	// Append whatever literal text follows the last tag.
	aRE += regexp.QuoteMeta(a[matchLen:]) + "$"

	aPat := regexp.MustCompile(aRE)
	pathMatch := aPat.FindStringSubmatch(aPath)
	if pathMatch == nil {
		fmt.Println("url doesn't match")
		return
	}
	// SubexpNames()[0] is the whole match, so skip it.
	for tx, tag := range aPat.SubexpNames()[1:] {
		fmt.Println(tag, "=", pathMatch[tx+1])
	}
}

Output:

tag = bread
and = jam
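
Steps 4 to 6 of the outline (keeping several compiled patterns around, parsing the URL, and returning the tags in a map) might be sketched as follows. The route type, the route names, the hand-written expressions, and the example.com URL are all assumptions made for illustration; in a real program the expressions would be generated from {tag} patterns as shown in the code above.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// route pairs a precompiled pattern with a name the application can use
// to pick a handler. The type and its fields are illustrative.
type route struct {
	name string
	re   *regexp.Regexp
}

// Compiled once at start-up. The expressions are written by hand here
// to keep the sketch short.
var routes = []route{
	{"item", regexp.MustCompile(`^/some/(?P<tag>[^/]+)/here$`)},
	{"user", regexp.MustCompile(`^/users/(?P<id>[^/]+)$`)},
}

// match parses rawURL, then tries each route against the URL path.
// It returns the route name and a map from tag name to matched text.
func match(rawURL string) (string, map[string]string, bool) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", nil, false
	}
	for _, r := range routes {
		m := r.re.FindStringSubmatch(u.Path)
		if m == nil {
			continue
		}
		tags := make(map[string]string)
		for i, n := range r.re.SubexpNames() {
			if n != "" { // unnamed groups (and the whole match) have no name
				tags[n] = m[i]
			}
		}
		return r.name, tags, true
	}
	return "", nil, false
}

func main() {
	name, tags, ok := match("http://example.com/some/text/here?x=1")
	fmt.Println(name, tags, ok) // item map[tag:text] true
}
```

Trying every route in turn is linear in the number of patterns, which is usually fine for a handful of routes; a large route table would call for a trie keyed on path segments instead.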

Answer 5

Score: 1

So you have a pattern string of the form /some/{tag}/here, and you want to determine whether some other string matches that pattern. If it does, you want to extract the {tag} portion.

It seems to me that you could split your pattern string into three parts:

"/some/"
"{tag}"
"/here"

Now, using standard C comparison functions (I'm thinking of something like strncmp), check whether the string starts with "/some/" and ends with "/here". If it does, you can easily find the beginning and the length of the tag string:

stringBegin = s + strlen("/some/");
length = strlen(s) - strlen("/some/") - strlen("/here");

Then it's a simple matter of copying out that substring.

Of course my example is using constant strings. But if you can easily split out the components, then you can substitute variables for the constants.
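
In Go, the prefix/suffix check described here might be sketched like this. The helper names extractBetween and matchOneTag are invented for the example, and the sketch assumes the pattern contains exactly one {tag}. Unlike the segment-based approaches, it lets the tag value contain slashes.

```go
package main

import (
	"fmt"
	"strings"
)

// extractBetween reports whether s starts with prefix and ends with suffix
// without the two overlapping, and returns the text in between.
func extractBetween(s, prefix, suffix string) (string, bool) {
	if len(s) < len(prefix)+len(suffix) ||
		!strings.HasPrefix(s, prefix) || !strings.HasSuffix(s, suffix) {
		return "", false
	}
	return s[len(prefix) : len(s)-len(suffix)], true
}

// matchOneTag splits a pattern with a single {name} into its literal
// prefix and suffix, then matches s against them.
func matchOneTag(pattern, s string) (name, value string, ok bool) {
	open := strings.Index(pattern, "{")
	end := strings.Index(pattern, "}")
	if open < 0 || end < open {
		return "", "", false // no well-formed tag in the pattern
	}
	value, ok = extractBetween(s, pattern[:open], pattern[end+1:])
	if !ok {
		return "", "", false
	}
	return pattern[open+1 : end], value, true
}

func main() {
	name, value, ok := matchOneTag("/some/{tag}/here", "/some/text/here")
	fmt.Println(name, value, ok) // tag text true
}
```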

Answer 6

Score: 0

I'm assuming your tags can't have slashes in them. If that is not so, my solution won't work without considerable modification.

If the above holds true, though, you can first tokenize your path into a list, as user1288160 shows in his answer. My solution will be in Go.

path := strings.Split(strings.Trim(url, "/"), "/")

Then you can use a simple state machine to process the tokens.

type urlParser func([]string) (urlParser, []string, error)

// define handlers for the various tokens that do appropriate things
var parseMap map[string]urlParser

var startParse = func(ps []string) (urlParser, []string, error) {
   if len(ps) == 0 {
      return nil, nil, errors.New("End Of Path")
   }
   next, ok := parseMap[ps[0]]
   if !ok {
      // no handler registered for this token
      return nil, nil, fmt.Errorf("unexpected token %q", ps[0])
   }
   // ps[1:] is empty when ps[0] was the last token
   return next, ps[1:], nil
}

p := startParse
var err error
for {
   // get the next step in the state machine, the unparsed portion of the
   // path, and any error.
   next, rst, pErr := p(path)
   // an error (including End Of Path) means we are done.
   if pErr != nil {
      err = pErr
      break
   }
   // set up for our next iteration of the parse loop.
   p = next
   path = rst
}

Your urlParsers will be closures that populate some variable with whatever you matched against.
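
To make the closure idea concrete, here is a small self-contained sketch for the pattern /some/{tag}/here. It is a simplified variant rather than a completion of the fragment above: the states are chained directly instead of being looked up in a map, and the names state, literal, capture, and run are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// state consumes tokens and returns the next state plus the unparsed rest.
type state func(tokens []string) (state, []string, error)

var errNoMatch = errors.New("path does not match")

// literal returns a state that accepts exactly the token want.
func literal(want string, next state) state {
	return func(ts []string) (state, []string, error) {
		if len(ts) == 0 || ts[0] != want {
			return nil, nil, errNoMatch
		}
		return next, ts[1:], nil
	}
}

// capture returns a state that stores any non-empty token in *dst.
func capture(dst *string, next state) state {
	return func(ts []string) (state, []string, error) {
		if len(ts) == 0 || ts[0] == "" {
			return nil, nil, errNoMatch
		}
		*dst = ts[0]
		return next, ts[1:], nil
	}
}

// run drives the machine. A nil next state means the pattern is finished,
// and the match succeeds only if no tokens are left over.
func run(start state, path string) error {
	tokens := strings.Split(strings.Trim(path, "/"), "/")
	for s := start; s != nil; {
		next, rest, err := s(tokens)
		if err != nil {
			return err
		}
		s, tokens = next, rest
	}
	if len(tokens) != 0 {
		return errNoMatch
	}
	return nil
}

func main() {
	var tag string
	// States for /some/{tag}/here, built back to front.
	start := literal("some", capture(&tag, literal("here", nil)))
	err := run(start, "/some/text/here")
	fmt.Println(tag, err == nil) // text true
}
```

Note that a capture state writes its variable as soon as it sees a token, so the variable should only be trusted when run returns nil.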

Answer 7

Score: 0

To help, we need some background information. For example, what does the "pattern" consist of: numbers? Letters? Numbers and letters? Which characters are allowed?

First scenario: assuming that the position of the target path segment is fixed, you can do something like this:

C code:

const char *string = "/some/text/here";
const char *b = "text";
char *copy = strdup(string); /* strtok modifies its input, so work on a copy */
char *path;

if (copy && strtok(copy, "/")) {
	path = strtok(NULL, "/"); /* second segment; NULL if the path is too short */
	if (path && !strcmp(b, path)) {
		/* They are equal. Do something... */
	} else {
		/* ... */
	}
} else {
	printf("Tag not found.\n");
}
free(copy);

Second scenario:

Assuming that you know only the predecessor of the target path segment, you can do something like this:

C code:

const char *string = "/some/text/here";

char *cpath,                   /* Current path segment */
     *ppath   = NULL;          /* Predecessor segment */
const char *ptpath  = "some",  /* Predecessor segment to look for */
           *pathcmp = "text";  /* Segment to compare against */
char *copy = strdup(string);   /* strtok modifies its input */

cpath = copy ? strtok(copy, "/") : NULL;

while (cpath) {
	ppath = cpath;
	cpath = strtok(NULL, "/");

	if (!strcmp(ppath, ptpath)) {
		/* cpath is NULL when the predecessor was the last segment */
		if (cpath && !strcmp(cpath, pathcmp)) {
			/* They are equal. */
		} else {
			/* ... */
		}

		break;
	}
}
free(copy);

Very simple cases like this let you avoid regular expressions and URI parsing (within reason, of course).

I hope this helps.
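
For comparison, the second scenario (find the segment that follows a known predecessor) might look like this in Go, where strings.Split avoids the copy-and-mutate bookkeeping that strtok needs. The function name segmentAfter is made up for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// segmentAfter returns the path segment that directly follows the first
// segment equal to prev. (Hypothetical helper for illustration.)
func segmentAfter(path, prev string) (string, bool) {
	segs := strings.Split(strings.Trim(path, "/"), "/")
	// Stop one short of the end: the last segment has no successor.
	for i := 0; i+1 < len(segs); i++ {
		if segs[i] == prev {
			return segs[i+1], true
		}
	}
	return "", false
}

func main() {
	seg, ok := segmentAfter("/some/text/here", "some")
	fmt.Println(seg, ok) // text true
}
```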

huangapple
  • Posted on April 15, 2012, 22:56:27
  • Please keep this link when reposting: https://go.coder-hub.com/10163118.html
