How to create a parameterized regex (in C#) that matches strings delimited by a custom multi-character delimiter?





I want to find strings in a text. The text can contain multiple lines. The strings are delimited by a custom delimiter, which should be a parameter. There can be multiple strings in the text, even on one line. For example, if the delimiter is three double quotation marks, """, then in this text:

> lorem ipsum """findthis""" "but not this" 'nor this'
> """anotherstringtofind"""
> ""blabla"" """yet another""""""text to find"""

It should find: findthis, anotherstringtofind, yet another, text to find.
(Notice that the delimiters are not present in the matched strings, although I can remove them in C# if needed.)

I can do something similar, but only for single-character delimiters, with the regex "[{0}](([^{0}])*)[{0}]".

Like this:

public static MatchCollection FindString(this string input, char delimiter, RegexOptions regexOptions = RegexOptions.Multiline)
{
    // Character classes work here only because the delimiter is a single character.
    var regexString = string.Format("[{0}](([^{0}])*)[{0}]", delimiter);
    var rx = new Regex(regexString, regexOptions);

    MatchCollection matches = rx.Matches(input);

    return matches;
}

I guess the solution would use lookahead, but I could not figure out how to combine it with something that has an effect similar to [^...] for single characters. Is it even possible to "negate" a whole sequence of characters (so that it is not included in the matches)?

I think this question is similar, but I'm not familiar with Python.

Some clarification:
My expectation is that each delimiter pair is used exactly once. So, for example, this test should pass:

            var inputText = "??abc?? ??def?? ??xyz??";

            var matches = inputText.FindString("??", RegexOptions.Singleline);

            Assert.Equal(3, matches.Count);

Is it possible to solve this in C# using regex?
Thank you in advance!


Score: 1









You can use a lazy quantifier instead of a negated character class. In your example with """ this leads to a regex like """(.*?)"""

Also, note that your current attempt incorrectly wraps the delimiter in character classes: ["""] is equivalent to ["], which in turn is equivalent to a plain ". Use your delimiter as is, without any additional wrappers.

But don't forget to escape your delimiter before using it in a regex, so that a delimiter like [] is treated literally, that is, as \[\] in the pattern.
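
To make the escaping point concrete, here is a minimal sketch using the ?? delimiter from the question. The names EscapeDemo, BuildPattern and IsValidPattern are made up for this illustration; only Regex.Escape and the Regex constructor are real .NET APIs. Unescaped, ?? is read as quantifiers with nothing to quantify, so the pattern does not even compile.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: builds "<delimiter>(.*?)<delimiter>" with the
    // delimiter escaped so that it is matched literally.
    public static string BuildPattern(string delimiter)
    {
        string escaped = Regex.Escape(delimiter);
        return escaped + "(.*?)" + escaped;
    }

    // Hypothetical helper: reports whether the .NET regex engine accepts the pattern.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // Thrown by the Regex constructor when the pattern cannot be parsed.
            return false;
        }
    }
}
```

With this sketch, EscapeDemo.BuildPattern("??") produces \?\?(.*?)\?\?, which compiles, while the raw pattern ??(.*?)?? is rejected by IsValidPattern.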

Your method would look like this:

public static MatchCollection FindString(string input, string delimiter, RegexOptions regexOptions = RegexOptions.Multiline)
{
    // Escape the delimiter so it is matched literally; (.*?) lazily captures the content.
    string pattern = string.Format("{0}(.*?){0}", Regex.Escape(delimiter));
    var rx = new Regex(pattern, regexOptions);
    return rx.Matches(input);
}
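
For completeness, here is a self-contained sketch of the same lazy-quantifier idea that returns only the text between the delimiters (capture group 1). The class and method names (DelimitedStringFinder, FindDelimited) are placeholders for this example. It also defaults to RegexOptions.Singleline, which is an assumption on my part rather than something required above: without it, . does not match a newline, so a delimited string that spans several lines would not be found.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DelimitedStringFinder
{
    // Returns the text between each pair of delimiters, without the delimiters.
    // Singleline is an assumed default so that '.' also matches '\n'.
    public static List<string> FindDelimited(
        string input,
        string delimiter,
        RegexOptions options = RegexOptions.Singleline)
    {
        string escaped = Regex.Escape(delimiter);
        var rx = new Regex(escaped + "(.*?)" + escaped, options);

        var results = new List<string>();
        foreach (Match m in rx.Matches(input))
        {
            // Group 0 is the whole match including delimiters;
            // group 1 is only the content between them.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Called on the sample text from the question with """ as the delimiter, this returns the four strings findthis, anotherstringtofind, yet another and text to find. For "??abc?? ??def?? ??xyz??" with the delimiter ?? it returns three strings, so each delimiter pair is used exactly once.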

>Is it even possible to "negate" a whole sequence of characters

Yes, it is possible: (?:(?!foo).)+ matches a run of characters that does not contain foo. For your example that gives """(?:(?!""").)*""". But it would perform considerably worse than the simple lazy quantifier.
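
A rough sketch of that lookahead variant, parameterized in the same way, might look like the following. The names TemperedTokenFinder and FindDelimited are invented for the example, and the extra capture group around the repeated part is my addition so that the content can be read without the delimiters.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemperedTokenFinder
{
    public static List<string> FindDelimited(string input, string delimiter)
    {
        string d = Regex.Escape(delimiter);

        // (?:(?!d).)* consumes one character at a time, but only while the
        // delimiter does not start at the current position.
        string pattern = d + "((?:(?!" + d + ").)*)" + d;

        // Singleline lets '.' match newlines as well (an assumed choice).
        var rx = new Regex(pattern, RegexOptions.Singleline);

        var results = new List<string>();
        foreach (Match m in rx.Matches(input))
        {
            // Group 1 holds the content between the two delimiters.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

On the inputs from the question this finds the same strings as the lazy-quantifier version; the difference is that the lookahead is evaluated at every character of the content.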

  • Posted on June 30, 2023, 04:51:30
