How to create a parameterized regex (in C#) that matches strings delimited by a custom multi-character delimiter?
Question
I want to find strings in a text. The text can contain multiple lines. The strings are delimited by a custom delimiter, which should be parameterized. There can be multiple strings in the text, even on one line. For example, if the delimiter is three double quotation marks ("""), then in this text:
> lorem ipsum """findthis""" "but not this" 'nor this'
> """anotherstringtofind"""
>
> ""blabla"" """yet another""""""text to find"""
It should find: findthis, anotherstringtofind, yet another, text to find.
(Note that the delimiters are not part of the matched strings, although I can remove them in C# if needed.)
I can do a similar thing, but only for single-character delimiters, with the regex "[{0}](([^{0}])*)[{0}]", like this:
public static MatchCollection FindString(this string input, char delimiter, RegexOptions regexOptions = RegexOptions.Multiline)
{
    var regexString = string.Format("[{0}](([^{0}])*)[{0}]", delimiter);
    var rx = new Regex(regexString, regexOptions);
    MatchCollection matches = rx.Matches(input);
    return matches;
}
I guess the solution would use lookahead operators, but I could not figure out how to combine them with something that has an effect similar to [^] for single characters. Is it even possible to "negate" a whole sequence of characters (so that they are not included in the matches)?
I think this question is similar, but I'm not familiar with Python.
Some clarification:
My expectation is that each delimiter pair is used exactly once. So, for example, this test should pass:
var inputText = "??abc?? ??def?? ??xyz??";
var matches = inputText.FindString("??", RegexOptions.Singleline);
Assert.Equal(3, matches.Count);
Is it possible to solve this in C# using regex?
Thank you in advance!
Answer 1
Score: 1
You can use a lazy quantifier instead of a negated character class. In your example with """ this leads to a regex like """(.*?)"""
Also, notice that your current attempt incorrectly uses character classes for the delimiters: ["""] is equivalent to ["], which in turn is equivalent to a plain ". Use your delimiter as is, without any additional wrappers.
But don't forget to escape your delimiter before using it in a regex. For example, a delimiter like [] has to be written as \[\] in the regex.
Your method would look like this:
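public static MatchCollection FindString(string input, string delimiter, RegexOptions regexOptions = RegexOptions.Multiline)
{
    // Escape the delimiter so that regex metacharacters in it are matched literally.
    string pattern = string.Format("{0}(.*?){0}", Regex.Escape(delimiter));
    // Pass RegexOptions.Singleline if a delimited string may span several lines;
    // RegexOptions.Multiline only changes the meaning of ^ and $.
    var rx = new Regex(pattern, regexOptions);
    return rx.Matches(input);
}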
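As a rough illustration of the lazy-quantifier approach, here is a minimal sketch that returns only the text between the delimiters by reading capture group 1 (the full match value still includes the delimiters). The class name DelimitedStringDemo and the helper FindDelimited are hypothetical names chosen for this example, and RegexOptions.Singleline is an assumption made so that a delimited string may contain line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DelimitedStringDemo
{
    // Hypothetical helper: returns the text between each pair of delimiters.
    public static List<string> FindDelimited(string input, string delimiter)
    {
        // Escape the delimiter so characters such as ? or [ are taken literally.
        string escaped = Regex.Escape(delimiter);
        // (.*?) is lazy: it stops at the first closing delimiter it can reach.
        string pattern = escaped + "(.*?)" + escaped;

        var results = new List<string>();
        // Singleline (an assumption here) lets '.' also match newlines.
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
        {
            // Group 1 holds the content without the surrounding delimiters.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        string text =
            "lorem ipsum \"\"\"findthis\"\"\" \"but not this\" 'nor this'\n" +
            "\"\"\"anotherstringtofind\"\"\"\n\n" +
            "\"\"blabla\"\" \"\"\"yet another\"\"\"\"\"\"text to find\"\"\"";

        // Prints: findthis, anotherstringtofind, yet another, text to find
        Console.WriteLine(string.Join(", ", FindDelimited(text, "\"\"\"")));

        // Prints: 3
        Console.WriteLine(FindDelimited("??abc?? ??def?? ??xyz??", "??").Count);
    }
}
```

Because each match consumes both its opening and its closing delimiter, every delimiter pair is used exactly once, which is what the test in the question expects.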
> Is it even possible to "negate" a whole sequence of characters?
Yes, it is possible: (?:(?!foo).)+ matches a run of characters that does not contain foo. For your example, that gives """(?:(?!""").)*"""
But performance-wise it would be considerably worse than a simple lazy quantifier.
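For comparison, the negative-lookahead pattern can be parameterized in the same way as the lazy one. The following is a sketch only: the class name NegatedSequenceDemo and the helper FindWithLookahead are hypothetical, a capture group has been added around the repeated part so the content can be read without the delimiters, and RegexOptions.Singleline is an assumption so that '.' also matches newlines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegatedSequenceDemo
{
    // Hypothetical helper: matches delimiter, then any characters that do not
    // start another delimiter, then the closing delimiter.
    public static List<string> FindWithLookahead(string input, string delimiter)
    {
        string d = Regex.Escape(delimiter);
        // (?:(?!d).)* consumes one character at a time, but only while the
        // delimiter does not begin at the current position.
        string pattern = d + "((?:(?!" + d + ").)*)" + d;

        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        // Prints: abc, def, xyz
        Console.WriteLine(string.Join(", ",
            FindWithLookahead("??abc?? ??def?? ??xyz??", "??")));
    }
}
```

The lookahead is evaluated at every character position, which is the extra work the answer refers to when comparing it with the lazy quantifier.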