Returning the matched characters along with the result when using IndexOf

Question

I have a large text file and am looking to find some data:

Start1234 …data…End

I need to match on Start followed by four characters (1234 is just an example) and then four spaces, and then read up to End. Using IndexOf to return the data would be fine, but I also need to return the 1234 with the result, and since it is part of the match it won't be included. Any ideas on how I can do this?

Answer 1

Score: 2

> I need to return the 1234 with the result, but it is part of the match
> so won’t be included

That's wrong. text.IndexOf("1234") returns the zero-based index of the first character of the match, so a Substring starting at that index includes 1234:

string text = "Start1234 …data…End";
string find = "1234";
int index = text.IndexOf(find); // -1 when the text is not found
string result = index == -1
    ? null
    : text.Substring(index); // 1234 …data…End
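
When the four characters are not known in advance, the same idea works by searching for the Start marker instead and stepping past it. The following is a minimal sketch, assuming the layout is Start, then four characters, then the data, then End; the helper name TryExtract and the sample line are hypothetical, and the code uses top-level statements (C# 9 or later):

```csharp
using System;

// Made-up sample line; the real file layout may differ.
string line = "header Start1234    some data End trailer";

if (TryExtract(line, out string prefix, out string payload))
    Console.WriteLine($"prefix={prefix}, payload={payload.Trim()}");

// Hypothetical helper: finds "Start", takes the 4 characters after it,
// then returns everything between those characters and the next "End".
static bool TryExtract(string text, out string code, out string data)
{
    code = data = string.Empty;

    int start = text.IndexOf("Start", StringComparison.Ordinal);
    if (start == -1) return false;

    int codeAt = start + "Start".Length;           // first character after "Start"
    if (codeAt + 4 > text.Length) return false;    // fewer than 4 characters follow

    int end = text.IndexOf("End", codeAt + 4, StringComparison.Ordinal);
    if (end == -1) return false;

    code = text.Substring(codeAt, 4);
    data = text.Substring(codeAt + 4, end - (codeAt + 4));
    return true;
}
```

The data is returned untrimmed, so the four spaces after the code are still part of it; call Trim() on the result if they are not wanted.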

Answer 2

Score: 0

Correct me if I'm wrong: you need to find the first instance of the word "Start" which is followed by four non-whitespace characters, then four spaces. You then need to extract those four characters and then everything after the four spaces.

If I am correct in my understanding, then this function should do what you want:

// Requires: using System.Text.RegularExpressions;
public bool ParseData(string input, out string trailingFour, out string data)
{
    // Regex explanation
    // (Start):   matches the literal word Start
    // (\S{4}):   matches any four characters that are not whitespace
    // ( {4}):    matches exactly four spaces
    // ([\s\S]*): matches all remaining characters in the string, including newlines
    Regex rx = new Regex(@"(Start)(\S{4})( {4})([\s\S]*)");

    // Groups[0] is the whole match; Groups[1] to Groups[4] are the four capture groups
    var match = rx.Match(input);
    trailingFour = match.Success ? match.Groups[2].Value : string.Empty;
    data = match.Success ? match.Groups[4].Value : string.Empty;

    return match.Success;
}
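
Counting group indexes is easy to get wrong when a pattern changes. As a possible variation, not part of the original answer, the same pattern can use named groups; this is a sketch under the same assumption about the layout (Start, four non-whitespace characters, four spaces, then the data), with a hypothetical helper name TryParse and made-up sample input:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up input: "Start", four non-whitespace characters, four spaces, then the data.
string sample = "Start1234    some data";

if (TryParse(sample, out string id, out string rest))
    Console.WriteLine($"id={id}, rest={rest}");

// Hypothetical helper: same logic as ParseData, but with named groups.
static bool TryParse(string input, out string code, out string data)
{
    // (?<code>...) and (?<data>...) replace the numeric indexes 2 and 4.
    var match = Regex.Match(input, @"Start(?<code>\S{4}) {4}(?<data>[\s\S]*)");
    code = match.Success ? match.Groups["code"].Value : string.Empty;
    data = match.Success ? match.Groups["data"].Value : string.Empty;
    return match.Success;
}
```

Only the two values that are actually needed are captured here, so the groups around Start and the four spaces are dropped.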

Answer 3

Score: 0

Use Regex instead of the string methods for this problem. Build a pattern that captures records which:

  • Start with the literal word Start.
  • Are followed by any four characters, (.{4}), grouped to capture the value.
  • Are followed by four whitespace characters, \s{4}. Note that \s matches any whitespace character, not only a space.
  • Are followed by some text, (.*?), grouped to capture the value. The match is lazy, so it stops at the first End.
  • End with the literal word End.

Put it together:

Start(.{4})\s{4}(.*?)End

Example

var input = "Start1234 dataEnd Start3453    sdfsdfsdfsEnd\nStartSLDE    some data.End";
var pattern = @"Start(.{4})\s{4}(.*?)End";

foreach (Match m in Regex.Matches(input, pattern))
    Console.WriteLine($"{m.Value}, 1st Group: {m.Groups[1].Value}, 2nd Group: {m.Groups[2].Value}");

This returns only two matches, because Start1234 is followed by a single space rather than four:

Start3453    sdfsdfsdfsEnd, 1st Group: 3453, 2nd Group: sdfsdfsdfs
StartSLDE    some data.End, 1st Group: SLDE, 2nd Group: some data.
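
One caveat for a large text file: by default . does not match a newline, so the pattern above will not find a record whose data spans several lines. A minimal sketch, assuming such multi-line records can occur (the sample string is made up), shows the difference RegexOptions.Singleline makes:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up record whose data spans two lines.
string multiLine = "StartABCD    first line\nsecond lineEnd";
string pattern = @"Start(.{4})\s{4}(.*?)End";

// Without Singleline, '.' stops at '\n', so End is never reached.
bool matchesByDefault = Regex.IsMatch(multiLine, pattern);
Console.WriteLine(matchesByDefault);

// With Singleline, '.' also matches '\n'.
Match m = Regex.Match(multiLine, pattern, RegexOptions.Singleline);
Console.WriteLine(m.Groups[1].Value);
Console.WriteLine(m.Groups[2].Value);
```

With Singleline the four-character group can also contain a newline; \S{4} is a stricter choice if that is not acceptable.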

Call the Regex.Replace method if you need to replace the matches with something else and return a new string:

var replaced = Regex.Replace(input, pattern, string.Empty);
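
The replacement string can also refer back to the captured groups. The following sketch, which assumes the goal is to reshape each record rather than delete it, rewrites every match as code=data using the $1 and $2 substitutions; the variable names and the output format are illustrative only:

```csharp
using System;
using System.Text.RegularExpressions;

string records = "Start3453    sdfsdfsdfsEnd\nStartSLDE    some data.End";
string recordPattern = @"Start(.{4})\s{4}(.*?)End";

// $1 and $2 refer to the first and second capture groups of each match.
string reshaped = Regex.Replace(records, recordPattern, "$1=$2");
Console.WriteLine(reshaped);
// 3453=sdfsdfsdfs
// SLDE=some data.
```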


huangapple
  • Published on April 6, 2023, 22:41:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75950820.html