Returning the matched characters along with the result when using IndexOf
Question
I have a large text file and am looking to find some data:
Start1234 …data…End
I need to match on Start that has 4 chars in front of it (1234 is just an example) and four spaces in front of that, then read to End. This would be fine using IndexOf to return the data, but I need to return the 1234 with the result, and since it is part of the match it won't be included. Any ideas on how I can do this?
Answer 1
Score: 2
> I need to return the 1234 with the result, but it is part of the match
> so won’t be included
That's wrong. If you use text.IndexOf("1234"), you get the index at which "1234" begins, so it is included:
string text = "Start1234 …data…End";
string find = "1234";
int index = text.IndexOf(find);
string result = index == -1
    ? null
    : text.Substring(index); // 1234 …data…End
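Since the four characters are not known in advance (1234 is only an example), searching for the fixed markers instead may be closer to what the question asks. The following is a minimal sketch under that assumption: it assumes the four characters come directly after Start and that the data runs up to the next End. The class name MarkerSearch and the method TryExtract are hypothetical names chosen for illustration.

```csharp
using System;

public static class MarkerSearch
{
    // Hypothetical helper: finds "Start", takes the four characters after it,
    // and reads everything up to the next "End".
    public static bool TryExtract(string text, out string code, out string data)
    {
        const string startMarker = "Start";
        const string endMarker = "End";
        const int codeLength = 4;
        code = string.Empty;
        data = string.Empty;

        int start = text.IndexOf(startMarker, StringComparison.Ordinal);
        if (start == -1) return false;

        // Position of the first of the four characters that follow "Start".
        int codeStart = start + startMarker.Length;
        int dataStart = codeStart + codeLength;
        if (dataStart > text.Length) return false;

        // Search for "End" only after the four-character code.
        int end = text.IndexOf(endMarker, dataStart, StringComparison.Ordinal);
        if (end == -1) return false;

        code = text.Substring(codeStart, codeLength);
        // Trim removes the spaces that separate the code from the data.
        data = text.Substring(dataStart, end - dataStart).Trim();
        return true;
    }
}
```

Because the markers are located by position, the four characters are simply read from the text after Start and returned next to the data, so nothing is lost by being "part of the match".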
Answer 2
Score: 0
Correct me if I'm wrong: you need to find the first instance of the word "Start" that is followed by four non-whitespace characters and then four spaces. You then need to extract those four characters and everything after the four spaces.
If I am correct in my understanding, then this function should do what you want:
// Requires: using System.Text.RegularExpressions;
public bool ParseData(string input, out string trailingFour, out string data)
{
    // Regex explanation
    // (Start): matches the word Start
    // (\S{4}): matches any four characters that are not whitespace
    // ( {4}): matches four spaces
    // ([\s\S]*): matches all remaining characters in the string, including newlines
    Regex rx = new Regex(@"(Start)(\S{4})( {4})([\s\S]*)");
    // A successful match has 5 groups: group 0 (the whole match) plus the 4 capture groups
    var match = rx.Match(input);
    trailingFour = match.Success ? match.Groups[2].Value : string.Empty;
    data = match.Success ? match.Groups[4].Value : string.Empty;
    return match.Success;
}
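As a possible variant of the same idea, named groups avoid having to count group indices. This is only a sketch: the class name StartParser, the method TryParse, and the group names code and data are assumptions made for illustration, not part of the answer above.

```csharp
using System.Text.RegularExpressions;

public static class StartParser
{
    // "code" = four non-whitespace characters after Start,
    // "data" = everything after the four spaces, including newlines.
    private static readonly Regex Pattern =
        new Regex(@"Start(?<code>\S{4}) {4}(?<data>[\s\S]*)");

    public static bool TryParse(string input, out string code, out string data)
    {
        Match match = Pattern.Match(input);
        code = match.Success ? match.Groups["code"].Value : string.Empty;
        data = match.Success ? match.Groups["data"].Value : string.Empty;
        return match.Success;
    }
}
```

Keeping the Regex in a static field means the pattern is constructed once rather than on every call, which may matter when the function is applied repeatedly to a large file.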
Answer 3
Score: 0
Use Regex instead of the string methods for this problem. Build a pattern that captures text which:
- Starts with the literal word Start.
- Is followed by any four characters, (.{4}), grouped to get the value.
- Is followed by four whitespace characters, \s{4}.
- Is followed by some text, (.*?), grouped to get the value, and
- Ends with the literal word End.
Put it together:
Start(.{4})\s{4}(.*?)End
Example
var input = "Start1234 …data…End Start3453    sdfsdfsdfsEnd\nStartSLDE    some data.End";
var pattern = @"Start(.{4})\s{4}(.*?)End";
foreach (Match m in Regex.Matches(input, pattern))
    Console.WriteLine($"{m.Value}, 1st Group: {m.Groups[1].Value}, 2nd Group: {m.Groups[2].Value}");
This returns only two matches, because the first block has a single space after 1234 rather than four:
Start3453    sdfsdfsdfsEnd, 1st Group: 3453, 2nd Group: sdfsdfsdfs
StartSLDE    some data.End, 1st Group: SLDE, 2nd Group: some data.
Call the Regex.Replace method if you need to replace the matches with something else and return a new string:
var replaced = Regex.Replace(input, pattern, string.Empty);
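One caveat for a large text file: in .NET, `.` does not match a newline by default, so the pattern above only finds blocks whose data sits on a single line. If the data can span several lines, passing RegexOptions.Singleline is one way to handle it. The sketch below assumes that situation; the class name BlockFinder and the method FindAll are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BlockFinder
{
    // Returns the four-character code and the data of every Start...End block.
    public static List<(string Code, string Data)> FindAll(string input)
    {
        var results = new List<(string Code, string Data)>();
        var pattern = @"Start(.{4})\s{4}(.*?)End";

        // Singleline lets '.' match '\n', so the data may cross line breaks.
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
            results.Add((m.Groups[1].Value, m.Groups[2].Value));

        return results;
    }
}
```

Note that with Singleline the four-character group can also contain a newline; replacing `.{4}` with `\S{4}` would rule that out if it is not wanted.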