Extract lines between two patterns using PowerShell
Question
I have a text file from which I need to extract an undefined number of rows. However, I have patterns for both the start and end index.
- The start index is flagged by two consecutive rows: the first is "LOAD SUMMARY" and the second is "============".
- The end index is flagged by the row that contains *****END LOAD SESSION*****.
I then need these indexes offset by 2 rows (upwards). Let's say the TXT file contains this:
maybe some more text
some text
SOme really important text LOAD SUMMARY. But I don't want to include this row.
023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8035] Load complete time: Mon Mar 06 13:57:55 2023
LOAD SUMMARY
============
WRT_8036 Target: MAPPING NAME (Instance Name: [MAPPING NAME])
WRT_8038 Inserted rows - Requested: 45147 Applied: 45147 Rejected: 0 Affected: 45147
2023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8043] *****END LOAD SESSION*****
SOme more test
LOAD SUMMARY > I don't want this row either
I want the $extractedLines variable to return this:
023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8035] Load complete time: Mon Mar 06 13:57:55 2023
LOAD SUMMARY
============
WRT_8036 Target: MAPPING NAME (Instance Name: [MAPPING NAME])
WRT_8038 Inserted rows - Requested: 45147 Applied: 45147 Rejected: 0 Affected: 45147
This is my code so far:
# Specify the path to the file you want to extract lines from
$file = "C:\Users\YOURNAME\Desktop\Load Summary.txt"
# Get the contents of the file as an array of strings
$content = Get-Content $file
# Find the line number of the first occurrence of "LOAD SUMMARY"
$startIndex = ($content | Select-String -Pattern "LOAD SUMMARY").LineNumber
# Find the line number of the first occurrence of "*****END LOAD SESSION*****"
$endIndex = ($content | Select-String -Pattern "\*\*\*\*\*END LOAD SESSION\*\*\*\*\*").LineNumber - 1
# Iterate through the lines between the start and end indices to find the "============" line
for ($i = $startIndex; $i -le $endIndex; $i++) {
if ($content[$i] -eq "============") {
# If the "============" line is found, set the start index to the next line
$startIndex = $i + 1
break
}
}
$startIndex = [int]$startIndex
$endIndex = [int]$endIndex
# Extract the lines between the start and end indices
$extractedLines = $content[$startIndex..$endIndex]
# Output the extracted lines
$extractedLines
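Two details likely trip up this attempt: Select-String emits a match for every line that contains the pattern (three lines contain LOAD SUMMARY in the sample), so $startIndex becomes an array of line numbers rather than one number, and LineNumber is 1-based while $content[...] is indexed from 0. A minimal sketch of the same index-based idea that scans the array directly follows. It assumes the marker lines are exactly "LOAD SUMMARY" and "============", that only the first load session is wanted, and it uses in-memory sample lines in place of Get-Content.

```powershell
# Sample lines from the question (in practice: $content = @(Get-Content $file))
$content = @(
    'maybe some more text'
    'some text'
    'SOme really important text LOAD SUMMARY. But I don''t want to include this row.'
    '023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8035] Load complete time: Mon Mar 06 13:57:55 2023'
    'LOAD SUMMARY'
    '============'
    'WRT_8036 Target: MAPPING NAME (Instance Name: [MAPPING NAME])'
    'WRT_8038 Inserted rows - Requested: 45147 Applied: 45147 Rejected: 0 Affected: 45147'
    '2023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8043] *****END LOAD SESSION*****'
    'SOme more test'
    'LOAD SUMMARY > I don''t want this row either'
)

$offset = 2   # rows to include above the "LOAD SUMMARY" line

# 0-based index of the first "LOAD SUMMARY" line that is directly followed by "============"
$startIndex = -1
for ($i = 0; $i -lt ($content.Count - 1); $i++) {
    if ($content[$i] -eq 'LOAD SUMMARY' -and $content[$i + 1] -eq '============') {
        $startIndex = $i
        break
    }
}

# 0-based index of the first END LOAD SESSION line after that start marker
$endIndex = -1
if ($startIndex -ge 0) {
    for ($i = $startIndex + 2; $i -lt $content.Count; $i++) {
        # String.Contains avoids treating the asterisks as wildcards or regex syntax
        if ($content[$i].Contains('*****END LOAD SESSION*****')) {
            $endIndex = $i
            break
        }
    }
}

if ($startIndex -ge 0 -and $endIndex -gt $startIndex) {
    # clamp at 0 so a marker near the top of the file can't produce negative indices
    $first = [Math]::Max(0, $startIndex - $offset)
    # stop one line before the END LOAD SESSION line
    $extractedLines = $content[$first..($endIndex - 1)]
} else {
    $extractedLines = @()
}

$extractedLines
```

With $offset = 2, as the question states, the result starts at the "SOme really important text" row; the desired output shown in the question starts only 1 row above LOAD SUMMARY, so set $offset to whatever the real file requires.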
Answer 1
Score: 1
This might help you get started. Since you're looking to match 2 lines above the start index, the easiest way is to hold the content of the file in memory before processing it. See the inline comments for the logic.
# define how many lines before matching `$start`
$offset = 2
# these are the 2 lines that signal to start capturing
$start = @(
'LOAD SUMMARY'
'============'
)
# this one signals to break the loop
$finish = '*****END LOAD SESSION*****'
# read the file content; @() keeps it an array even if the file has a single line
$content = @(Get-Content "C:\Users\YOURNAME\Desktop\Load Summary.txt")
$startCapture = $false
$extractedLines = for($i = 0; $i -lt $content.Length; $i++) {
# if this line matches the index 0 of the `$start` array and
# the next line matches the index 1
if($content[$i] -like "*$($start[0])*" -and $content[$i + 1] -like "*$($start[1])*") {
# then, set this boolean to signal start capturing the following lines
$startCapture = $true
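# output from this line minus the offset (2 lines above in this case)
# up to this line and next line; the start is clamped at 0 so a marker
# near the top of the file can't produce negative (wrap-around) indices
$content[([Math]::Max(0, $i - $offset))..($i + 1)]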
# increment `$i` so we don't check the next line (already did here!)
$i++
# and continue with the next loop iteration,
# we don't want below conditions to be evaluated in this case
continue
}
# if this line matches the string signaling to stop capturing
if($content[$i] -like "*$finish*") {
# break the loop
break
}
# if the first condition was true then
if($startCapture) {
# output this line as-is
$content[$i]
}
}
# here is the result
$extractedLines
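To try the loop above without a file on disk, one option is to wrap it in a function that takes the lines as a parameter. This is a sketch, not part of the original answer: the function name Get-LoadSummaryLines is hypothetical, and besides clamping the range start at 0 it checks that a next line exists before reading $Lines[$i + 1].

```powershell
# Sample lines from the question (in practice: $content = @(Get-Content $file))
$content = @(
    'maybe some more text'
    'some text'
    'SOme really important text LOAD SUMMARY. But I don''t want to include this row.'
    '023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8035] Load complete time: Mon Mar 06 13:57:55 2023'
    'LOAD SUMMARY'
    '============'
    'WRT_8036 Target: MAPPING NAME (Instance Name: [MAPPING NAME])'
    'WRT_8038 Inserted rows - Requested: 45147 Applied: 45147 Rejected: 0 Affected: 45147'
    '2023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8043] *****END LOAD SESSION*****'
    'SOme more test'
    'LOAD SUMMARY > I don''t want this row either'
)

# Hypothetical helper: the answer's loop, taking the lines as a parameter
function Get-LoadSummaryLines {
    param(
        [string[]] $Lines,
        [int] $Offset = 2
    )
    $start  = @('LOAD SUMMARY', '============')
    $finish = '*****END LOAD SESSION*****'
    $startCapture = $false

    for ($i = 0; $i -lt $Lines.Count; $i++) {
        # start marker: this line matches $start[0] and the next line matches $start[1]
        if (($i + 1) -lt $Lines.Count -and $Lines[$i] -like "*$($start[0])*" -and $Lines[$i + 1] -like "*$($start[1])*") {
            $startCapture = $true
            # emit $Offset lines above through the "============" line (start clamped at 0)
            $Lines[([Math]::Max(0, $i - $Offset))..($i + 1)]
            $i++        # skip the "============" line, already emitted
            continue
        }
        # end marker: the asterisks act as wildcards, so this matches any line containing END LOAD SESSION
        if ($Lines[$i] -like "*$finish*") { break }
        # inside the block: emit the line as-is
        if ($startCapture) { $Lines[$i] }
    }
}

$result = @(Get-LoadSummaryLines -Lines $content -Offset 2)
$result
```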
Answer 2
Score: 1
If all you really need are the selected lines, you can capture them with a single regex:
$file = 'C:\Users\YOURNAME\Desktop\Load Summary.txt'
$pattern = '(?s)([^\n\r]*\r?\n){2}LOAD SUMMARY\r?\n={12}(\r?\n[^\n\r]*)*?\r?\n(?=[^\n\r]*\*{5}END LOAD SESSION\*{5})'
$extractedLines = [Regex]::Match( (Get-Content $file -Raw) , $pattern ).Value
$extractedLines
- (?s) : Single-line mode (DOTALL)
- ([^\n\r]*\r?\n){2} : (<Zero or more non-linebreak characters><NewLine>) × 2
- LOAD SUMMARY\r?\n={12} : Literal two-line match
- (\r?\n[^\n\r]*)*? : Zero or more (lazy) (<NewLine><Zero or more non-linebreak characters>)
- \r?\n : Final newline
- (?=[^\n\r]*\*{5}END LOAD SESSION\*{5}) : Positive lookahead (not captured)
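[Regex]::Match(...).Value returns one multi-line string, not an array of lines like Get-Content produces. A minimal sketch of applying the same pattern to sample text (a here-string standing in for Get-Content -Raw) and splitting the match into lines follows; the trimming and splitting steps are additions, not part of the original answer. With the sample as printed, the {2} prefix makes the match begin two rows above LOAD SUMMARY, at the "SOme really important text" row.

```powershell
# Sample input (in practice: $text = Get-Content $file -Raw)
$text = @'
maybe some more text
some text
SOme really important text LOAD SUMMARY. But I don't want to include this row.
023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8035] Load complete time: Mon Mar 06 13:57:55 2023
LOAD SUMMARY
============
WRT_8036 Target: MAPPING NAME (Instance Name: [MAPPING NAME])
WRT_8038 Inserted rows - Requested: 45147 Applied: 45147 Rejected: 0 Affected: 45147
2023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8043] *****END LOAD SESSION*****
SOme more test
LOAD SUMMARY > I don't want this row either
'@

$pattern = '(?s)([^\n\r]*\r?\n){2}LOAD SUMMARY\r?\n={12}(\r?\n[^\n\r]*)*?\r?\n(?=[^\n\r]*\*{5}END LOAD SESSION\*{5})'
$match = [Regex]::Match($text, $pattern)

if ($match.Success) {
    # .Value ends with the newline before the END line (the lookahead itself is not captured);
    # trim that newline, then split into an array of lines
    $extractedLines = $match.Value.TrimEnd([char[]]"`r`n") -split '\r?\n'
} else {
    $extractedLines = @()
}

$extractedLines
```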