Extract Lines between two patterns using Powershell

Question

I have a text file from which I need to extract an undefined number of rows. I do, however, have patterns for both the start and end index.

  • The start index is flagged by two consecutive rows: the first is "LOAD SUMMARY" and the second is "============"
  • The end index is flagged by the row that contains *****END LOAD SESSION*****

I then need these indexes offset by 2 rows (upwards). Let's say the TXT file contains this:

maybe some more text

some text

SOme really important text LOAD SUMMARY. But I don't want to include this row. 

023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8035] Load complete time: Mon Mar 06 13:57:55 2023

LOAD SUMMARY
============


WRT_8036 Target: MAPPING NAME (Instance Name: [MAPPING NAME])

WRT_8038 Inserted rows - Requested: 45147                Applied: 45147                Rejected: 0                    Affected: 45147              

2023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8043] *****END LOAD SESSION*****

SOme more test 

LOAD SUMMARY > I don't want this row either

I want the $extractedLines variable to return this:

023-03-06 13:57:55.719 <TASK_12836-WRITER_2_*_1> INFO: [WRT_8035] Load complete time: Mon Mar 06 13:57:55 2023

LOAD SUMMARY
============


WRT_8036 Target: MAPPING NAME (Instance Name: [MAPPING NAME])

WRT_8038 Inserted rows - Requested: 45147                Applied: 45147                Rejected: 0                    Affected: 45147              

This is my code so far

# Specify the path to the file you want to extract lines from
$file = "C:\Users\YOURNAME\Desktop\Load Summary.txt"

# Get the contents of the file as an array of strings
$content = Get-Content $file

# Find the line number of the first occurrence of "LOAD SUMMARY"
$startIndex = ($content | Select-String -Pattern "LOAD SUMMARY").LineNumber

# Find the line number of the first occurrence of "*****END LOAD SESSION*****"
$endIndex = ($content | Select-String -Pattern "\*\*\*\*\*END LOAD SESSION\*\*\*\*\*").LineNumber - 1

# Iterate through the lines between the start and end indices to find the "============" line
for ($i = $startIndex; $i -le $endIndex; $i++) {
    if ($content[$i] -eq "============") {
        # If the "============" line is found, set the start index to the next line
        $startIndex = $i + 1
        break
    }
}
$startIndex = [int]$startIndex
$endIndex = [int]$endIndex

# Extract the lines between the start and end indices
$extractedLines = $content[$startIndex..$endIndex]

# Output the extracted lines
$extractedLines

Answer 1

Score: 1

This might help you get started. Since you're looking to include the 2 lines above the start index, the easiest way is to hold the content of the file in memory before processing it. See the inline comments for the logic.

# define how many lines before matching `$start`
$offset = 2
# these are the 2 lines that signal to start capturing
$start = @(
    'LOAD SUMMARY'
    '============'
)

# this one signals to break the loop
$finish = '*****END LOAD SESSION*****'

# read the file content
$content = Get-Content "C:\Users\YOURNAME\Desktop\Load Summary.txt"

$startCapture = $false
$extractedLines = for($i = 0; $i -lt $content.Length; $i++) {
    # if this line matches the index 0 of the `$start` array and
    # the next line matches the index 1
    if($content[$i] -like "*$($start[0])*" -and $content[$i + 1] -like "*$($start[1])*") {
        # then, set this boolean to signal start capturing the following lines
        $startCapture = $true
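        # output from this line minus the offset (2 lines above in this case)
        # up to this line and next line; the lower bound is clamped at 0 so a
        # match near the top of the file can't yield a negative (wrapping) index
        $content[([Math]::Max(0, $i - $offset))..($i + 1)]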
        # increment `$i` so we don't check the next line (already did here!)
        $i++
        # and continue with the next loop iteration,
        # we don't want below conditions to be evaluated in this case
        continue
    }

    # if this line matches the string signaling to stop capture
    if($content[$i] -like "*$finish*") {
        # break the loop
        break
    }

    # if the first condition was true then
    if($startCapture) {
        # output this line as-is
        $content[$i]
    }
}

# here is the result
$extractedLines
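
As a rough sketch, the loop above could be wrapped in a function and tried against an in-memory array instead of the file. The function name `Get-LoadSummaryBlock`, its parameters, and the shortened sample lines are assumptions made for illustration and are not part of the original answer. The start index is also clamped at 0 here so that a match near the top of the input cannot wrap around to the end of the array.

```powershell
# Hypothetical wrapper around the loop from the answer (name and parameters are assumptions)
function Get-LoadSummaryBlock {
    param(
        [string[]] $Content,   # lines to scan, e.g. the output of Get-Content
        [int] $Offset = 2      # how many lines above "LOAD SUMMARY" to include
    )

    $capturing = $false
    for ($i = 0; $i -lt $Content.Length; $i++) {
        # start marker: "LOAD SUMMARY" immediately followed by the "====" line
        if ($Content[$i] -like '*LOAD SUMMARY*' -and $Content[$i + 1] -like '*============*') {
            $capturing = $true
            # emit the offset lines plus both marker lines, never indexing below 0
            $Content[([Math]::Max(0, $i - $Offset))..($i + 1)]
            $i++        # skip the "====" line, it was already emitted
            continue
        }

        # end marker: stop scanning without emitting this line
        if ($Content[$i] -like '*END LOAD SESSION*') { break }

        # between the markers: emit the line unchanged
        if ($capturing) { $Content[$i] }
    }
}

# Shortened stand-in for the log file from the question
$sample = @(
    'some text'
    'Load complete time: Mon Mar 06 13:57:55 2023'
    ''
    'LOAD SUMMARY'
    '============'
    ''
    'WRT_8036 Target: MAPPING NAME'
    'WRT_8038 Inserted rows - Requested: 45147'
    'INFO: [WRT_8043] *****END LOAD SESSION*****'
    'LOAD SUMMARY > not wanted'
)

$result = Get-LoadSummaryBlock -Content $sample
$result
```

With a real file, the same function would presumably be called as `Get-LoadSummaryBlock -Content (Get-Content $file)`. Note that the sketch relies on PowerShell's default behavior of returning `$null` for an out-of-range index, so it is not meant to run under `Set-StrictMode`.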

Answer 2

Score: 1

If all you really need are the selected lines, you can capture them with a single regex:

$file = 'C:\Users\YOURNAME\Desktop\Load Summary.txt'

$pattern = '(?s)([^\n\r]*\r?\n){2}LOAD SUMMARY\r?\n={12}(\r?\n[^\n\r]*)*?\r?\n(?=[^\n\r]*\*{5}END LOAD SESSION\*{5})'

$extractedLines = [Regex]::Match( (Get-Content $file -Raw) , $pattern ).Value

$extractedLines

  • (?s) : Single-line mode (DOTALL)
  • ([^\n\r]*\r?\n){2} : (<Zero or more non-linebreak characters><NewLine>) × 2
  • LOAD SUMMARY\r?\n={12} : Literal two-line match
  • (\r?\n[^\n\r]*)*? : Zero or more (lazy) (<NewLine><Zero or more non-linebreak characters>)
  • \r?\n : Final newline
  • (?=[^\n\r]*\*{5}END LOAD SESSION\*{5}) : Positive lookahead (not captured)
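
The breakdown above can be checked against a small in-memory string. This is only a sketch: the `$sample` text is a shortened, made-up stand-in for the log file, and the variable names `$match` and `$lines` are assumptions. It also shows one way to turn the single multi-line string returned by `.Value` into an array of lines.

```powershell
# Shortened stand-in for the file content (assumption for illustration),
# joined with "`n" the way Get-Content -Raw would return one string
$sample = @(
    'some text'
    'Load complete time: Mon Mar 06 13:57:55 2023'
    ''
    'LOAD SUMMARY'
    '============'
    ''
    'WRT_8036 Target: MAPPING NAME'
    'WRT_8038 Inserted rows - Requested: 45147'
    'INFO: [WRT_8043] *****END LOAD SESSION*****'
    'LOAD SUMMARY > not wanted'
) -join "`n"

# Same pattern as in the answer
$pattern = '(?s)([^\n\r]*\r?\n){2}LOAD SUMMARY\r?\n={12}(\r?\n[^\n\r]*)*?\r?\n(?=[^\n\r]*\*{5}END LOAD SESSION\*{5})'

$match = [Regex]::Match($sample, $pattern)

if ($match.Success) {
    # .Value is a single string that ends with a newline;
    # trim trailing whitespace and split on line breaks to get an array
    $lines = $match.Value.TrimEnd() -split '\r?\n'
    $lines
}
else {
    Write-Warning 'Pattern not found'
}
```

Checking `$match.Success` first avoids silently working with an empty string when the markers are missing from the input.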

huangapple
  • Posted on March 7, 2023, 04:59:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75655801.html