PowerShell - get content between 2 strings in multiple lines

Question

I have a file, 1.wpl:

<?wpl version="1.0"?>
<smil>
    <head>
        <meta name="Generator" content="Microsoft Windows Media Player -- 12.0.22621.1"/>
        <meta name="ItemCount" content="2"/>
        <title>Untitled playlist</title>
    </head>
    <body>
        <seq>
            <media src="C:\Users\user\Downloads\Katzianerova vojna.mp3"/>
            <media src="C:\Users\user\Downloads\Rat i mir u povijesti III- dio.mp3"/>
        </seq>
    </body>
</smil>

I want to get the content between <seq> and </seq>, with each element on its own line.

Desired output:

<media src="C:\Users\user\Downloads\Katzianerova vojna.mp3"/>
<media src="C:\Users\user\Downloads\Rat i mir u povijesti III- dio.mp3"/>

I have this code, which gives me the output on a single line:

$fileName = "C:\Users\user\Music\Playlists.wpl"

# Get content from file
$file = Get-Content $fileName

# Regex pattern to capture the text between the two strings
$pattern = "<seq>(.*?)</seq>"

# Perform the operation
$results = [regex]::Match($file,$pattern).Groups[1].Value -split [System.Environment]::NewLine

return $results

Actual output:

<media src="C:\Users\user\Downloads\Katzianerova vojna.mp3"/>             <media src="C:\Users\user\Downloads\Rat i mir u povijesti III- dio.mp3"/>

Answer 1

Score: 2

There is literally no reason to use regex when what you have is valid XML:

($xml = [xml]::new()).Load('C:\Users\user\Music\Playlists.wpl')
$xml.SelectNodes('smil/body/seq/media') | ForEach-Object OuterXml

# Outputs:
# <media src="C:\Users\user\Downloads\Katzianerova vojna.mp3" />
# <media src="C:\Users\user\Downloads\Rat i mir u povijesti III- dio.mp3" />

Or, using a wildcard in the XPath:

($xml = [xml]::new()).Load('C:\Users\user\Music\Playlists.wpl')
$xml.SelectNodes("//seq/*") | ForEach-Object OuterXml
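
If only the file paths are needed rather than the full elements, the same XPath approach can read the src attribute directly. The following is a minimal sketch and not part of the original answer: it inlines a trimmed copy of the playlist from the question as a here-string so that it runs without the file, and it uses LoadXml instead of Load for that reason. The variable names $wpl and $paths are illustrative.

```powershell
# Trimmed copy of the playlist from the question, inlined so the sketch
# runs without the .wpl file on disk.
$wpl = @'
<?wpl version="1.0"?>
<smil>
    <body>
        <seq>
            <media src="C:\Users\user\Downloads\Katzianerova vojna.mp3"/>
            <media src="C:\Users\user\Downloads\Rat i mir u povijesti III- dio.mp3"/>
        </seq>
    </body>
</smil>
'@

$xml = [xml]::new()
$xml.LoadXml($wpl)   # LoadXml parses a string; Load reads from a path or stream

# Select every <media> element under any <seq> and read its src attribute.
$paths = $xml.SelectNodes('//seq/media') | ForEach-Object { $_.GetAttribute('src') }

$paths
```

For the real file, replacing the LoadXml call with the Load call from the answer should be the only change needed.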

Answer 2

Score: 1

You could try using a switch statement:

Get-Content -Path "C:\Users\user\Music\Playlists.wpl" |
    ForEach-Object {
        switch -Regex ($_) {
            # Line ends with the opening tag: start emitting lines
            '\<seq\>$' {
                $insideSeq = $true
            }
            # Line ends with the closing tag: stop emitting lines
            '\<\/seq\>$' {
                $insideSeq = $false
            }
            # Any other line: emit it without indentation while inside <seq>
            default {
                if ($insideSeq) {
                    $_ -replace '^\s+'
                }
            }
        }
    }
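
The same line-by-line state machine can be wrapped in a small reusable function. This is only a sketch under stated assumptions: Get-LinesBetween is a hypothetical helper name, not an existing cmdlet, and the sample lines are copied from the question rather than read from disk. It relies on two documented behaviours of switch: given an array, it processes each element in turn, and continue moves on to the next element.

```powershell
# Hypothetical helper (not a built-in cmdlet): emits the trimmed lines that
# appear between a line matching $StartPattern and one matching $EndPattern.
function Get-LinesBetween {
    param(
        [string[]] $Lines,
        [string]   $StartPattern,
        [string]   $EndPattern
    )

    $inside = $false
    # switch enumerates the array and tests each line against the patterns.
    switch -Regex ($Lines) {
        $StartPattern { $inside = $true;  continue }
        $EndPattern   { $inside = $false; continue }
        default       { if ($inside) { $_.Trim() } }
    }
}

# Sample input: the relevant lines of the playlist from the question.
$sample = @(
    '    <body>'
    '        <seq>'
    '            <media src="C:\Users\user\Downloads\Katzianerova vojna.mp3"/>'
    '            <media src="C:\Users\user\Downloads\Rat i mir u povijesti III- dio.mp3"/>'
    '        </seq>'
    '    </body>'
)

$result = Get-LinesBetween -Lines $sample -StartPattern '<seq>\s*$' -EndPattern '</seq>\s*$'
$result
```

With the real file, the output of Get-Content could be passed as -Lines. Like the answer above, this assumes the opening and closing tags each sit on their own line.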

Answer 3

Score: 0

Found a solution:

$fileContent = Get-Content -Path "C:\Users\user\Music\Playlists.wpl" -Raw
$regexPattern = "(?s)<seq>(.*?)</seq>"
# Named $match rather than $matches, which is an automatic variable in PowerShell
$match = [regex]::Match($fileContent, $regexPattern)

if ($match.Success) {
    $seqContent = $match.Groups[1].Value
    # Split on LF or CRLF and trim the indentation from each line
    $lines = $seqContent -split '\r?\n' | ForEach-Object { $_.Trim() }
    # Drop the blank lines and join the rest back together
    $output = ($lines | Where-Object { $_ -ne '' }) -join "`n"
    Write-Output $output
} else {
    Write-Output "No <seq> content found in the file."
}
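
Why the code in the question printed everything on one line can be shown in a few lines. Without -Raw, Get-Content returns an array with one string per line, and converting that array to a single string joins the elements with spaces, so there are no newlines left to split on. The sketch below is an illustration only: it uses an inline here-string in place of the file, and the variable names are made up for the example.

```powershell
# Inline stand-in for the <seq> part of the playlist, as Get-Content -Raw
# would return it (one string that still contains the line breaks).
$raw = @'
<seq>
    <media src="C:\Users\user\Downloads\Katzianerova vojna.mp3"/>
    <media src="C:\Users\user\Downloads\Rat i mir u povijesti III- dio.mp3"/>
</seq>
'@

# Without -Raw, Get-Content yields one string per line, like this array.
$asLines = $raw -split '\r?\n'

# Converting the array to one string joins the elements with $OFS
# (a space unless $OFS has been set), so the newlines disappear.
$flattened = [string] $asLines
$flattenedHasNewline = $flattened.Contains("`n")

# With the raw text, (?s) lets "." match across line breaks and the
# captured group keeps its newlines, so it can be split into lines.
$inner = [regex]::Match($raw, '(?s)<seq>(.*?)</seq>').Groups[1].Value
$mediaLines = $inner -split '\r?\n' | ForEach-Object { $_.Trim() } | Where-Object { $_ }

$mediaLines
```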

Posted by huangapple on 2023-06-16 01:59:56
When reposting, please keep the link to this article: https://go.coder-hub.com/76484360.html