How to substitute accented characters when renaming text files using the first line in PowerShell

Question

I'm trying to batch rename plain text files using the first line of each file. I want to keep only alphanumeric characters, and with your help I'm almost there. The only issue is that accented characters like é or á should either be replaced by their unaccented counterparts, e and a (the text is in Spanish), or be kept in the name as they are, rather than removed. This is what I'm using right now:

Get-ChildItem *.txt | Rename-Item -NewName {
    $firstLine = ($_ | Get-Content -TotalCount 1) -replace '[^a-z0-9 ]'
    '{0}.txt' -f $firstLine
}

Thank you. If possible, please let me know if there is a way to keep the symbol "?" too.

Answer 1

Score: 2

The approach is similar to the one used in this answer: call the String.Normalize method before your regex replacement. FormD splits each accented letter into its base letter plus a combining mark, and the regex then removes the mark while keeping the base letter.

As for not removing ?, you can simply add it to the character class: [^a-z0-9 ?]. However, it is an invalid character for file names in Windows, so it is not used in the renaming snippet in this answer. You can use [IO.Path]::GetInvalidFileNameChars() to get the list of invalid characters for your OS.

Get-ChildItem *.txt | Rename-Item -NewName {
    $firstLine = ($_ | Get-Content -TotalCount 1 -Encoding utf8).
        Normalize([Text.NormalizationForm]::FormD) -replace '[^a-z0-9 ]'

    '{0}.txt' -f $firstLine
}

Example:

$string = 'áÁéÉñÑ?'
$string.Normalize([Text.NormalizationForm]::FormD) -replace '[^a-z0-9 ?]'

# Outputs:
# aAeEnN?
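
To see why normalization helps, here is a minimal sketch of what FormD does to the string. The variable names are illustrative only, and the sample is built from code points so the result does not depend on how the script file is encoded. It strips just the combining marks with \p{Mn} instead of using the a-z allow-list, which is an alternative I'm assuming may be useful, not part of the answer above.

```powershell
# 'á', 'É', 'ñ' built from code points (U+00E1, U+00C9, U+00F1).
$accented = -join ([char[]](0xE1, 0xC9, 0xF1))

# FormD decomposes each letter into a base letter plus a combining mark.
$decomposed = $accented.Normalize([Text.NormalizationForm]::FormD)

# \p{Mn} matches nonspacing (combining) marks; removing them leaves the base letters.
$stripped = $decomposed -replace '\p{Mn}'

"$($accented.Length) chars before, $($decomposed.Length) after decomposition"
$stripped   # -> aEn
```

Because only the marks are removed here, any other character in the string is left untouched, so this variant still needs a second step if punctuation should go as well.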

It's worth noting that the default Get-Content encoding is problematic in Windows PowerShell:

> Default: Uses the encoding that corresponds to the system's active code page (usually ANSI).

Hence the need for -Encoding utf8. PowerShell 6 and later don't have this problem, because they default to utf8NoBOM.
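
If you would rather remove only what the OS actually forbids, the list from [IO.Path]::GetInvalidFileNameChars() can be turned into a character class. This is a rough sketch under that assumption; $invalid, $pattern, and the sample title are made up for illustration.

```powershell
# Characters the current OS does not allow in file names.
$invalid = [IO.Path]::GetInvalidFileNameChars()

# Escape them and wrap them in a character class: [ ... ]
$pattern = '[{0}]' -f [regex]::Escape(($invalid -join ''))

# '/' is invalid on both Windows and Unix-like systems, so it is removed everywhere.
$clean = 'Que tal/bien' -replace $pattern
$clean   # -> Que talbien
```

On Windows the same pattern also removes ?, :, *, and the other reserved characters; on Linux and macOS the list is much shorter, so the result depends on where the script runs.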

Answer 2

Score: 1

All you need to do is add á and é to the character class in your replacement, and they will be preserved:

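Get-ChildItem *.txt | Rename-Item -NewName {
    # Append the extension with + rather than -replace '.*', '$0.txt':
    # '.*' also matches the empty string at the end, which would yield "name.txt.txt".
    (($_ | Get-Content -TotalCount 1 -Encoding UTF8) -replace '[^a-z0-9éá ]', '') + '.txt'
}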

As for ?, it is not a valid character for a file name in Windows, so I don't see the point there. But you can always chain replacements and swap it for something allowed. Note that ? has to be kept by the first pattern, otherwise there is nothing left for the second replacement to match. Like so:

"asd we'wea?gke é or á? to b" -replace '[^a-z0-9éá ?]', '' -replace '\?', '!!!!'
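
If the question mark matters for readability, one option is to substitute a look-alike that Windows accepts in file names, such as the fullwidth question mark (U+FF1F). That choice is my assumption, not something from the question, and the names below are illustrative. A small sketch:

```powershell
# Fullwidth question mark, built from its code point so file encoding doesn't matter.
$fullwidthQuestion = [string][char]0xFF1F

$title = 'Como estas? Bien'

# Keep ? in the first pass, then swap it for the look-alike.
$safe = $title -replace '[^a-z0-9 ?]' -replace '\?', $fullwidthQuestion
$safe
```

The resulting name looks almost the same in a file listing, but searching for a plain ? will no longer find it, so this is a trade-off rather than a free win.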

Answer 3

Score: 1

Santiago Squarzon's helpful answer shows you how to transform accented letters - such as é - to their unaccented form, such as e, causing them to be covered by the a-z regex range expression.

As for preserving accented characters as-is (which you state is acceptable too):

In lieu of a-z you can use \p{Ll}, which matches any Unicode lowercase letter and therefore also accented ones (see the list of all Unicode categories).
Because -replace is case-insensitive, uppercase letters are implicitly covered as well:

Get-ChildItem *.txt | Rename-Item -NewName {
  $firstLine = 
    ($_ | Get-Content -TotalCount 1 -Encoding utf8) -replace '[^\p{Ll}0-9 ]'
  '{0}.txt' -f $firstLine
}

<sup>Note: I'm using -Encoding utf8, as in the other answers, to read your file, which is only necessary in Windows PowerShell if your file happens to be UTF-8-encoded but without a BOM.</sup>

A simplified example:

# -> 'aÄ éE 42'; that is, all letters and digits were preserved.
'a-Ä é/E 42' -replace '[^\p{Ll}0-9 ]'
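
If you'd rather not depend on case-insensitive matching, \p{L} matches any Unicode letter regardless of case. The sketch below uses the case-sensitive -creplace to show that uppercase letters survive anyway. It builds the sample from code points (Ä is U+00C4, é is U+00E9) so it doesn't depend on the script file's encoding, and the variable names are illustrative.

```powershell
# Same sample as above: 'a-Ä é/E 42'
$sample = 'a-{0} {1}/E 42' -f [char]0xC4, [char]0xE9

# \p{L} = any letter, upper- or lowercase; -creplace is case-sensitive.
$kept = $sample -creplace '[^\p{L}0-9 ]'
$kept   # -> 'aÄ éE 42'
```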

huangapple
  • Published on March 31, 2023
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75892416.html