How to substitute accented characters when renaming text files using the first line in PowerShell
Question
I'm trying to batch rename plain text files using the first line of each file. I want to keep only alphanumeric characters, and with your help I'm almost there. The only issue is that accented characters like é or á need to be either converted to their unaccented counterparts, e and a (the text is in Spanish), or kept in the name as they are, rather than removed. This is what I'm using right now:
Get-ChildItem *.txt | Rename-Item -NewName {
$firstLine = ($_ | Get-Content -TotalCount 1) -replace '[^a-z0-9 ]'
'{0}.txt' -f $firstLine
}
Thank you. If possible, please let me know if there is a way to keep the symbol "?" too.
Answer 1
Score: 2
The approach is similar to the one used in this answer: you can use the String.Normalize method before your regex replacement.
As for not removing ?, you can simply add it to the character class: [^a-z0-9 ?]. However, it is an invalid character for file names on Windows, so it is not used in the renaming snippet in this answer. You can use [IO.Path]::GetInvalidFileNameChars() to get the list of invalid characters for your OS.
Get-ChildItem *.txt | Rename-Item -NewName {
$firstLine = ($_ | Get-Content -TotalCount 1 -Encoding utf8).
Normalize([Text.NormalizationForm]::FormD) -replace '[^a-z0-9 ]'
'{0}.txt' -f $firstLine
}
Example:
$string = 'áÁéÉñÑ?'
$string.Normalize([Text.NormalizationForm]::FormD) -replace '[^a-z0-9 ?]'
# Outputs:
# aAeEnN?
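The example above works because FormD splits each accented letter into a base letter plus a combining mark, and the marks then fall outside a-z and get removed along with everything else. As a minimal sketch, assuming you want to strip only the accents and leave all other characters alone, the marks can be targeted directly with \p{Mn} (non-spacing marks). The function name Remove-Diacritics is a hypothetical helper, not part of the original answer:

```powershell
# Hypothetical helper (an assumption for illustration): strips diacritics only.
function Remove-Diacritics {
    param([string] $Text)

    # FormD decomposes 'é' into 'e' followed by a combining acute accent (U+0301).
    $decomposed = $Text.Normalize([Text.NormalizationForm]::FormD)

    # \p{Mn} matches non-spacing marks, i.e. the combining accents; remove them.
    $stripped = $decomposed -replace '\p{Mn}'

    # Recompose whatever is left into the usual composed form (FormC).
    $stripped.Normalize([Text.NormalizationForm]::FormC)
}

Remove-Diacritics '¿Qué pasó, señor?'   # -> ¿Que paso, senor?
```

Punctuation such as ¿ and ? survives here, so a second -replace is still needed to drop characters that are not wanted in the file name.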
Worth noting: the default Get-Content encoding is problematic in Windows PowerShell:
> Default: Uses the encoding that corresponds to the system's active code page (usually ANSI).
Hence the need for -Encoding utf8. Newer PowerShell versions don't have this problem, as they default to utf8NoBOM.
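The list returned by [IO.Path]::GetInvalidFileNameChars(), mentioned earlier in this answer, can also be turned into a regex so that invalid characters are replaced rather than silently dropped. This is a rough sketch, not part of the original answer; the function name ConvertTo-SafeFileName and the underscore default are assumptions made for illustration:

```powershell
# Hypothetical helper (an assumption for illustration).
function ConvertTo-SafeFileName {
    param(
        [string] $Name,
        [string] $Replacement = '_'
    )

    # Join the OS-specific invalid characters into one string ...
    $invalid = [IO.Path]::GetInvalidFileNameChars() -join ''

    # ... and wrap them, regex-escaped, in a character class.
    $pattern = '[{0}]' -f [regex]::Escape($invalid)

    $Name -replace $pattern, $Replacement
}

# The result depends on the OS: '?' is invalid on Windows but allowed on Linux and macOS,
# while '/' is invalid on all of them.
ConvertTo-SafeFileName 'Que tal? a/b'
```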
Answer 2
Score: 1
All you need to do is add á and é to the exclusion list of your replacement, and they will be preserved:
Get-ChildItem *.txt | Rename-Item -NewName {
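# '.+' rather than '.*': '.*' also matches the empty string at the end of the input, which would append '.txt' twice.
($_ | Get-Content -TotalCount 1 -Encoding UTF8) -replace '[^a-z0-9éá ]', '' -replace '.+', '$0.txt'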
}
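Since the text is Spanish, the same exclusion-list idea probably needs more than á and é. A small sketch, assuming the accented letters of interest are á, é, í, ó, ú, ü and ñ (the variable names are made up for illustration):

```powershell
# Assumption: these are the accented letters worth keeping for Spanish text.
$keep = 'áéíóúüñ'

# Build the negated character class; -replace is case-insensitive,
# so uppercase forms such as 'Á' or 'Ñ' are kept as well.
$pattern = '[^a-z0-9{0} ]' -f $keep

'¡Adiós, señor Núñez!' -replace $pattern   # -> Adiós señor Núñez
```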
As for ?: it is not a valid character for a file name on Windows, so I don't see the point there. But you can always chain multiple replacements and replace it with something allowed. Note that ? must then survive the first replacement, so it has to be part of the exclusion list. Like so:
"asd we'wea?gke é or á? to b" -replace '[^a-z0-9éá ?]', '' -replace '\?', '!!!!'
Answer 3
Score: 1
Santiago Squarzon's helpful answer shows you how to transform accented letters, such as é, to their unaccented form, such as e, causing them to be covered by the a-z regex range expression.
As for preserving accented characters as-is (which you state is acceptable too):
In lieu of a-z you can use \p{Ll}, which matches any Unicode lowercase letter and therefore also accented ones (see the list of all Unicode categories).
By virtue of -replace being case-insensitive, uppercase letters are implicitly considered as well:
Get-ChildItem *.txt | Rename-Item -NewName {
$firstLine =
($_ | Get-Content -TotalCount 1 -Encoding utf8) -replace '[^\p{Ll}0-9 ]'
'{0}.txt' -f $firstLine
}
Note: I'm using -Encoding utf8, as in the other answers, to read your file; this is only necessary in Windows PowerShell, and only if your file is UTF-8-encoded without a BOM.
A simplified example:
# -> 'aÄ éE 42'; that is, all letters and digits were preserved.
'a-Ä é/E 42' -replace '[^\p{Ll}0-9 ]'
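The example above relies on -replace being case-insensitive. As a short sketch of what changes with the case-sensitive -creplace operator, and of the \p{L} category (letters of any case) as an alternative that does not depend on case-insensitivity, consider the following; the $sample variable is just an illustrative name:

```powershell
$sample = 'a-Ä é/E 42'

# Case-sensitive: \p{Ll} now matches lowercase letters only,
# so the uppercase 'Ä' and 'E' are removed too.
$sample -creplace '[^\p{Ll}0-9 ]'   # -> 'a é 42'

# \p{L} matches letters of any case, so uppercase letters survive
# even with the case-sensitive operator.
$sample -creplace '[^\p{L}0-9 ]'    # -> 'aÄ éE 42'
```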