March 31, 2023 03:51:24

How to substitute accented characters when renaming text files using their first line in PowerShell

Question

I'm trying to batch rename plain text files using the first line of each file. I want to keep only alphanumeric characters, and with your help I'm almost there. The only issue is that I need accented characters like é or á to be either converted to their unaccented counterparts, e and a (the text is in Spanish), or preserved in the name as they are, rather than removed. This is what I'm using right now:

Get-ChildItem *.txt | Rename-Item -NewName {
    $firstLine = ($_ | Get-Content -TotalCount 1) -replace '[^a-z0-9 ]'
    '{0}.txt' -f $firstLine
}

Thank you. If possible, please let me know if there is a way to keep the symbol "?" too.

Answer 1

Score: 2

The approach is similar to the one used in this answer: you can use the String.Normalize method before your regex replacement.

As for not removing ?, you can simply add it to the character class: [^a-z0-9 ?]. However, ? is an invalid character in Windows file names, so it is not used in this answer's renaming snippet. You can use [IO.Path]::GetInvalidFileNameChars() to get the list of characters that are invalid in file names on your OS.

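Get-ChildItem *.txt | Rename-Item -NewName {
    # Decompose accented letters (é -> e + combining accent), then drop everything
    # that isn't a-z, 0-9 or a space; -replace is case-insensitive, so A-Z survive too.
    $firstLine = ($_ | Get-Content -TotalCount 1 -Encoding utf8).
        Normalize([Text.NormalizationForm]::FormD) -replace '[^a-z0-9 ]'

    '{0}.txt' -f $firstLine
}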
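The [IO.Path]::GetInvalidFileNameChars() tip mentioned above can also drive the cleanup directly, so you don't have to hard-code which characters are reserved. A minimal sketch; the helper name Get-SafeFileName is hypothetical, not a built-in cmdlet:

```powershell
# Hypothetical helper: removes every character that .NET reports as invalid
# in a file name on the current OS (on Windows that includes ? * : " < > | \ /).
function Get-SafeFileName {
  param([Parameter(Mandatory)] [string] $Name)

  $invalid = [IO.Path]::GetInvalidFileNameChars()

  # Splitting on the invalid characters and joining the pieces back together
  # drops them, without having to escape anything for a regex.
  $Name.Split($invalid) -join ''
}

Get-SafeFileName 'draft/v2'   # -> draftv2 ('/' is invalid on every OS)
```

Because the list comes from the running OS, the same helper is stricter on Windows than on Linux or macOS, where only / and the NUL character are reported as invalid.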

Example:

$string = 'áÁéÉñÑ?'
$string.Normalize([Text.NormalizationForm]::FormD) -replace '[^a-z0-9 ?]'

# Outputs:
# aAeEnN?
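To see why the accents disappear: normalization form D splits each precomposed letter into its base letter followed by a separate combining mark, and the [^a-z0-9 ] class then deletes the mark. The sketch below makes that visible. It builds é from its code point so it doesn't depend on how the script file is encoded, and the final \p{Mn} line (remove only nonspacing marks, keep everything else) is an alternative added here for illustration, not part of the answer:

```powershell
# Build 'é' (U+00E9) from its code point.
$eAcute = [string] [char] 0x00E9

# Normalization form D splits the precomposed letter into base + mark.
$decomposed = $eAcute.Normalize([Text.NormalizationForm]::FormD)
$decomposed.Length   # -> 2

# List the UTF-16 code units: 'e' followed by the combining acute accent.
$decomposed.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int] $_ }
# -> U+0065
# -> U+0301

# Variant (an assumption, not from the answer): delete only nonspacing marks.
$decomposed -replace '\p{Mn}'   # -> e
```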

It's worth noting that Get-Content's default encoding is problematic in Windows PowerShell:

> Default: Uses the encoding that corresponds to the system's active code page (usually ANSI).

Hence the need for -Encoding utf8. Newer PowerShell versions (PowerShell (Core) 6 and later) don't have this problem, as they default to utf8NoBOM.
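A self-contained way to try the approach end to end, including the BOM-less UTF-8 case the encoding note is about. This is a sketch under some assumptions: it runs in a throwaway temp directory, the file name old.txt and the title are made up, and the first line is built from code points so the demo doesn't depend on how the script itself is saved:

```powershell
# Throwaway directory so the demo never touches real files.
$dir = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir

# Made-up first line 'Canción número 1?', written as UTF-8 *without* a BOM.
$line = "Canci$([char]0xF3)n n$([char]0xFA)mero 1?"
$noBom = [Text.UTF8Encoding]::new($false)
[IO.File]::WriteAllText((Join-Path $dir 'old.txt'), $line, $noBom)

# Same logic as the answer. The parentheses collect the file list first,
# so a renamed file can't be enumerated a second time.
(Get-ChildItem -LiteralPath $dir -Filter *.txt) | Rename-Item -NewName {
  $raw = $_ | Get-Content -TotalCount 1 -Encoding utf8
  $firstLine = $raw.Normalize([Text.NormalizationForm]::FormD) -replace '[^a-z0-9 ]'
  '{0}.txt' -f $firstLine
}

$newName = (Get-ChildItem -LiteralPath $dir).Name
$newName   # -> Cancion numero 1.txt

Remove-Item -LiteralPath $dir -Recurse
```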

Answer 2

Score: 1

All you need to do is add á and é to the exclusion list of your replacement, and they will be preserved:

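Get-ChildItem *.txt | Rename-Item -NewName {
    # Keep only a-z, 0-9, é, á and spaces, then append '.txt'. The ^ anchor matters:
    # an unanchored '.*' also matches the empty string at the end, which would yield 'name.txt.txt'.
    ($_ | Get-Content -TotalCount 1 -Encoding UTF8) -replace '[^a-z0-9éá ]', '' -replace '^.*', '$0.txt'
}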

As for ?: it is not a valid character in a Windows file name, so I don't see the point of keeping it. But you can always chain multiple replacements: keep ? in the first one, then replace it with something allowed, like so:

"asd we'wea?gke é or á? to b" -replace '[^a-z0-9éá ?]', '' -replace '\?', '!!!!'
# -> asd wewea!!!!gke é or á!!!! to b
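One allowed stand-in that still looks like a question mark is U+FF1F (FULLWIDTH QUESTION MARK), which, unlike the ASCII ?, is to our knowledge not reserved in Windows file names. Using it is our own suggestion, not part of the answer above; a minimal sketch with a made-up sample title:

```powershell
# Keep '?' through the first pass, then swap it for U+FF1F.
$fullwidthQuestion = [string] [char] 0xFF1F

$name = 'Que hora es?' -replace '[^a-z0-9 ?]' -replace '\?', $fullwidthQuestion
$name   # 'Que hora es' followed by U+FF1F
```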

Answer 3

Score: 1

Santiago Squarzon's helpful answer shows you how to transform accented letters, such as é, to their unaccented form, such as e, so that they are covered by the a-z regex range expression.

As for preserving accented characters as-is (which you state is acceptable too):

In lieu of a-z, you can use \p{Ll}, which matches any Unicode lowercase letter and therefore also accented ones (see the list of all Unicode categories). Because -replace is case-insensitive, uppercase letters are implicitly matched as well:

Get-ChildItem *.txt | Rename-Item -NewName {
  # \p{Ll}: any Unicode lowercase letter, including accented ones.
  $firstLine =
    ($_ | Get-Content -TotalCount 1 -Encoding utf8) -replace '[^\p{Ll}0-9 ]'
  '{0}.txt' -f $firstLine
}

Note: I'm using -Encoding utf8, as in the other answers, to read your files; this is only necessary in Windows PowerShell, and only if your files happen to be UTF-8-encoded without a BOM.

A simplified example:

# -> 'aÄ éE 42'; that is, all letters and digits were preserved.
'a-Ä é/E 42' -replace '[^\p{Ll}0-9 ]'
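If you'd rather not rely on -replace's case-insensitivity, the broader \p{L} category (any letter, any case) is a possible alternative; that substitution is ours, not part of the answer. The case-sensitive -creplace operator makes the difference visible. The sample is the same string as above, built from code points so it is independent of the script file's encoding:

```powershell
# Same sample as above: 'a-Ä é/E 42'
$sample = "a-$([char]0xC4) $([char]0xE9)/E 42"

# \p{L} keeps letters of any case, even with case-sensitive matching.
$anyCase = $sample -creplace '[^\p{L}0-9 ]'     # -> 'aÄ éE 42'

# \p{Ll} alone, matched case-sensitively, drops the uppercase Ä and E.
$lowerOnly = $sample -creplace '[^\p{Ll}0-9 ]'  # -> 'a é 42'
```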
