Merge multiple CSV files with the same header using PowerShell

Question

I have multiple CSV files in a folder, with data like the following:

file1

"Index","Response","Status","Time"
"32190","2","Succeeded","2023-01-18 08:31:34.9"
"32189","3","Succeeded","2023-01-18 08:26:34.9"
"32188","3","Succeeded","2023-01-18 08:21:34.9"

file2

"Index","Response","Status","Time"
"32190","2","Succeeded","2023-01-19 08:31:34.9"
"32189","3","Succeeded","2023-01-19 08:26:34.9"
"32188","3","Succeeded","2023-01-19 08:21:34.9"

I need to merge these files into a single CSV file with only one header row:

"Index","Response","Status","Time"
"32190","2","Succeeded","2023-01-18 08:31:34.9"
"32189","3","Succeeded","2023-01-18 08:26:34.9"
"32188","3","Succeeded","2023-01-18 08:21:34.9"
"32190","2","Succeeded","2023-01-19 08:31:34.9"
"32189","3","Succeeded","2023-01-19 08:26:34.9"
"32188","3","Succeeded","2023-01-19 08:21:34.9"

I have the code below, but the merged file still contains the header row from every input file:

$folder = 'D:\reports\daily_csv' 
$files = Get-ChildItem $folder\*.csv 
Get-Content $files | Set-Content "D:\Monthly\Merged_$prev_month.csv"

Please let me know what I need to add here to avoid multiple header rows.

Answer 1

Score: 3

Here is one way to do it using StreamReader and an anonymous function. Note that .OpenText() initializes the StreamReader with UTF-8 encoding; if that's a problem, you can use StreamReader(String, Encoding) instead.

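$folder = 'D:\reports\daily_csv'
Get-ChildItem $folder\*.csv | & {
    # Tracks whether the header line has already been emitted
    begin { $isFirstObject = $true }
    process {
        try {
            # Open the current file as text and read its header line
            $reader  = $_.OpenText()
            $headers = $reader.ReadLine()

            # Output the header only for the first file
            if($isFirstObject) {
                $headers
                $isFirstObject = $false
            }

            # Output the remaining (data) lines of every file
            while(-not $reader.EndOfStream) {
                $reader.ReadLine()
            }
        }
        finally {
            # Always release the file handle
            if($reader) {
                $reader.Dispose()
            }
        }
    }
} | Set-Content path\to\mergedCsv.csv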
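
If the input files are not UTF-8, the StreamReader(String, Encoding) constructor mentioned above can take the place of .OpenText(). The following is only a sketch under assumptions: the function name Merge-CsvText and its parameters are hypothetical, UTF-16 LE is picked as the default input encoding purely for illustration, and [System.IO.StreamReader]::new() requires PowerShell 5 or later. The destination file should live outside the source folder so it is not picked up as an input.

```powershell
# Hypothetical helper (not part of the original answer): merges CSV files as plain
# text and keeps only the header line of the first file.
function Merge-CsvText {
    param(
        [Parameter(Mandatory)] [string] $Folder,
        [Parameter(Mandatory)] [string] $Destination,
        # Assumption: inputs are UTF-16 LE; pass a different encoding if yours differ.
        [System.Text.Encoding] $Encoding = [System.Text.Encoding]::Unicode
    )

    # Sort-Object makes the file order explicit (by name)
    Get-ChildItem -Path $Folder -Filter *.csv | Sort-Object Name | & {
        begin { $isFirstFile = $true }
        process {
            $reader = $null
            try {
                # StreamReader(String, Encoding) instead of $_.OpenText()
                $reader = [System.IO.StreamReader]::new($_.FullName, $Encoding)
                $header = $reader.ReadLine()

                # Emit the header only once
                if ($isFirstFile) {
                    $header
                    $isFirstFile = $false
                }

                # Emit all data lines
                while (-not $reader.EndOfStream) {
                    $reader.ReadLine()
                }
            }
            finally {
                if ($reader) { $reader.Dispose() }
            }
        }
    } | Set-Content -Path $Destination
}
```

A call could look like Merge-CsvText -Folder 'D:\reports\daily_csv' -Destination 'D:\Monthly\Merged.csv' (the output file name here is just an example). Note that Set-Content writes with its own default encoding, which differs between Windows PowerShell and PowerShell 7+; add -Encoding to Set-Content if the output encoding matters.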

Answer 2

Score: 3

Santiago Squarzon's helpful plain-text-processing answer is definitely your best option, both in terms of performance, and in that it also preserves the formatting specifics (whether all fields or even only some fields are double-quoted or not).

A slower, but more convenient alternative that doesn't preserve the formatting specifics (which should not matter, however) is to use Import-Csv's support for multiple input files, via its -LiteralPath parameter:

Import-Csv -LiteralPath (Get-ChildItem D:\reports\daily_csv -Filter *.csv).FullName |
  Export-Csv -NoTypeInformation -Encoding utf8 "D:\Monthly\Merged_$prev_month.csv"

Note that neither -NoTypeInformation nor -Encoding utf8 is required anymore for Export-Csv in PowerShell (Core) 7+, unless you need a different encoding. BOM-less UTF-8 is now the consistent default; if you do need a BOM, use -Encoding utf8bom.
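
As a rough illustration of the point above, here is a PowerShell 7+ only sketch. The function name Merge-CsvObjects, its parameters, and the -WithBom switch are hypothetical and exist only for this example; the utf8BOM encoding name is not accepted by Windows PowerShell 5.1.

```powershell
# PowerShell 7+ ONLY. Hypothetical helper, for illustration.
function Merge-CsvObjects {
    param(
        [Parameter(Mandatory)] [string] $Folder,
        [Parameter(Mandatory)] [string] $Destination,
        # Assumption: you only want a BOM when explicitly asked for
        [switch] $WithBom
    )

    # Import-Csv accepts several paths at once via -LiteralPath
    $paths = (Get-ChildItem -Path $Folder -Filter *.csv).FullName

    if ($WithBom) {
        # UTF-8 with BOM must be requested explicitly
        Import-Csv -LiteralPath $paths | Export-Csv -Encoding utf8BOM -LiteralPath $Destination
    }
    else {
        # Defaults in PS 7+: no type information line, BOM-less UTF-8
        Import-Csv -LiteralPath $paths | Export-Csv -LiteralPath $Destination
    }
}
```

Because the rows pass through Import-Csv and Export-Csv as objects, the header is written once, and the original quoting style of the input files is not preserved.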

Also note that a bug has been fixed in PowerShell (Core) 7+ that enables providing the Get-ChildItem results to Import-Csv via the pipeline:

# PS 7+ ONLY - a bug in WinPS prevents Get-ChildItem input to Import-Csv
Get-ChildItem D:\reports\daily_csv -Filter *.csv |
  Import-Csv |
  Export-Csv "D:\Monthly\Merged_$prev_month.csv"

huangapple
  • Published on February 8, 2023, 21:35:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75386584.html