How can I read a CSV file that has null values in some fields and values that are not all in the same format?
Question
I have a CSV file (ABC.CSV) with data in the following format:

```
COLUMN1 COLUMN2 COLUMN3 COLUMN4 COLUMN5
12345 ABC RR,MM K NAO,KUM DEV
34567 CDEF NN INT
89567 KGH PP, BHIM PRKC PROD
9876 PIM DEV
6543 KCDEF NICE,MAN K INT
5432 GHK SIN,NICE C PROD
```
Below is the PowerShell code I am using to read this CSV file:

```powershell
$filePath = "C:\path\to\your\abc.csv"
$searchString = "9876"

# Read the content of the file
$content = Get-Content -Path $filePath

# Process each line of the file
foreach ($line in $content) {
    # Split the line into individual values
    $values = $line -split ','

    # Extract the values
    $column1 = $values[0].Trim()
    $column2 = $values[1].Trim()
    $column3 = ($values[2].Trim('"'), $values[3].Trim('"')) -join ','
    $column4 = $values[4].Trim()
    $column5 = $values[5].Trim()

    if ($column1 -like $searchString -and $column5 -like "PROD") {
        Write-Host $column1
        Write-Host $column2
        Write-Host $column3
        Write-Host $column4
        Write-Host $column5
    }
}
```
This code works only when every field is in the expected shape. If any field is null, or if COLUMN4 and COLUMN5 hold a single value such as CC instead of a pair such as CC, KK, it throws an error.

It works fine for this row:

```
12345 ABC RR,MM K NAO,KUM DEV
```

but it does not show correct results for rows such as these:

```
34567 CDEF NN INT
89567 KGH PP, BHIM PRKC PROD
9876 PIM DEV
6543 KCDEF NICE,MAN K INT
5432 GHK SIN,NICE C PROD
```
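The error can be reproduced with a single row. A minimal sketch, assuming the row `9876 PIM DEV` from the data above, which contains no commas at all: splitting on `,` yields one element, indexing past the end of a PowerShell array returns `$null` (with strict mode off), and calling `.Trim()` on `$null` throws.

```powershell
# A row from the data above: it contains no commas.
$line = '9876 PIM DEV'
$values = $line -split ','
"Element count: $($values.Count)"   # Element count: 1

try {
    # $values[1] is $null (index past the end), so .Trim() throws.
    $column2 = $values[1].Trim()
}
catch {
    "Error: $($_.Exception.Message)"
}

# A guard avoids the exception, but it still cannot tell which column is
# missing, because the line simply has no commas to split on.
$column2 = if ($values.Count -gt 1) { $values[1].Trim() } else { '' }
```

The guard only hides the symptom: the values are not comma-delimited in the first place, so the split positions do not correspond to columns.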
Answer 1
Score: 1
As has been noted, your data isn't in CSV format; it seems to be in the form of fixed-width columns whose boundaries are implied by the character positions where the column names start.
The following transforms your file into CSV format and parses the result into objects using `ConvertFrom-Csv`. Note that the solution works generically based on the assumptions above; it relies neither on a specific number of columns nor on their specific lengths:
```powershell
# Read the file into the header line and all data lines.
$headerLine, $dataLines = Get-Content $filePath

# Get the indices of the characters that end the fields.
# + -1 adds an extra array element that is a placeholder for the end of the line.
$fieldEndIndices = [regex]::Matches($headerLine, ' \S').Index + -1

# Iterate over all data lines.
$objects =
  $dataLines |
  ForEach-Object {
    # Split the line at hand into fields, trim each field and enclose it in "..."
    $pos = 0
    $fields =
      foreach ($fieldEndIndex in $fieldEndIndices) {
        if ($fieldEndIndex -eq -1) { $fieldEndIndex = $_.Length - 1 }
        '"' + $_.Substring($pos, $fieldEndIndex - $pos + 1).Trim() + '"'
        $pos += $fieldEndIndex - $pos + 1
      }
    # Output the fields as a CSV line
    $fields -join ','
  } |
  ConvertFrom-Csv -Header (-split $headerLine) # Parse the CSV data into objects.
```
After running the above, `$objects` contains an array of `[pscustomobject]` instances whose properties are named for the columns in the input data and whose values are the field values.
To visualize the results, you can run `$objects | Format-Table`, which yields the following, showing that the data was parsed as intended:
```
COLUMN1 COLUMN2 COLUMN3 COLUMN4 COLUMN5
------- ------- ------- ------- -------
12345 ABC RR,MM K NAO,KUM DEV
34567 CDEF NN INT
89567 KGH PP, BHIM PRKC PROD
9876 PIM DEV
6543 KCDEF NICE,MAN K INT
5432 GHK SIN,NICE C PROD
```
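Once the data is parsed into objects, the search from the question becomes a simple `Where-Object` filter on column names. The sketch below wraps the header-position technique from this answer in a function so it can run on in-memory sample lines. The function name `ConvertFrom-FixedWidthLines` and the sample layout (column widths 8, 8, 13, 11, then the rest of the line, with the blank columns chosen for illustration) are hypothetical, since the question does not show the file's exact spacing.

```powershell
# Hypothetical helper: the same technique as the answer above, wrapped in a function.
function ConvertFrom-FixedWidthLines {
    param([string] $HeaderLine, [string[]] $DataLines)

    # Index of the space right before each column name, i.e. the last character
    # of the preceding field; -1 is a placeholder meaning "end of the line".
    # @(...) keeps this an array even if the header yields a single match.
    $fieldEndIndices = @([regex]::Matches($HeaderLine, ' \S').Index) + -1

    $DataLines | ForEach-Object {
        $line = $_
        $pos = 0
        $fields = foreach ($end in $fieldEndIndices) {
            if ($end -eq -1) { $end = $line.Length - 1 }
            # Cut the field, trim its padding, and quote it so embedded commas survive.
            '"' + $line.Substring($pos, $end - $pos + 1).Trim() + '"'
            $pos = $end + 1
        }
        $fields -join ','
    } | ConvertFrom-Csv -Header (-split $HeaderLine)
}

# Hypothetical sample in the assumed layout; -f pads each value to its width.
$fmt = '{0,-8}{1,-8}{2,-13}{3,-11}{4}'
$header = $fmt -f 'COLUMN1', 'COLUMN2', 'COLUMN3', 'COLUMN4', 'COLUMN5'
$lines = @(
    $fmt -f '89567', 'KGH', 'PP, BHIM', 'PRKC', 'PROD'
    $fmt -f '9876', 'PIM', '', '', 'DEV'
    $fmt -f '5432', 'GHK', 'SIN,NICE', 'C', 'PROD'
)

$objects = ConvertFrom-FixedWidthLines -HeaderLine $header -DataLines $lines

# The question's searches, now by column name instead of by split position.
$objects | Where-Object { $_.COLUMN1 -eq '9876' } | Format-Table
$objects | Where-Object { $_.COLUMN5 -eq 'PROD' } | Format-Table
```

Because the column boundaries come from the header rather than from delimiters, a blank field becomes an empty string instead of shifting the remaining values into the wrong columns.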
Answer 2
Score: 0
You have fixed-width data, not CSV. Try the regex below, which uses hard-coded column widths:
```powershell
$filename = 'c:\temp\test.csv'
$pattern = '(?<column1>.{8})(?<column2>.{8})(?<column3>.{13})(?<column4>.{11})(?<column5>.*)'
$data = Get-Content -Path $filename | Select-Object -Skip 1 | Select-String -Pattern $pattern
$table = $data | ForEach-Object {
    [PSCustomObject]@{
        column1 = $_.Matches.Groups[1].Value.Trim()
        column2 = $_.Matches.Groups[2].Value.Trim()
        column3 = $_.Matches.Groups[3].Value.Trim()
        column4 = $_.Matches.Groups[4].Value.Trim()
        column5 = $_.Matches.Groups[5].Value.Trim()
    }
}
$table | Format-Table
```
Results:

```
column1 column2 column3 column4 column5
------- ------- ------- ------- -------
12345 ABC RR,MM K NAO,KUM DEV
34567 CDEF NN INT
89567 KGH PP, BHIM PRKC PROD
9876 PIM DEV
6543 KCDEF NICE,MAN K INT
5432 GHK SIN,NICE C PROD
```
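The hard-coded pattern needs every line to be at least 40 characters long (8 + 8 + 13 + 11), so `Select-String` silently skips any shorter line, for example one whose trailing columns are blank and whose trailing spaces were trimmed. A sketch of a more tolerant variant under the same assumed widths: each width becomes an upper bound `{0,n}`, the pattern is anchored with `^...$`, and groups are read by name rather than by number. The sample lines are hypothetical.

```powershell
# The answer's pattern: exact widths, so a line needs at least 40 characters.
$strictPattern = '(?<column1>.{8})(?<column2>.{8})(?<column3>.{13})(?<column4>.{11})(?<column5>.*)'

# Tolerant variant (same assumed widths): greedy {0,n} still takes a full column
# when the line is long enough, but also matches when trailing columns are missing.
$tolerantPattern = '^(?<column1>.{0,8})(?<column2>.{0,8})(?<column3>.{0,13})(?<column4>.{0,11})(?<column5>.*)$'

# Hypothetical sample lines; the second has blank COLUMN4/COLUMN5 and no
# trailing spaces, so it is only 18 characters long.
$fmt = '{0,-8}{1,-8}{2,-13}{3,-11}{4}'
$lines = @(
    $fmt -f '5432', 'GHK', 'SIN,NICE', 'C', 'PROD'
    ($fmt -f '34567', 'CDEF', 'NN', '', '').TrimEnd()
)

# The strict pattern silently drops the short line.
@($lines | Where-Object { $_ -match $strictPattern }).Count   # 1

$table = foreach ($line in $lines) {
    $m = [regex]::Match($line, $tolerantPattern)
    if (-not $m.Success) { continue }
    # Group values are empty strings (never $null) for blank columns, so Trim() is safe.
    [pscustomobject]@{
        column1 = $m.Groups['column1'].Value.Trim()
        column2 = $m.Groups['column2'].Value.Trim()
        column3 = $m.Groups['column3'].Value.Trim()
        column4 = $m.Groups['column4'].Value.Trim()
        column5 = $m.Groups['column5'].Value.Trim()
    }
}
$table | Format-Table
```

Whether `{0,n}` is appropriate depends on the real file: it assumes that a short line means the remaining columns are blank, not that the line is malformed.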