Load JSON file into SQL Server Table with PowerShell

Question

I am trying to load a JSON file (output from Mongo's Export-MdbcData) into a SQL Server table using PowerShell. The example JSON file data is as follows:

{ "code" : "0088", "name" : "BUTTON", "detail" : { "quantity" : 1 } }
{ "code" : "0081", "name" : "MATTERHORN", "detail" : { "quantity" : 2 } }
{ "code" : "0159", "name" : "BANKSTON", "detail" : { "quantity" : 1 } }

In the PowerShell script below, the file is read into an array and the array is converted into a DataTable to load into a SQL Server table. Is there a better/faster way to read in the JSON file? With a small input file it only takes seconds to load the data, but with more than 4M records the whole process takes hours.

$encoding = [System.Text.Encoding]::UTF8    
$output = [System.Collections.ArrayList]::new()

foreach ($line in [System.IO.File]::ReadLines($pathToJsonFile, $encoding)) 
{
    $json = $line | ConvertFrom-Json 
    foreach ($detail in $json.detail) 
    {
       [void]$output.Add(
                          [pscustomobject]@{
                                      code = $json.code
                                      name = $json.name
                                      quantity = $detail.quantity
                                    }
                        )
    } 
} 
# ConvertTo-DataTable is a community function, not a built-in cmdlet; it builds and returns the table,
# so a separate [System.Data.DataTable]::new() assignment beforehand is redundant
$dataTable = $output | ConvertTo-DataTable
.
.

UPDATE:
I modified the script using @Charlieface's suggestion and removed the inner foreach statement to see if that would speed it up further. It loaded 4M+ records in about 17 minutes. I used a batch size of 80K, and each insert iteration took about 14 seconds. However, compared to a CSV file input with the same batch size and record count, each insert iteration takes about 3 seconds. I'm guessing parsing the JSON takes longer than parsing a delimited file.

foreach ($line in [System.IO.File]::ReadLines($pathToJsonFile, $encoding)) 
{
    $json = $line | ConvertFrom-Json
    [void]$dataTable.Rows.Add($json.code, $json.name, $json.detail.quantity)
    $i++
    if (($i % $batchsize) -eq 0) { 
        $bulkcopy.WriteToServer($dataTable) 
        Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
        $dataTable.Clear() 
    }
}
# flush the final partial batch, otherwise the last ($i % $batchsize) rows are lost
if ($dataTable.Rows.Count -gt 0) { $bulkcopy.WriteToServer($dataTable) }
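For reference, the $bulkcopy, $batchsize, $i and $elapsed variables used in the loop above would be initialized before it, roughly like this (a minimal sketch; the connection string and destination table name are placeholders, not from the original post):

```powershell
# hypothetical setup -- adjust connection string and destination table to your environment
$connectionString = "Server=localhost;Database=MyDb;Integrated Security=True"
$batchsize = 80000
$i = 0
$elapsed = [System.Diagnostics.Stopwatch]::StartNew()

# TableLock generally gives the best bulk-insert throughput
$bulkcopy = [System.Data.SqlClient.SqlBulkCopy]::new(
    $connectionString,
    [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = "dbo.MyTable"
$bulkcopy.BatchSize = $batchsize
$bulkcopy.BulkCopyTimeout = 0   # 0 = no timeout, for large loads
```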

Answer 1

Score: 2

It might be faster to create the DataTable up front and add the data directly to it, rather than going through an ArrayList of pscustomobjects:

$dataTable = [System.Data.DataTable]::new();
[void]$dataTable.Columns.Add('code', [string]);
[void]$dataTable.Columns.Add('name', [string]);
[void]$dataTable.Columns.Add('quantity', [int]);

$encoding = [System.Text.Encoding]::UTF8;

foreach ($line in [System.IO.File]::ReadLines($pathToJsonFile, $encoding)) 
{
    $json = $line | ConvertFrom-Json;
    foreach ($detail in $json.detail) 
    {
        [void]$dataTable.Rows.Add($json.code, $json.name, $detail.quantity);
    }
}

You may also want to pre-allocate the DataTable to a large enough capacity up front, to prevent resizing of the underlying array:

$dataTable.MinimumCapacity = 4100000;
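A further tweak worth trying (my addition, not part of the original answer): DataTable.BeginLoadData/EndLoadData suspend notifications, index maintenance, and constraint checking while rows are added, which can shave time off large loads:

```powershell
$dataTable.BeginLoadData()   # suspend notifications and constraint checks during the load
foreach ($line in [System.IO.File]::ReadLines($pathToJsonFile, $encoding)) 
{
    $json = $line | ConvertFrom-Json
    foreach ($detail in $json.detail) 
    {
        [void]$dataTable.Rows.Add($json.code, $json.name, $detail.quantity)
    }
}
$dataTable.EndLoadData()     # re-enable them once the table is populated
```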

Answer 2

Score: 0

Assuming you have enough memory, you can greatly speed up your processing as follows:

$dataTable =
  '[' + (Get-Content -Raw -Encoding utf8 $pathToJsonFile) + ']' |
    ConvertFrom-Json |
    ForEach-Object {
      foreach ($detail in $_.detail) { 
        [pscustomobject]@{
          code = $_.code
          name = $_.name
          quantity = $detail.quantity
        }
      }
    } |
    ConvertTo-DataTable

huangapple
  • Posted on 2023-06-13 05:44:56
  • When reposting, please keep the original link: https://go.coder-hub.com/76460514.html