处理来自Excel的非结构化数据在SSIS中

huangapple go评论60阅读模式
英文:

Handling Unstructured Data from Excel in SSIS

问题

需要处理来自Excel的非结构化数据。合并单元格数据需要拆分并设置到相应的列中。已附上源文件和目标文件链接。合并的数据需要取消合并,并将数据填充到这些未合并的列中。在SSIS中,当我导入这个Excel时,它会取消合并并将数据放在第一列,然后将其他单元格留为空。

源文件:tbl_e1 - 处理来自Excel的非结构化数据在SSIS中
目标文件:tbl_e2 - 处理来自Excel的非结构化数据在SSIS中
目标将加载到SSMS。

详细要求解释:

我的Excel源文件在Dept1、Dept2上有垂直合并的单元格,同时在Dept2列的第2行和第3行上有水平合并的单元格,如您所见,其中的值为"Banking"。类似地,值列的第2行和第3行中的值为"20000"。请注意,当我在SSIS中导入此源文件时,它会这样排列事物:

ID | 姓名 | 部门1 | 部门2 | 值

  • | - | - | - | -
    1 | Harish | HR | NULL | 10000
    2 | Kano | Finance | Banking | 20000
    3 | Tangiro | HR | NULL | NULL

我想要用在合并状态时这些单元格内的数据填充未合并的单元格(即源文件中的数据)。

我希望ID 1的水平合并单元格Dept1和Dept2的部分在Dept1和Dept2中都有"HR"的值。

对于ID 2和ID 3的Dept2列上的垂直合并单元格,我希望它们在Dept2中都有"Banking"。同样,对于值列上的ID 2和ID 3的垂直合并单元格,我希望它们在值中都有"20000",对于ID 2和ID 3都是如此。我尝试过使用窗口函数,但没有成功,还尝试了其他逻辑和脚本任务。

英文:

I need help with Handling Unstructured Data from Excel. The merged cell data needs to be splitted and set to the respective columns in this case. I have attached the source and target.
The Merged Data need to be unmerged and the data need to be populated to those unmerged columns.
In SSIS when i imported this excel it unmerges and places the data in the first column then leaves the other cell as NULL.

Source: tbl_e1 - 处理来自Excel的非结构化数据在SSIS中
Target: tbl_e2 处理来自Excel的非结构化数据在SSIS中
target to be loaded in SSMS.

Detailed Explanation of Requirement:

My Excel Source which has vertically merged cells on Dept1,Dept2 and horizontally merged cells on row2 and row3 on dept2 column as you can see which has the value Banking. similarly row2 and row3 on value column as you can see which has the value 20000. Note that when i imported this source in ssis I can see that this is how it arranges things,

ID Name Dept1 Dept2 Value
1 Harish HR NULL 10000
2 Kano Finance Banking 20000
3 Tangiro HR NULL NULL

I want to populate the unmerged cells with the data which was inside those cells when it was in merged state (ie. Source).

I want the horizontally merged cells Dept1 and Dept2 of ID 1 to have HR as value in Dept1 and Dept2.

For the Vertically merged cells on Dept2 column for ID 2 and ID 3 I want them to have Banking in Dept2 for Both Id2 and Id 3. Similarly for Column Value which has Vertically Merged cells on ID 2 and ID 3 I want them to have 20000 in Value for Both Id2 and Id 3.

I thought of trying this using the window functions which is not working and other logical things with script task.

答案1

得分: 1

以下是翻译好的内容:

在使用SSIS之前,您需要清理文件,首先修复它会更简单。

以下的Excel VBA子程序将查找并替换所有合并单元格,将它们拆分并用源单元格的相同值替换:

Sub UnmergeCellsAndDuplicateValues()
    
    Dim ws As Worksheet
    Dim r As Range
    Dim RangeAddress As String
    Dim CellValue

    For Each ws In Worksheets
        ws.Activate
        For Each r In ws.UsedRange
            If r.MergeCells Then
               RangeAddress = r.MergeArea.Address
               CellValue = r
               r.MergeCells = False
               Range(RangeAddress) = CellValue
            End If
        Next
    Next ws
    
End Sub

您可以构建一个小的可执行程序,将文件名作为参数传递,然后打开工作簿并运行这个VBA程序,但我很少使用它,因此还没有这样做。不过,这意味着您可以在您的SSIS工作流中使用它,这取决于需要运行此操作的频率以及是否需要批量处理文件等情况。

英文:

You need to clean the file before attempting to do anything with it in SSIS, it's much simpler to fix it first.

The following Excel VBA sub will find and replace all merged cells by unmerging them and then replacing them with the same value as the source cell.

Sub UnmergeCellsAndDuplicateValues()
    
    Dim ws As Worksheet
    Dim r As Range
    Dim RangeAddress As String
    Dim CellValue

    For Each ws In Worksheets
        ws.Activate
        For Each r In ws.UsedRange
            If r.MergeCells Then
               RangeAddress = r.MergeArea.Address
               CellValue = r
               r.MergeCells = False
               Range(RangeAddress) = CellValue
            End If
        Next
    Next ws
    
End Sub

You could probably build a small executable that takes a filename as a parameter, opens the workbook and runs this VBA over it but I've not done that as I rarely use it. However, that would mean you could use this in your SSIS workflow , I suppose it depends on how often this will need to be run and/or will you need to run batches of files etc...

huangapple
  • 本文由 发表于 2023年4月10日 19:54:56
  • 转载请务必保留本文链接:https://go.coder-hub.com/75976877.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定