英文:
SSIS Prevent Partial CSV Files from being Created When Job Fails
问题
有一个我正在构建的SSIS包,在最终处于循环中。它从OLEDB源获取数据,将其发送并动态创建CSV文件。现在作为循环,它将根据数组系统创建文件,假设它循环5次并创建5个不同的文件。
我不希望发生的情况是,假设OLEDB源中有50,000条记录。它被发送到平面文件目标[CSV文件],但出于某种原因失败了。结果只发送了20,000条记录,并创建了该文件。但这是错误的,因为这是一个部分文件,因为它只有20,000条记录,理论上应该有50,000条记录。
是否有办法在作业失败时防止创建部分文件?
所以,我希望的情况是相同的情况下,50,000条记录被发送到CSV文件,但作业在中途失败,只能发送20,000条记录,但CSV文件没有创建,因为它失败了。
这种情况可能吗?
我尝试添加计数机制,但它只在文件创建或未创建时起作用,但不考虑文件内容。
英文:
There is a ssis package I am building where ultimately its in a loop.
It takes data from an oledb source and sends it over and creates the csv file dynamically.
Now as a loop it will create files based on the array system so let say it loops 5 times and creates 5 different files.
A situation I don't want to happen is if let's say there are 50,000 records from oledb source. That is being sent over to the flat file destination [the csv file] but for some reason it fails. As a result only 20,000 records is sent with that file being created. But this is wrong because this is a partial file since it only has 20,000 records when it should have 50,000 records theoretically.
Is there a way to prevent that partial file from being created in the first place when the job fails?
So what I would want is the same scenario 50,000 records sent over to csv file, but job fails in the middle, only was able to send 20,000 records, but the csv file isn't created because it failed.
Is this possible?
I tried to add in a counter mechanism but it only works if the file is created or not but doesn't account for the contents of the file.
答案1
得分: 1
你可以通过将过程分为两个步骤来实现这一目标:第一步是创建临时文件,第二步是在第一步成功完成后将该文件重命名或移动到最终目的地,如果第一步失败,你将只剩下一个不影响最终输出并可以删除的临时文件。
在控制流中创建一个临时平面文件连接(即temp_output.csv
),并将你的OLEDB数据发送到这个临时文件。
一旦数据流任务成功完成,使用文件系统任务在控制流中将temp_output.csv
重命名为最终所需的文件名(即final_output.csv
)。
如果数据流任务因任何原因失败(例如仅发送了50,000条记录中的20,000条),则文件不会被重命名,部分完成的temp_output.csv
仍然存在。你可以在包执行开始时添加另一步来删除临时文件,以确保先前的执行失败不会影响新的执行。
英文:
You can achieve this by dividing the process into two steps: the first one being the creation of a temporary file, and the second being the renaming or moving of this file to its final destination, only if the first process completes successfully. If the process fails, you'll just be left with a temporary file that doesn't interfere with your final output and can be deleted.
Create a temporary flat file connection (i.e., temp_output.csv
) and send your OLEDB data to this temporary file.
Once the data flow task finishes successfully, use a File System Task in the Control Flow to rename temp_output.csv
to the final desired filename (i.e., final_output.csv
).
If the Data Flow Task fails for any reason (like when only 20,000 of 50,000 records were sent), the file is not renamed, and the partially complete temp_output.csv
remains. You can add another step to delete the temporary file at the beginning of your package execution to ensure that a failed previous execution does not interfere with the new one.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论