OutputBuffer not working for large C# list

Question

I'm currently using SSIS to improve a project. I need to insert individual documents into a MongoDB time series collection. At one point I want to get rows of data out of a C# transformation script (Script Component). I did this:

foreach (BsonDocument bson in listBson)
{
    OutputBuffer.AddRow();
    OutputBuffer.DatalineX = (string) bson.GetValue("data");
}

This code works great with a small file but not with a 6-million-line file: no rows come out of the output. The downstream tasks pass validation but behave as if they had received no input.
Where could the problem come from?

Answer 1

Score: 1

Your OutputBuffer has DatalineX defined as a string column, either DT_STR or DT_WSTR, with a specific length. When a value exceeds that length, things go bad. For the ordinary string types, the maximum length is 8,000 characters for DT_STR and 4,000 for DT_WSTR.

Neither limit is enough for your use case of at least 6M characters. To handle that, you'll need to change the column's data type to DT_TEXT/DT_NTEXT. Those data types do not require a length, as they are "max" (LOB) types. There are several things to be aware of when using the LOB types:

  • Performance can suffer badly depending on whether SSIS can keep the data in memory (good) or has to spool intermediate values to disk (bad)
  • You can't readily manipulate them in a data flow
  • You'll use a different syntax in a Script Component to work with them

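For example, inside the loop after OutputBuffer.AddRow():

// Assumes DatalineX is now DT_NTEXT, which stores UTF-16LE text (Encoding.Unicode in .NET)
byte[] bytes = System.Text.Encoding.Unicode.GetBytes((string) bson.GetValue("data"));
OutputBuffer.DatalineX.AddBlobData(bytes);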
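
The key step in writing to a DT_NTEXT column is turning the string into the bytes AddBlobData expects. Below is a minimal standalone sketch, assuming the column is DT_NTEXT (UTF-16LE, which .NET calls Encoding.Unicode). ToNtextBytes is a hypothetical helper, and the SSIS buffer itself is left out so the code runs outside a package.

```csharp
using System;
using System.Text;

public static class BlobEncoding
{
    // Hypothetical helper, assuming the target column is DT_NTEXT.
    // DT_NTEXT holds UTF-16LE text, which .NET exposes as Encoding.Unicode;
    // the returned array is what would be passed to AddBlobData(bytes).
    public static byte[] ToNtextBytes(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        return Encoding.Unicode.GetBytes(value);
    }

    public static void Main()
    {
        // A single value far beyond the 4,000-character DT_WSTR limit.
        string data = new string('x', 6000000);
        byte[] bytes = ToNtextBytes(data);

        // UTF-16LE uses 2 bytes per character here, so 6,000,000 chars -> 12,000,000 bytes.
        Console.WriteLine(bytes.Length); // 12000000

        // Decoding with the same encoding returns the original string.
        Console.WriteLine(Encoding.Unicode.GetString(bytes) == data); // True
    }
}
```

For a DT_TEXT column you would instead have to encode with the column's code page. Getting that code page right is the part to verify for your data.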

For a longer example, see https://stackoverflow.com/a/74902194/181965. Its handling of the byte encoding may not be accurate, so that part is yours to solve.
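
Before changing types, you may want to confirm that value length really is what breaks the output. The sketch below scans the values and suggests a column type. It assumes the "data" values are available as plain strings. FindTooLong and SuggestType are hypothetical helpers, and the limits are the DT_STR (8,000) and DT_WSTR (4,000) maxima for SSIS string columns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LengthCheck
{
    // Maximum lengths of the SSIS fixed-length string types.
    public const int DtStrMax = 8000;   // DT_STR (ANSI/MBCS)
    public const int DtWstrMax = 4000;  // DT_WSTR (Unicode)

    // Hypothetical helper: returns the indexes of values longer than the column allows.
    public static List<int> FindTooLong(IEnumerable<string> values, int columnLength)
    {
        return values
            .Select((v, i) => new { v, i })
            .Where(x => x.v != null && x.v.Length > columnLength)
            .Select(x => x.i)
            .ToList();
    }

    // Hypothetical helper: the fixed-length string type if it fits, otherwise the matching LOB type.
    public static string SuggestType(int maxLength, bool unicode)
    {
        if (unicode) return maxLength <= DtWstrMax ? "DT_WSTR" : "DT_NTEXT";
        return maxLength <= DtStrMax ? "DT_STR" : "DT_TEXT";
    }

    public static void Main()
    {
        var values = new List<string> { "short", new string('a', 4000), new string('b', 4001) };

        // Only the 4,001-character value (index 2) exceeds the DT_WSTR limit.
        Console.WriteLine(string.Join(",", FindTooLong(values, DtWstrMax))); // 2

        int maxLen = values.Max(v => v.Length);
        Console.WriteLine(SuggestType(maxLen, unicode: true)); // DT_NTEXT
    }
}
```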

  • Published by huangapple on February 13, 2023, 23:58:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75438301.html