SSIS Script Task - Count Number of Unique Rows in Output Object

Question

My SSIS script task generates 5 rows of records into an Output script component.

if (conditionMeets)
{
    WOProductBuffer.AddRow();
    WOProductBuffer.WorkOrderId = workOrderId;
    WOProductBuffer.WorkOrderProductId = workOrderProductId;
    //other fields
}

> Objective: Count number of rows, group by WorkOrderId and WorkOrderProductId, and set this count value to WopCount

I realized that the PostExecute() method is unable to read the Output object WOProduct, so this is likely not possible.

Is there a way to implement this within the same script task, based on all the rows?

Or is the only way to create a new script task and loop over all the records in its PreExecute() method to generate the count value?


What I have tried:

  1. Adding WOProductBuffer to a list and looping over it in the PostExecute() method. This didn't work, as each row doesn't seem to contain any values.

Currently trying:

  1. How to loop through Input rows in a new script task

Answer 1

Score: 0

From an earlier question, you have a class something like this:

public class WorkOrderProduct
{
    public Guid workOrderId;
    public Guid workOrderProductId;
}

In a script task, you'll want to update an SSIS Variable with the final count, and we know that can only take place in the PostExecute method.

Similar to how we moved the declaration of wopList to the class level on https://stackoverflow.com/questions/75385090/how-to-access-parameters-of-user-variables-in-ssis-script-task, we would create a similar list, just with a different type.

There are two ways of doing this. You can either implement the distinct logic in your own code and only add unique items to the list, or you can use a tiny bit of LINQ and let it do the logic for you.

The decision points become:

  • What do I understand and want to maintain?
  • What's the expected cardinality relative to the size of the list - aka how many total rows would we expect versus how many uniques? If it's under millions, eh, I can't imagine it making a difference which you choose. Above a million, I'll start pulling out my consulting "It Depends" card but I suspect you'll be fine. Billions? yeah, I bet things will start to get interesting. If nothing else, you'll probably need LongCount instead of the Count method.
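If the row count does climb toward the larger figures mentioned above, one thing that may start to matter is that List.Contains scans the list from the start on every call, so the check-then-add pattern gets slower as the list grows. A minimal sketch of one possible alternative, assuming a HashSet is acceptable in place of the List; the class and method names here are hypothetical and not part of the original answer or of the SSIS-generated code:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class UniquePairCounter
{
    // Counts distinct (WorkOrderId, WorkOrderProductId) pairs.
    // HashSet.Add silently ignores a value it already holds,
    // so no separate Contains check is needed.
    public static int CountUnique(IEnumerable<KeyValuePair<Guid, Guid>> pairs)
    {
        var seen = new HashSet<KeyValuePair<Guid, Guid>>();
        foreach (var pair in pairs)
        {
            seen.Add(pair);
        }
        return seen.Count;
    }
}
```

For a handful of rows, as in the question, the plain List approach is likely just as good and arguably easier to read.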

Create a class member - pick one (or both and try them out)

List<KeyValuePair<Guid, Guid>> option1;
List<KeyValuePair<Guid, Guid>> option2;

In your PreExecute method, instantiate the List(s)

this.option1 = new List<KeyValuePair<Guid, Guid>>();
this.option2 = new List<KeyValuePair<Guid, Guid>>();

In your existing logic, as a final step, implement option 1 or 2. We will create a KeyValuePair of our two Guids.

For option 1, we ask the existing List whether it already contains that pair; if it does not, we add it to the option1 list.
For option 2, we just add the pair to the option2 list unconditionally, as we'll figure out the uniques later.

if (conditionMeets)
{
    // Existing business logic goes here

    KeyValuePair<Guid, Guid> newItem = new KeyValuePair<Guid, Guid>(workOrderId, workOrderProductId);

    // Option 1: only keep pairs we have not seen before
    if (!option1.Contains(newItem))
    {
        option1.Add(newItem);
    }

    // Option 2: just add it and we'll figure it out later
    option2.Add(newItem);
}
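To try the two options outside of SSIS, something like the following standalone sketch could be used. It assumes nothing about the package itself; the class and method names are hypothetical, and the rows are passed in as a plain sequence rather than coming from the buffer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class for illustration only.
public static class DistinctOptionsDemo
{
    // Option 1: only add a pair if the list does not already contain it.
    public static int CountWithContains(IEnumerable<KeyValuePair<Guid, Guid>> rows)
    {
        var option1 = new List<KeyValuePair<Guid, Guid>>();
        foreach (var row in rows)
        {
            if (!option1.Contains(row))
            {
                option1.Add(row);
            }
        }
        return option1.Count;
    }

    // Option 2: add every pair, then let LINQ remove the duplicates.
    public static int CountWithDistinct(IEnumerable<KeyValuePair<Guid, Guid>> rows)
    {
        var option2 = new List<KeyValuePair<Guid, Guid>>(rows);
        return option2.Distinct().Count();
    }
}
```

Both methods should return the same number for the same input, because KeyValuePair is a struct and is compared by the values of its key and value.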

In your PostExecute method, you can use the Count property on the option1 List as you've already done the heavy lifting to only add distinct values.

For option2, we'll invoke the Distinct method and then chain a call to the Count() method. Do note the difference in where we use parentheses: Count is a property on List, whereas Distinct() and Count() are LINQ methods. Leave the parentheses off the method call and you'll get a "Cannot convert from 'method group' to int" compile error.
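The difference between the property and the method can be seen in isolation in a small sketch like this one, which uses a list of integers and a hypothetical class name purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class for illustration only.
public static class CountForms
{
    public static int[] Compare(List<int> items)
    {
        int viaProperty = items.Count;            // List<T>.Count property: no parentheses
        int viaLinq = items.Count();              // Enumerable.Count() extension method
        int distinct = items.Distinct().Count();  // Distinct() returns IEnumerable<int>, which only offers Count()

        // The next line would not compile, because Count without parentheses
        // refers to the method group rather than to a value:
        // int broken = items.Distinct().Count;

        return new[] { viaProperty, viaLinq, distinct };
    }
}
```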

Finally, Console.WriteLine doesn't do you any good. Instead you'll assign values back to your Variable or as I showed in

bool pbFireAgain = false;
int uniqueCount = 0;

// Option 1 approach: the list only ever received distinct values
uniqueCount = option1.Count;
// Pop the value into the run log so we can trace what was generated
this.ComponentMetaData.FireInformation(0, "SCR PostExecute Counts", string.Format("option1 count is {0}", uniqueCount), "", 0, ref pbFireAgain);

// Option 2 logic: requires "using System.Linq;" at the top of the script
uniqueCount = option2.Distinct().Count();
// Push the option 2 values into the output log
this.ComponentMetaData.FireInformation(0, "SCR PostExecute Counts", string.Format("option2 Distinct Count is {0}, total Count is {1}", uniqueCount, option2.Count), "", 0, ref pbFireAgain);

// Both options yield the same number; keep whichever one you implemented
this.Variables.MySSISVariable = uniqueCount;
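
If the objective in the question is read as needing the number of rows per (WorkOrderId, WorkOrderProductId) pair rather than the number of distinct pairs, the contents of the option2 list could be grouped instead. This is only a sketch under that assumption; the class and method names are hypothetical and nothing here comes from the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class PerPairCounts
{
    // Returns, for each distinct pair, how many rows carried that pair.
    public static Dictionary<KeyValuePair<Guid, Guid>, int> CountRowsPerPair(
        IEnumerable<KeyValuePair<Guid, Guid>> rows)
    {
        return rows
            .GroupBy(row => row)
            .ToDictionary(group => group.Key, group => group.Count());
    }
}
```

The number of entries in the resulting dictionary equals the distinct count computed above, so either figure can be derived from the same grouping.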

Posted by huangapple on 2023-02-10 10:58:10