Sorting on similar values using Talend
Question
I have a CSV file and I am using Talend Open Studio for Data Integration.
I want to group rows that share the same DeptID, sort MID in ascending order within each group, and assign the lowest MID of the group to SID for every row with that DeptID. If a DeptID occurs only once, SID should simply take that row's MID.
Input CSV:
DeptID | SID | StudentName | MID |
---|---|---|---|
111 | | Nancy | C1 |
111 | | Nancy | B1 |
111 | | Nancy | A1 |
222 | | James | Z1 |
I used tFileInputDelimited to read the input file, tSortRow to sort by MID, and tAggregateRow to group the values.
I am getting output as:
DeptID | SID | StudentName | MID |
---|---|---|---|
111 | | Nancy | [A1,B1,C1] |
222 | | James | [Z1] |
The Output CSV should be as follows:
DeptID | SID | StudentName | MID |
---|---|---|---|
111 | A1 | Nancy | A1 |
111 | A1 | Nancy | B1 |
111 | A1 | Nancy | C1 |
222 | Z1 | James | Z1 |
Answer 1
Score: 1
One simple solution is to read your input file twice: once as the main flow, to get the detail rows with the MID column, and once as the lookup, to get the MIN value of MID per DeptID as your SID column. Then join the two flows in a tMap on DeptID (with the "all matches" join type).
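To make the data flow of this first approach concrete, here is a minimal plain-Java sketch of the same idea. It is not Talend-generated code: the class name `MinMidLookupSketch`, the method `assignSid`, and the row layout `{DeptID, SID, StudentName, MID}` are assumptions made for illustration, and MID values are compared as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "main flow + lookup" idea; names are illustrative only.
class MinMidLookupSketch {

    // Each row is {DeptID, SID, StudentName, MID}; SID is empty on input.
    static List<String[]> assignSid(List<String[]> rows) {
        // Pass 1 (the "lookup" flow): smallest MID per DeptID, compared as strings.
        Map<String, String> minMidByDept = new HashMap<>();
        for (String[] r : rows) {
            minMidByDept.merge(r[0], r[3], (a, b) -> a.compareTo(b) <= 0 ? a : b);
        }
        // Pass 2 (the "main" flow): keep every detail row, fill SID from the lookup.
        List<String[]> out = new ArrayList<>();
        for (String[] r : rows) {
            out.add(new String[] {r[0], minMidByDept.get(r[0]), r[2], r[3]});
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> input = Arrays.asList(
            new String[] {"111", "", "Nancy", "C1"},
            new String[] {"111", "", "Nancy", "B1"},
            new String[] {"111", "", "Nancy", "A1"},
            new String[] {"222", "", "James", "Z1"});
        for (String[] r : assignSid(input)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

Rows come out in input order here, so ordering by MID remains a separate step (tSortRow in the job). String comparison is also an assumption: it would rank "A10" before "A2", so numeric MIDs would need a different comparison.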
Another solution is to use the internal variables of tMap, which gets the work done with fewer components.
Once you have sorted your data with tSortRow, create two variables in a tMap placed after the tSortRow component:
- "sequence" creates a counter per DeptID, starting at 1.
- "currentVal" checks whether sequence equals 1: if so, it takes the current MID as SID; otherwise SID keeps its previous value.
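The two variables can be simulated in plain Java to see why this works. This is a sketch under stated assumptions, not the actual tMap configuration: `SequenceVarSketch` and `assignSid` are hypothetical names, rows are `{DeptID, SID, StudentName, MID}`, and the sort is assumed to be DeptID ascending, then MID ascending, both as strings. In a real tMap, "sequence" would probably be an expression along the lines of `Numeric.sequence(row1.DeptID, 1, 1)`, with "currentVal" referring back to its own previous value; treat both as assumptions to check against your job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the two tMap variables; names are illustrative only.
class SequenceVarSketch {

    // Each row is {DeptID, SID, StudentName, MID}; SID is empty on input.
    static List<String[]> assignSid(List<String[]> rows) {
        // Stand-in for tSortRow: DeptID ascending, then MID ascending (string order).
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort((a, b) -> {
            int byDept = a[0].compareTo(b[0]);
            return byDept != 0 ? byDept : a[3].compareTo(b[3]);
        });

        Map<String, Integer> counters = new HashMap<>(); // one counter per DeptID
        String currentVal = null; // carried from row to row, like a tMap variable
        List<String[]> out = new ArrayList<>();
        for (String[] r : sorted) {
            // "sequence": 1 on the first row of a DeptID, then 2, 3, ...
            int sequence = counters.merge(r[0], 1, Integer::sum);
            // "currentVal": take MID on the first row of a group, else keep the old value.
            currentVal = (sequence == 1) ? r[3] : currentVal;
            out.add(new String[] {r[0], currentVal, r[2], r[3]});
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> input = Arrays.asList(
            new String[] {"111", "", "Nancy", "C1"},
            new String[] {"111", "", "Nancy", "B1"},
            new String[] {"111", "", "Nancy", "A1"},
            new String[] {"222", "", "James", "Z1"});
        for (String[] r : assignSid(input)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

Because the first row of each sorted group holds the lowest MID, carrying that value forward gives every row of the group the same SID. The approach depends entirely on the sort: if the rows of one DeptID are not contiguous and ordered by MID, the carried value is wrong.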
Both solutions produce the SID value.