英文:
I need to make a dynamic aggregation in Power Query, by summing or concatenating the duplicated values in my tables
问题
这是你的数据示例:
样本 | 方法A | 方法B | 方法C | 方法D | 方法E | 批次编号 | 实验室数据 |
---|---|---|---|---|---|---|---|
样本 1 | 1 | 2 | 8 | TX_0001 | LAB1 | ||
样本 1 | 5 | 9 | TX_0002 | LAB2 | |||
样本 2 | 7 | 8 | 8 | 23 | TX_0001 | LAB1 | |
样本 2 | 41 | TX_0001 | LAB2 | ||||
样本 3 | 11 | 55 | TX_0394 | LAB2 | |||
样本 4 | 2 | 9 | 5 | 9 | TX_0394 | LAB1 |
我需要创建一个M语言代码,根据重复的样本将它们合并在一起。请注意,它们可能在同一批次和/或同一个实验室中,但它们不会用相同的方法制备两次。
因此,我不能传递列名,因为它们不断变化,我希望能够动态传递列名。
注意:我有可能将源表格建立为Microsoft Access的链接表,然后使用SQL来处理,但我找不到MS Access库中的文本聚合函数。在那里,每个列名都可以很容易地处理。(只是我的公司没有其他人懂M语言,我不能让这个过程不自动化)
这是我一直在尝试改进的部分,但我一直遇到一些错误:
- 两个分组列中的所有单元格都有“Errors”。
- 评估耗尽内存。
我无法找出我在这里做错了什么。
let
Source = ALS,
schema = Table.Schema(Source),
columns = schema[Name],
types = schema[Kind],
Table = Table.FromColumns({columns,types}),
Number_Columns = Table.SelectRows(Table, each ([Column2] = "number")),
Other_Columns = Table.SelectRows(Table, each ([Column2] <> "number")),
numCols = Table.Column(Number_Columns, "Column1"),
textColsSID = List.Select(Table.ColumnNames(Source), each Table.Column(Source, _) <> type number),
textCols = List.RemoveItems(textColsSID, {"Sample ID"}),
groupedNum = Table.Group(Source, {"Sample ID"},List.Transform(numCols, each {_, (nmr) => List.Sum(nmr),type nullable number})),
groupedText = Table.Group(Source,{"Sample ID"},List.Transform(textCols, each {_, (tbl) => Text.Combine(tbl, "_")}),
merged = Table.NestedJoin(groupedNum, {"Sample ID"}, groupedText, {"Sample ID"}, "merged"),
expanded = Table.ExpandTableColumn(merged, "merged", Table.ColumnNames(merged{1}[merged]))
in
expanded
这是我期望的结果:
样本 | 方法A | 方法B | 方法C | 方法D | 方法E | 批次编号 | 实验室数据 |
---|---|---|---|---|---|---|---|
样本 1 | 1 | 2 | 5 | 9 | 8 | TX_0001_TX_0002 | LAB1_LAB2 |
样本 2 | 7 | 8 | 8 | 23 | 41 | TX_0001_TX_0001 | LAB1_LAB1 |
样本 3 | 11 | 55 | TX_0394 | LAB2 | |||
样本 4 | 2 | 9 | 5 | 9 | TX_0394 | LAB1 |
英文:
Here's an example of my data:
Sample | Method A | Method B | Method C | Method D | Method E | BATCH Nu | Lab Data |
---|---|---|---|---|---|---|---|
Sample 1 | 1 | 2 | 8 | TX_0001 | LAB1 | ||
Sample 1 | 5 | 9 | TX_0002 | LAB2 | |||
Sample 2 | 7 | 8 | 8 | 23 | TX_0001 | LAB1 | |
Sample 2 | 41 | TX_0001 | LAB2 | ||||
Sample 3 | 11 | 55 | TX_0394 | LAB2 | |||
Sample 4 | 2 | 9 | 5 | 9 | TX_0394 | LAB1 |
I need to make a M Language code that unites them, based on duplicated samples. Note that they might be in the same batch and/or in the same lab, but they won't ever be made the same method twice.
So I can't pass the column names, because they keep changing, and I wanted to do it passaing the column names dynamically.
**OBS: I have the possibility to make a linked table of the source to a Microsoft Access and make this with SQL, but I couldn't find a text aggregation function in MS Access library. There it's possible to each column name with no problem. (Just a matter that no one else knows M Language in my company and I can't let this be non-automated)
**
This is the what I have been trying to improve, but I keep have some errors:
1.Both goruped columns have "Errors" in all of the cells
2.Evaluation running out of memory
I can't discover what I'm doing wrong here.
let
Source = ALS,
schema = Table.Schema(Source),
columns = schema[Name],
types = schema[Kind],
Table = Table.FromColumns({columns,types}),
Number_Columns = Table.SelectRows(Table, each ([Column2] = "number")),
Other_Columns = Table.SelectRows(Table, each ([Column2] <> "number")),
numCols = Table.Column(Number_Columns, "Column1"),
textColsSID = List.Select(Table.ColumnNames(Source), each Table.Column(Source, _) <> type number),
textCols = List.RemoveItems(textColsSID, {"Sample ID"}),
groupedNum = Table.Group(Source, {"Sample ID"},List.Transform(numCols, each {_, (nmr) => List.Sum(nmr),type nullable number})),
groupedText = Table.Group(Source,{"Sample ID"},List.Transform(textCols, each {_, (tbl) => Text.Combine(tbl, "_")})),
merged = Table.NestedJoin(groupedNum, {"Sample ID"}, groupedText, {"Sample ID"}, "merged"),
expanded = Table.ExpandTableColumn(merged, "merged", Table.ColumnNames(merged{1}[merged]))
in
expanded
This is what I expected to have:
Sample | Method A | Method B | Method C | Method D | Method E | BATCH Nu | Lab Data |
---|---|---|---|---|---|---|---|
Sample 1 | 1 | 2 | 5 | 9 | 8 | TX_0001_TX_0002 | LAB1_LAB2 |
Sample 2 | 7 | 8 | 8 | 23 | 41 | TX_0001_TX_0001 | LAB1_LAB1 |
Sample 3 | 11 | 55 | TX_0394 | LAB2 | |||
Sample 4 | 2 | 9 | 5 | 9 | TX_0394 | LAB1 |
答案1
得分: 2
这是一个方法,它仅假定第一列是用于分组不同样本的列。
它不对任何列名或列数做出假设。
它测试每列的前10行(去除任何空值后)以确定列的类型是否可以是“数值类型”,否则它将假定为“文本类型”。
如果存在其他可能的数据类型,类型检测代码可以进行扩展。
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//从前十行动态检测数据类型
//仅检测“文本”和“数值”
colNames = Table.ColumnNames(Source),
checkRows = 10,
colTestTypes = List.Generate(
()=>[t=
let
Values = List.FirstN(Table.Column(Source,colNames{0}),10),
tryNumber = List.Transform(List.RemoveNulls(Values), each (try Number.From(_))[HasError])
in
tryNumber, idx=0],
each [idx] < List.Count(colNames),
each [t=
let
Values = List.FirstN(Table.Column(Source,colNames{[idx]+1}),10),
tryNumber = List.Transform(List.RemoveNulls(Values), each (try Number.From(_))[HasError])
in
tryNumber, idx=[idx]+1],
each [t]),
colTypes = List.Transform(colTestTypes, each if List.AllTrue(_) then type text else type number),
//基于第一列分组并求和或连接列
group = Table.Group(Source,{colNames{0}},
{"rw", (t)=>
Record.FromList(
List.Generate(
()=>[rw=if colTypes{1} = type number
then List.Sum(Table.Column(t,colNames{1}))
else Text.Combine(Table.Column(t,colNames{1}),"_"),
idx=1],
each [idx] < List.Count(colNames),
each [rw=if colTypes{[idx]+1} = type number
then List.Sum(Table.Column(t,colNames{[idx]+1}))
else Text.Combine(Table.Column(t,colNames{[idx]+1}),"_"),
idx=[idx]+1],
each [rw]), List.RemoveFirstN(colNames,1)), type record}
),
//展开记录列并设置数据类型
#"Expanded rw" = Table.ExpandRecordColumn(group, "rw", List.RemoveFirstN(colNames,1)),
#"Set Data Type" = Table.TransformColumnTypes(#"Expanded rw", List.Zip({colNames, colTypes}))
in
#"Set Data Type"
<details>
<summary>英文:</summary>
Here is a method which assumes only that the first column is a column which will be used to group the different samples.
It makes no assumptions about any column names, or the numbers of columns.
It tests the first 10 rows in each column (after removing any nulls) to determine if the column type can be `type number`, otherwise it will assume `type text`.
If there are other possible data types, the type detection code can be expanded.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//dynamically detect data types from first ten rows
//only detecting "text" and "number"
colNames = Table.ColumnNames(Source),
checkRows = 10,
colTestTypes = List.Generate(
()=>[t=
let
Values = List.FirstN(Table.Column(Source,colNames{0}),10),
tryNumber = List.Transform(List.RemoveNulls(Values), each (try Number.From())[HasError])
in
tryNumber, idx=0],
each [idx] < List.Count(colNames),
each [t=
let
Values = List.FirstN(Table.Column(Source,colNames{[idx]+1}),10),
tryNumber = List.Transform(List.RemoveNulls(Values), each (try Number.From())[HasError])
in
tryNumber, idx=[idx]+1],
each [t]),
colTypes = List.Transform(colTestTypes, each if List.AllTrue(_) then type text else type number),
//Group and Sum or Concatenate columns, keying on the first column
group = Table.Group(Source,{colNames{0}},
{"rw", (t)=>
Record.FromList(
List.Generate(
()=>[rw=if colTypes{1} = type number
then List.Sum(Table.Column(t,colNames{1}))
else Text.Combine(Table.Column(t,colNames{1}),""),
idx=1],
each [idx] < List.Count(colNames),
each [rw=if colTypes{[idx]+1} = type number
then List.Sum(Table.Column(t,colNames{[idx]+1}))
else Text.Combine(Table.Column(t,colNames{[idx]+1}),""),
idx=[idx]+1],
each [rw]), List.RemoveFirstN(colNames,1)), type record}
),
//expand the record column and set the data types
#"Expanded rw" = Table.ExpandRecordColumn(group, "rw", List.RemoveFirstN(colNames,1)),
#"Set Data Type" = Table.TransformColumnTypes(#"Expanded rw", List.Zip({colNames, colTypes}))
in
#"Set Data Type"
***Original Data***
[![enter image description here][1]][1]
***Results***
[![enter image description here][2]][2]
[1]: https://i.stack.imgur.com/WEVpC.png
[2]: https://i.stack.imgur.com/jrPZx.png
</details>
# 答案2
**得分**: 1
以下是代码的翻译部分:
```javascript
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
names = List.Distinct(List.Select(Table.ColumnNames(Source), each Text.Contains(_,"Method")),
"Grouped Rows" = Table.Group(Source, {"Sample"}, {{"data", each _, type table }}),
"Added Custom" = Table.AddColumn("Grouped Rows", "Batch Nu", each Text.Combine(List.Distinct([data][BATCH Nu]),"_")),
"Added Custom1" = Table.AddColumn("Added Custom", "Lab Data", each Text.Combine(List.Distinct([data][Lab Data]),"_")),
"Added Custom2" = Table.AddColumn("Added Custom1", "Custom", each Table.SelectRows(Table.UnpivotOtherColumns([data], {"Sample"}, "Attribute", "Value"), each List.Contains(names,[Attribute])),
"Added Custom3" = Table.AddColumn("Added Custom2", "Custom.1", each Table.Pivot([Custom], List.Distinct([Custom][Attribute]), "Attribute", "Value", List.Sum)),
"Expanded Custom.1" = Table.ExpandTableColumn("Added Custom3" , "Custom.1", names,names),
"Removed Columns" = Table.RemoveColumns("Expanded Custom.1",{"data", "Custom"})
in "Removed Columns"
希望这对您有所帮助。
英文:
One way. You could probably do this all within the group as well
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
names = List.Distinct(List.Select(Table.ColumnNames(Source), each Text.Contains(_,"Method"))),
#"Grouped Rows" = Table.Group(Source, {"Sample"}, {{"data", each _, type table }}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Batch Nu", each Text.Combine(List.Distinct([data][BATCH Nu]),"_")),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Lab Data", each Text.Combine(List.Distinct([data][Lab Data]),"_")),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom", each Table.SelectRows(Table.UnpivotOtherColumns([data], {"Sample"}, "Attribute", "Value"), each List.Contains(names,[Attribute]))),
#"Added Custom3" = Table.AddColumn(#"Added Custom2", "Custom.1", each Table.Pivot([Custom], List.Distinct([Custom][Attribute]), "Attribute", "Value", List.Sum)),
#"Expanded Custom.1" = Table.ExpandTableColumn(#"Added Custom3" , "Custom.1", names,names),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Custom.1",{"data", "Custom"})
in #"Removed Columns"
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论