Excel: How to analyze data in a table that contains multivalue cells
Question
I am working on a science project about insects, and I have been logging information about the insects I find along the way. I now realize that recording the names of all the insects found in each observation in a single cell was a bad decision.
I am not allowed to share much information because it is confidential, but here is a similar example of my case:
# of sample | insect (family) |
---|---|
1 | Dermestidae, Histeridae |
2 | Histeridae, Dichotumius |
3 | Histeridae |
4 | Dermestidae, Histeridae |
5 | Cleridae, Dichotumius |
485 | Histeridae |
486 | Dermestidae, Histeridae |
487 | Dermestidae, Cleridae |
488 | Histeridae |
Something like the above table. In my actual table, some cells contain 5 or 6 different insects. The question is:
- How can I find all the distinct values? That is, I want to create a table that lists every distinct value and how many times each one occurs, something like the following table:
Insect (family) | Count |
---|---|
Cleridae | 54 |
Histeridae | 154 |
Dermestidae | 34 |
(There are at least 100 different insects, and some of them appear only once, so it is impossible for me to search for all the different names manually.)
Furthermore, I was thinking about converting my table to a long format, something like the following.
Instead of this:
# of sample | insect (family) |
---|---|
1 | Dermestidae, Histeridae |
2 | Histeridae, Dichotumius |
3 | Histeridae |
4 | Dermestidae, Histeridae |
5 | Cleridae, Dichotumius |
I want this:
# of sample | insect (family) |
---|---|
1 | Dermestidae |
1 | Histeridae |
2 | Histeridae |
2 | Dichotumius |
3 | Histeridae |
4 | Dermestidae |
4 | Histeridae |
5 | Cleridae |
5 | Dichotumius |
I think this arrangement would be better than the one I have now.
I hope someone can help me with this issue. Thanks so much.
I tried the above, but I did not manage to get it working. That is why I am asking for help.
Answer 1
Score: 1
What you are trying to do is called unpivoting data. Power Query is the best tool for this case. If you would rather do it with a formula, you can try the following:
=DROP(REDUCE(0,REDUCE(0,B2:B6,LAMBDA(a,x,VSTACK(a,CONCAT(CHOOSEROWS(A2:A6,ROW(x)-1)&"|")&TEXTSPLIT(x,,",")))),LAMBDA(p,q,VSTACK(p,TEXTSPLIT(q,"|")))),2)
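To make the logic of the formula easier to follow, here is a minimal Python sketch of the same unpivot step. It is only an illustration, not part of the Excel solution: the function name `unpivot` and the sample rows are made up for this example and mirror the table in the question. One difference is deliberate: the formula splits on a bare comma, so if the cells use ", " as in the question, every name after the first keeps a leading space, whereas the sketch strips whitespace from each name.

```python
def unpivot(rows):
    """Turn (sample, "a, b, c") rows into one (sample, name) row per name."""
    long_rows = []
    for sample, cell in rows:
        for name in cell.split(","):
            name = name.strip()  # drop the space left after each comma
            if name:  # skip empty pieces such as a trailing comma
                long_rows.append((sample, name))
    return long_rows


# Hypothetical sample data in the wide layout from the question.
wide = [
    (1, "Dermestidae, Histeridae"),
    (2, "Histeridae, Dichotumius"),
    (3, "Histeridae"),
]

long_table = unpivot(wide)
for sample, name in long_table:
    print(sample, name)
```

Each input row produces as many output rows as it has names, and the sample number is repeated on each of them, which is the long layout the question asks for.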
Answer 2
Score: 1
This can be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac).
I am uncertain if you want just the unpivoted table, the Counts of each family, or something else, but I have shown the results at each of the last two steps in the query. You can use what you need.
To use Power Query:
- Select some cell in your data table
- Data => Get & Transform => From Table/Range
- When the PQ Editor opens: Home => Advanced Editor
- Make note of the table name in line 2
- Paste the M code below in place of what you see
- Change the table name in line 2 back to what was generated originally
- Read the comments and explore the Applied Steps to understand the algorithm
let
    //Change next line to reflect your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Insects"]}[Content],

    //set the data types
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"# of sample", Int64.Type}, {"insect (family)", type text}}),

    //Split Insect Family column by the comma, into rows
    #"Split Column by Delimiter" =
        Table.ExpandListColumn(
            Table.TransformColumns(#"Changed Type", {{"insect (family)",
                Splitter.SplitTextByDelimiter(
                    ",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}),
            "insect (family)"),

    //Remove any leading and trailing spaces => your unpivoted table
    #"Trim Spaces" = Table.TransformColumns(#"Split Column by Delimiter", {"insect (family)", each Text.Trim(_), type text}),

    //To create your table with counts, merely Group by the insect (family) column and aggregate with Count
    #"Grouped Rows" = Table.Group(#"Trim Spaces", {"insect (family)"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
    #"Grouped Rows"
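The query runs three steps in order: split each cell on the comma, trim the pieces, then group by family and count rows. As a rough cross-check outside Excel, the same pipeline can be sketched in Python. This is an illustration under assumptions, not a replacement for the M code: the function name `count_families` and the five sample rows (taken from the question's example table) are chosen only for this sketch.

```python
from collections import Counter


def count_families(rows):
    """Split each cell on commas, trim the pieces, and count every name."""
    counts = Counter()
    for _sample, cell in rows:
        names = [part.strip() for part in cell.split(",")]
        counts.update(name for name in names if name)  # ignore empty pieces
    return counts


# Hypothetical rows matching the first five samples in the question.
rows = [
    (1, "Dermestidae, Histeridae"),
    (2, "Histeridae, Dichotumius"),
    (3, "Histeridae"),
    (4, "Dermestidae, Histeridae"),
    (5, "Cleridae, Dichotumius"),
]

counts = count_families(rows)
for family, n in counts.most_common():
    print(family, n)
```

Trimming before counting matters here: without it, "Histeridae" and " Histeridae" would be treated as two different families, which is why the query has a separate trim step ahead of the grouping.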
Answer 3
Score: 0
To answer 'How can I search for all the different values?': the formula below creates a unique list of the insect families (where the insect families are in range B2:B100):
=UNIQUE(TEXTSPLIT(TEXTJOIN(", ",TRUE,B2:B100),"|",", ",TRUE))
You can then use a COUNTIF() formula to find how many samples contain each family.
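The answer does not give the exact COUNTIF formula. If it is written with wildcards, for example `=COUNTIF(B2:B100,"*"&D2&"*")` (an assumption, with D2 holding one name from the unique list), it matches the name anywhere in the cell and so can overcount when one name is contained in another. The Python sketch below illustrates the difference between that kind of substring match and matching whole comma-separated names. The function names and the family name "Pseudohisteridae" are hypothetical and exist only to show the effect.

```python
def samples_containing(cells, family):
    """Count cells that list `family` as a whole, comma-separated name."""
    return sum(
        family in {part.strip() for part in cell.split(",")}
        for cell in cells
    )


def samples_matching_substring(cells, family):
    """Count cells where `family` appears anywhere in the text.

    Lower-casing mimics COUNTIF, which ignores case.
    """
    return sum(family.lower() in cell.lower() for cell in cells)


# "Pseudohisteridae" is a made-up name that contains "histeridae".
cells = ["Histeridae, Cleridae", "Pseudohisteridae", "Histeridae"]

exact = samples_containing(cells, "Histeridae")
loose = samples_matching_substring(cells, "Histeridae")
print(exact, loose)
```

If none of the family names in the real data is contained in another, the two counts agree and the wildcard form is fine. Otherwise, counting on the unpivoted (long) table avoids the problem.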