2023-07-12 21:23:37


Array_Split with grouped string indices

Question

I have a dataframe that I would like to split into sub-arrays (i.e. chunks) based on groups of string values in the index. I've read how you can pass a list of string values as the indices variable in np.array_split, but my scenario is a bit more complicated and I'm unsure of the best approach.

From the table below, I'd like to have 2 sub-arrays: one that includes the index string values "Alpha" and "Bravo", and a second with the values "Charlie" and "Delta".

Example table:

Index	Column1	Column2
Alpha	sample	12
Alpha	sample	13
Alpha	sample	14
Bravo	sample	15
Charlie	sample	16
Charlie	sample	17
Delta	sample	18
Delta	sample	19
Delta	sample	20
Delta	sample	21

Answer 1

Score: 1

Assuming a DataFrame and that you want to split by custom groups:

groups = [['Alpha', 'Bravo'], ['Charlie', 'Delta']]
dfs = [g for _, g in df.groupby(df['Index'].map({k: v for v, l in enumerate(groups) for k in l}))]

Output:

dfs[0]
   Index Column1  Column2
0  Alpha  sample       12
1  Alpha  sample       13
2  Alpha  sample       14
3  Bravo  sample       15
dfs[1]
     Index Column1  Column2
4  Charlie  sample       16
5  Charlie  sample       17
6    Delta  sample       18
7    Delta  sample       19
8    Delta  sample       20
9    Delta  sample       21
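
To make the mapping step easier to follow, here is a minimal, self-contained sketch of the same idea. It rebuilds the sample table from the question (assuming "Index" is a regular column) and spells out the label-to-group dictionary before grouping. The names `label_to_group` and `group_ids` are only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question; "Index" is assumed to be a column here
df = pd.DataFrame({
    "Index": ["Alpha"] * 3 + ["Bravo"] + ["Charlie"] * 2 + ["Delta"] * 4,
    "Column1": ["sample"] * 10,
    "Column2": range(12, 22),
})

groups = [["Alpha", "Bravo"], ["Charlie", "Delta"]]

# Map every label to the position of the group that contains it:
# Alpha and Bravo get 0, Charlie and Delta get 1
label_to_group = {label: i for i, labels in enumerate(groups) for label in labels}

# One group id per row, then one sub-DataFrame per group id
group_ids = df["Index"].map(label_to_group)
dfs = [g for _, g in df.groupby(group_ids)]
```

Note that a label that appears in none of the groups maps to NaN, and groupby drops NaN keys by default, so such rows would silently disappear from the result.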

Or, if "Index" is actually the index:

groups = [['Alpha', 'Bravo'], ['Charlie', 'Delta']]
dfs = [df.loc[l] for l in groups]

Output:

dfs[0]
      Column1  Column2
Alpha  sample       12
Alpha  sample       13
Alpha  sample       14
Bravo  sample       15
dfs[1]
        Column1  Column2
Charlie  sample       16
Charlie  sample       17
Delta    sample       18
Delta    sample       19
Delta    sample       20
Delta    sample       21
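
For reference, a runnable sketch of the index-based variant, assuming the labels really are the DataFrame index. It also shows an `isin`-based mask as a possible alternative, which is my own addition rather than part of the original answer: `df.loc` with a list raises a KeyError if a label is missing from the index, while the mask simply selects nothing for it.

```python
import pandas as pd

# Sample data rebuilt from the question, with the labels as the index
df = pd.DataFrame(
    {"Column1": ["sample"] * 10, "Column2": range(12, 22)},
    index=["Alpha"] * 3 + ["Bravo"] + ["Charlie"] * 2 + ["Delta"] * 4,
)

groups = [["Alpha", "Bravo"], ["Charlie", "Delta"]]

# Label-based selection; duplicate index labels return all matching rows
dfs_loc = [df.loc[labels] for labels in groups]

# Boolean-mask alternative; tolerates labels that are absent from the index
# and keeps rows in their original order
dfs_isin = [df[df.index.isin(labels)] for labels in groups]
```

On this sample both lists contain the same two sub-DataFrames.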

Finally, if you don't have explicit combinations in mind but just want groups of 2 values (in order), then use:

import pandas as pd
dfs = [g for _, g in df.groupby(pd.factorize(df['Index'])[0] // 2)]
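
In case the factorize trick is not obvious, here is a small sketch of what happens, again on the sample data with "Index" assumed to be a column. `pd.factorize` numbers the distinct labels in order of first appearance, and integer division by the group size (the variable `n` is my own name for it) turns those numbers into group ids.

```python
import pandas as pd

df = pd.DataFrame({
    "Index": ["Alpha"] * 3 + ["Bravo"] + ["Charlie"] * 2 + ["Delta"] * 4,
    "Column1": ["sample"] * 10,
    "Column2": range(12, 22),
})

# codes: one integer per row, numbered by order of first appearance
# uniques: the distinct labels (Alpha, Bravo, Charlie, Delta)
codes, uniques = pd.factorize(df["Index"])

n = 2  # number of distinct labels per chunk
# codes // n sends labels 0 and 1 to chunk 0, labels 2 and 3 to chunk 1
dfs = [g for _, g in df.groupby(codes // n)]
```

Changing `n` gives chunks of a different size; if the number of distinct labels is not a multiple of `n`, the last chunk just contains fewer labels.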
