Array_Split with grouped string indices

Question

I have a DataFrame that I would like to split into sub-arrays (i.e. chunks) based on groups of string values in the index. I've read that you can pass a list of string values as the indices argument of np.array_split, but my scenario is a bit more complicated and I'm unsure of the best approach.

From the table below, I'd like to get 2 sub-arrays: one containing the rows with index values "Alpha" and "Bravo", and a second containing the rows with "Charlie" and "Delta".

Example table:

Index    Column1  Column2
Alpha    sample   12
Alpha    sample   13
Alpha    sample   14
Bravo    sample   15
Charlie  sample   16
Charlie  sample   17
Delta    sample   18
Delta    sample   19
Delta    sample   20
Delta    sample   21
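
For anyone who wants to reproduce the question, the example table could be rebuilt roughly as follows. This is only a sketch: the names `df` and `df_indexed` are placeholders chosen here, and the question does not say whether "Index" is a regular column or the actual row index, so both variants are shown.

```python
import pandas as pd

# Rebuild the example table with "Index" as an ordinary column.
df = pd.DataFrame({
    "Index": ["Alpha"] * 3 + ["Bravo"] + ["Charlie"] * 2 + ["Delta"] * 4,
    "Column1": ["sample"] * 10,
    "Column2": list(range(12, 22)),
})

# Variant where the labels form the actual (non-unique) row index.
df_indexed = df.set_index("Index")

print(df)
print(df_indexed)
```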

Answer 1

Score: 1

Assuming a DataFrame and that you want to split by custom groups:

    groups = [['Alpha', 'Bravo'], ['Charlie', 'Delta']]
    # map each label to the position of the group that contains it
    mapping = {k: v for v, l in enumerate(groups) for k in l}
    dfs = [g for _, g in df.groupby(df['Index'].map(mapping))]

Output:

    dfs[0]
       Index Column1  Column2
    0  Alpha  sample       12
    1  Alpha  sample       13
    2  Alpha  sample       14
    3  Bravo  sample       15

    dfs[1]
         Index Column1  Column2
    4  Charlie  sample       16
    5  Charlie  sample       17
    6    Delta  sample       18
    7    Delta  sample       19
    8    Delta  sample       20
    9    Delta  sample       21
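
To make the mapping step easier to follow, here is a self-contained sketch of the same idea with the intermediate values spelled out. The sample data is rebuilt from the question, and the names `label_to_group` and `group_ids` are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data from the question, with "Index" as a regular column.
df = pd.DataFrame({
    "Index": ["Alpha"] * 3 + ["Bravo"] + ["Charlie"] * 2 + ["Delta"] * 4,
    "Column1": ["sample"] * 10,
    "Column2": list(range(12, 22)),
})

groups = [["Alpha", "Bravo"], ["Charlie", "Delta"]]

# Label -> position of the group that contains it:
# {'Alpha': 0, 'Bravo': 0, 'Charlie': 1, 'Delta': 1}
label_to_group = {label: i for i, labels in enumerate(groups) for label in labels}

# One group id per row, aligned with df's row index.
group_ids = df["Index"].map(label_to_group)

# groupby yields (key, sub-frame) pairs; only the sub-frames are kept.
dfs = [chunk for _, chunk in df.groupby(group_ids)]

for chunk in dfs:
    print(chunk, end="\n\n")
```

One thing to keep in mind: a label that appears in the data but in none of the groups maps to NaN, and with the default `dropna=True` those rows are left out of the groupby result.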

Or, if "Index" is actually the index:

    groups = [['Alpha', 'Bravo'], ['Charlie', 'Delta']]
    dfs = [df.loc[l] for l in groups]

Output:

    dfs[0]
          Column1  Column2
    Alpha  sample       12
    Alpha  sample       13
    Alpha  sample       14
    Bravo  sample       15

    dfs[1]
            Column1  Column2
    Charlie  sample       16
    Charlie  sample       17
    Delta    sample       18
    Delta    sample       19
    Delta    sample       20
    Delta    sample       21
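
A runnable sketch of the index-based variant, again using data rebuilt from the question. The `isin` alternative and the extra "Echo" label are additions made here for illustration, not something the answer proposes; they may be useful if some labels in a group might not exist in the data.

```python
import pandas as pd

# Sample data from the question, with the labels as the row index.
df = pd.DataFrame({
    "Index": ["Alpha"] * 3 + ["Bravo"] + ["Charlie"] * 2 + ["Delta"] * 4,
    "Column1": ["sample"] * 10,
    "Column2": list(range(12, 22)),
}).set_index("Index")

groups = [["Alpha", "Bravo"], ["Charlie", "Delta"]]

# .loc with a list of labels returns every row whose index label is in the
# list; it raises KeyError if a requested label is missing from the index.
dfs = [df.loc[labels] for labels in groups]

# Lenient variant: a boolean mask keeps the original row order and simply
# ignores labels that are not present ("Echo" is a made-up label).
lenient_groups = groups + [["Echo"]]
dfs_lenient = [df[df.index.isin(labels)] for labels in lenient_groups]

print([len(chunk) for chunk in dfs])
print([len(chunk) for chunk in dfs_lenient])
```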

Finally, if you don't have explicit combinations in mind but just want groups of 2 values (in order), then use:

    # factorize numbers the labels 0, 1, 2, ... in order of first appearance; // 2 pairs them up
    dfs = [g for _, g in df.groupby(pd.factorize(df['Index'])[0] // 2)]
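
The following sketch breaks the factorize approach into steps, using data rebuilt from the question. The variable names `codes`, `uniques` and `pair_ids` are illustrative.

```python
import pandas as pd

# Sample data from the question, with "Index" as a regular column.
df = pd.DataFrame({
    "Index": ["Alpha"] * 3 + ["Bravo"] + ["Charlie"] * 2 + ["Delta"] * 4,
    "Column1": ["sample"] * 10,
    "Column2": list(range(12, 22)),
})

# factorize returns an integer code per row plus the distinct labels,
# numbered in order of first appearance.
codes, uniques = pd.factorize(df["Index"])
print(codes)          # [0 0 0 1 2 2 3 3 3 3]
print(list(uniques))  # ['Alpha', 'Bravo', 'Charlie', 'Delta']

# Integer division by 2 puts labels 0 and 1 in chunk 0, labels 2 and 3 in chunk 1.
pair_ids = codes // 2

dfs = [chunk for _, chunk in df.groupby(pair_ids)]
print([chunk["Index"].unique().tolist() for chunk in dfs])
```

Changing the divisor changes how many distinct labels go into each chunk, for example `codes // 3` for groups of three.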

huangapple
  • Posted on July 12, 2023, 21:23:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76671078.html