Is there a way to reshape a single index pandas DataFrame into a multi index to adapt to time series?

huangapple go评论103阅读模式
英文:

Is there a way to reshape a single index pandas DataFrame into a multi index to adapt to time series?

问题

以下是一个示例数据框:

  1. import pandas as pd
  2. sample_dframe = pd.DataFrame.from_dict(
  3. {
  4. "id": [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
  5. "V1": [2552, 813, 496, 401, 4078, 952, 7279, 544, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4078952, 7279544],
  6. "V2": [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
  7. "V3": [382, 161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
  8. "V4": [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263, 642, 255, 2813, 4088920, 6323521]
  9. }
  10. )

数据框如下所示:

Is there a way to reshape a single index pandas DataFrame into a multi index to adapt to time series?

以上示例的形状是(22, 5),包括列idV1V4。我需要将其转换为一个多索引数据框(时间序列),其中对于给定的id,我需要将来自V1V4的每个id的5个值(时间步长)进行分组。

换句话说,它应该给我一个形状为(2, 4, 5)的数据框,因为有2个唯一的id值。

英文:

Here's a sample data frame:

  1. import pandas as pd
  2. sample_dframe = pd.DataFrame.from_dict(
  3. {
  4. "id": [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
  5. "V1": [2552, 813, 496, 401, 4078, 952, 7279, 544, 450,548, 433,4696, 244,9735, 4263,642, 255,2813, 496,401, 4078952, 7279544],
  6. "V2": [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
  7. "V3": [382,161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
  8. "V4": [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263,642, 255,2813, 4088920, 6323521]
  9. }
  10. )

The data frame looks like this:

Is there a way to reshape a single index pandas DataFrame into a multi index to adapt to time series?

The above sample shape is (22, 5) and has columns id, V1..V4. I need to convert this into a multi index data frame (as a time series), where for a given id, I need to group 5 values (time steps) from each of V1..V4 for a given id.

i.e., it should give me a frame of shape (2, 4, 5) since there are 2 unique id values.

答案1

得分: 2

IIUC, you might just want:

  1. sample_dframe.set_index('id').stack()

NB. the output is a Series, for a DataFrame add .to_frame(name='col_name').

Output:

  1. id
  2. 123 V1 2552
  3. V2 3434
  4. V3 382
  5. V4 32628
  6. V1 813
  7. ...
  8. 456 V4 4088920
  9. V1 7279544
  10. V2 4453
  11. V3 58830
  12. V4 6323521
  13. Length: 88, dtype: int64

Or, maybe:

  1. (sample_dframe
  2. .assign(time=lambda d: d.groupby('id').cumcount())
  3. .set_index(['id', 'time']).stack()
  4. .swaplevel('time', -1)
  5. )

Output:

  1. id time
  2. 123 V1 0 2552
  3. V2 0 3434
  4. V3 0 382
  5. V4 0 32628
  6. V1 1 813
  7. ...
  8. 456 V4 10 4088920
  9. V1 11 7279544
  10. V2 11 4453
  11. V3 11 58830
  12. V4 11 6323521
  13. Length: 88, dtype: int64
  14. <details>
  15. <summary>英文:</summary>
  16. IIUC, you might just want:

sample_dframe.set_index('id').stack()

  1. *NB. the output is a Series, for a DataFrame add `.to_frame(name=&#39;col_name&#39;)`.*
  2. Output:

id
123 V1 2552
V2 3434
V3 382
V4 32628
V1 813
...
456 V4 4088920
V1 7279544
V2 4453
V3 58830
V4 6323521
Length: 88, dtype: int64

  1. Or, maybe:

(sample_dframe
.assign(time=lambda d: d.groupby('id').cumcount())
.set_index(['id', 'time']).stack()
.swaplevel('time', -1)
)

  1. Output:

id time
123 V1 0 2552
V2 0 3434
V3 0 382
V4 0 32628
V1 1 813
...
456 V4 10 4088920
V1 11 7279544
V2 11 4453
V3 11 58830
V4 11 6323521
Length: 88, dtype: int64

  1. </details>
  2. # 答案2
  3. **得分**: 0
  4. ```python
  5. import itertools
  6. import timeit
  7. from pandas import DataFrame
  8. import numpy as np
  9. import pandas as pd
  10. from datetime import datetime
  11. from pandas import DataFrame
  12. import functools as ft
  13. df= pd.DataFrame.from_dict(
  14. {
  15. "id": [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
  16. "V1": [2552, 813, 496, 401, 4078, 952, 7279, 544, 450,548, 433,4696, 244,9735, 4263,642, 255,2813, 496,401, 4078952, 7279544],
  17. "V2": [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
  18. "V3": [382,161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
  19. "V4": [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263,642, 255,2813, 4088920, 6323521]
  20. }
  21. )
  22. print(df)
  23. """
  24. id V1 V2 V3 V4
  25. 0 123 2552 3434 382 32628
  26. 1 123 813 133 161 4471
  27. 2 123 496 424 7237 4781
  28. 3 123 401 491 7503 1497
  29. 4 123 4078 8217 561 45104
  30. 5 123 952 915 6801 8657
  31. 6 123 7279 7179 1072 81074
  32. 7 123 544 5414 9660 1091
  33. 8 123 450 450 62107 370835
  34. 9 123 548 548 6233 2058
  35. 10 456 433 433 5403 4447
  36. 11 456 4696 4696 3745 7376
  37. 12 456 244 244 8613 302237
  38. 13 456 9735 9735 6302 6833
  39. 14 456 4263 4263 557 48348
  40. 15 456 642 642 4256 3545
  41. 16 456 255 255 9874 4263
  42. 17 456 2813 2813 3013 642
  43. 18 456 496 496 9352 255
  44. 19 456 401 401 4522 2813
  45. 20 456 4078952 4952 3232 4088920
  46. 21 456 7279544 4453 58830 6323521
  47. """
  48. df = df.set_index('id').stack().reset_index().drop(columns = 'level_1').rename(columns = {0:'V1_new'})
  49. print(df)
  50. """
  51. id V1_new
  52. 0 123 2552
  53. 1 123 3434
  54. 2 123 382
  55. 3 123 32628
  56. 4 123 813
  57. .. ... ...
  58. 83 456 4088920
  59. 84 456 7279544
  60. 85 456 4453
  61. 86 456 58830
  62. 87 456 6323521
  63. """
英文:
  1. import itertools
  2. import timeit
  3. from pandas import DataFrame
  4. import numpy as np
  5. import pandas as pd
  6. from datetime import datetime
  7. from pandas import DataFrame
  8. import functools as ft
  9. df= pd.DataFrame.from_dict(
  10. {
  11. &quot;id&quot;: [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
  12. &quot;V1&quot;: [2552, 813, 496, 401, 4078, 952, 7279, 544, 450,548, 433,4696, 244,9735, 4263,642, 255,2813, 496,401, 4078952, 7279544],
  13. &quot;V2&quot;: [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
  14. &quot;V3&quot;: [382,161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
  15. &quot;V4&quot;: [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263,642, 255,2813, 4088920, 6323521]
  16. }
  17. )
  18. print(df)
  19. &quot;&quot;&quot;
  20. id V1 V2 V3 V4
  21. 0 123 2552 3434 382 32628
  22. 1 123 813 133 161 4471
  23. 2 123 496 424 7237 4781
  24. 3 123 401 491 7503 1497
  25. 4 123 4078 8217 561 45104
  26. 5 123 952 915 6801 8657
  27. 6 123 7279 7179 1072 81074
  28. 7 123 544 5414 9660 1091
  29. 8 123 450 450 62107 370835
  30. 9 123 548 548 6233 2058
  31. 10 456 433 433 5403 4447
  32. 11 456 4696 4696 3745 7376
  33. 12 456 244 244 8613 302237
  34. 13 456 9735 9735 6302 6833
  35. 14 456 4263 4263 557 48348
  36. 15 456 642 642 4256 3545
  37. 16 456 255 255 9874 4263
  38. 17 456 2813 2813 3013 642
  39. 18 456 496 496 9352 255
  40. 19 456 401 401 4522 2813
  41. 20 456 4078952 4952 3232 4088920
  42. 21 456 7279544 4453 58830 6323521
  43. &quot;&quot;&quot;
  44. df = df.set_index(&#39;id&#39;).stack().reset_index().drop(columns = &#39;level_1&#39;).rename(columns = {0:&#39;V1_new&#39;})
  45. print(df)
  46. &quot;&quot;&quot;
  47. id V1_new
  48. 0 123 2552
  49. 1 123 3434
  50. 2 123 382
  51. 3 123 32628
  52. 4 123 813
  53. .. ... ...
  54. 83 456 4088920
  55. 84 456 7279544
  56. 85 456 4453
  57. 86 456 58830
  58. 87 456 6323521
  59. &quot;&quot;&quot;

huangapple
  • 本文由 发表于 2023年2月8日 12:16:00
  • 转载请务必保留本文链接:https://go.coder-hub.com/75381328.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定