2023年2月8日 12:16:00go评论103阅读模式

英文:

Is there a way to reshape a single index pandas DataFrame into a multi index to adapt to time series?

问题

以下是一个示例数据框：

import pandas as pd
sample_dframe = pd.DataFrame.from_dict(
    {
        "id": [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
        "V1": [2552, 813, 496, 401, 4078, 952, 7279, 544, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4078952, 7279544],
        "V2": [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
        "V3": [382, 161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
        "V4": [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263, 642, 255, 2813, 4088920, 6323521]
    }
)

数据框如下所示：

以上示例的形状是(22, 5)，包括列id、V1到V4。我需要将其转换为一个多索引数据框（时间序列），其中对于给定的id，我需要将来自V1到V4的每个id的5个值（时间步长）进行分组。

换句话说，它应该给我一个形状为(2, 4, 5)的数据框，因为有2个唯一的id值。

英文:

Here's a sample data frame:

import pandas as pd
sample_dframe = pd.DataFrame.from_dict(
    {
        &quot;id&quot;: [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
        &quot;V1&quot;: [2552, 813, 496, 401, 4078, 952, 7279, 544, 450,548, 433,4696, 244,9735, 4263,642, 255,2813, 496,401, 4078952, 7279544],
        &quot;V2&quot;: [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
        &quot;V3&quot;: [382,161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
        &quot;V4&quot;: [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263,642, 255,2813, 4088920, 6323521]
    }
)

The data frame looks like this:

The above sample shape is (22, 5) and has columns id, V1..V4. I need to convert this into a multi index data frame (as a time series), where for a given id, I need to group 5 values (time steps) from each of V1..V4 for a given id.

i.e., it should give me a frame of shape (2, 4, 5) since there are 2 unique id values.

答案1

得分: 2

IIUC, you might just want:

sample_dframe.set_index('id').stack()

NB. the output is a Series, for a DataFrame add .to_frame(name='col_name').

Output:

id     
123  V1       2552
     V2       3434
     V3        382
     V4      32628
     V1        813
            ...   
456  V4    4088920
     V1    7279544
     V2       4453
     V3      58830
     V4    6323521
Length: 88, dtype: int64

Or, maybe:

(sample_dframe
 .assign(time=lambda d: d.groupby('id').cumcount())
 .set_index(['id', 'time']).stack()
 .swaplevel('time', -1)
 )

Output:

id       time
123  V1  0          2552
     V2  0          3434
     V3  0           382
     V4  0         32628
     V1  1           813
                  ...   
456  V4  10      4088920
     V1  11      7279544
     V2  11         4453
     V3  11        58830
     V4  11      6323521
Length: 88, dtype: int64
<details>
<summary>英文:</summary>
IIUC, you might just want:

sample_dframe.set_index('id').stack()

*NB. the output is a Series, for a DataFrame add `.to_frame(name=&#39;col_name&#39;)`.*
Output:

id
123 V1 2552
V2 3434
V3 382
V4 32628
V1 813
...
456 V4 4088920
V1 7279544
V2 4453
V3 58830
V4 6323521
Length: 88, dtype: int64

Or, maybe:

(sample_dframe
.assign(time=lambda d: d.groupby('id').cumcount())
.set_index(['id', 'time']).stack()
.swaplevel('time', -1)
)

Output:

id time
123 V1 0 2552
V2 0 3434
V3 0 382
V4 0 32628
V1 1 813
...
456 V4 10 4088920
V1 11 7279544
V2 11 4453
V3 11 58830
V4 11 6323521
Length: 88, dtype: int64


</details>
# 答案2
**得分**: 0
```python
    import itertools
    import timeit
    from pandas import DataFrame
    import numpy as np
    import pandas as pd
    from datetime import datetime
    from pandas import DataFrame
    import functools as ft
    df= pd.DataFrame.from_dict(
        {
            "id": [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
            "V1": [2552, 813, 496, 401, 4078, 952, 7279, 544, 450,548, 433,4696, 244,9735, 4263,642, 255,2813, 496,401, 4078952, 7279544],
            "V2": [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
            "V3": [382,161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
            "V4": [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263,642, 255,2813, 4088920, 6323521]
        }
    )
    print(df)
    """
         id       V1    V2     V3       V4
    0   123     2552  3434    382    32628
    1   123      813   133    161     4471
    2   123      496   424   7237     4781
    3   123      401   491   7503     1497
    4   123     4078  8217    561    45104
    5   123      952   915   6801     8657
    6   123     7279  7179   1072    81074
    7   123      544  5414   9660     1091
    8   123      450   450  62107   370835
    9   123      548   548   6233     2058
    10  456      433   433   5403     4447
    11  456     4696  4696   3745     7376
    12  456      244   244   8613   302237
    13  456     9735  9735   6302     6833
    14  456     4263  4263    557    48348
    15  456      642   642   4256     3545
    16  456      255   255   9874     4263
    17  456     2813  2813   3013      642
    18  456      496   496   9352      255
    19  456      401   401   4522     2813
    20  456  4078952  4952   3232  4088920
    21  456  7279544  4453  58830  6323521
    """
    df = df.set_index('id').stack().reset_index().drop(columns = 'level_1').rename(columns = {0:'V1_new'})
    print(df)
    """
        id   V1_new
    0   123     2552
    1   123     3434
    2   123      382
    3   123    32628
    4   123      813
    ..  ...      ...
    83  456  4088920
    84  456  7279544
    85  456     4453
    86  456    58830
    87  456  6323521
    """

英文:

import itertools
import timeit
from pandas import DataFrame
import numpy as np
import pandas as pd
from datetime import datetime
from pandas import DataFrame
import functools as ft
df= pd.DataFrame.from_dict(
    {
        &quot;id&quot;: [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
        &quot;V1&quot;: [2552, 813, 496, 401, 4078, 952, 7279, 544, 450,548, 433,4696, 244,9735, 4263,642, 255,2813, 496,401, 4078952, 7279544],
        &quot;V2&quot;: [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
        &quot;V3&quot;: [382,161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
        &quot;V4&quot;: [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263,642, 255,2813, 4088920, 6323521]
    }
)
print(df)
&quot;&quot;&quot;
     id       V1    V2     V3       V4
0   123     2552  3434    382    32628
1   123      813   133    161     4471
2   123      496   424   7237     4781
3   123      401   491   7503     1497
4   123     4078  8217    561    45104
5   123      952   915   6801     8657
6   123     7279  7179   1072    81074
7   123      544  5414   9660     1091
8   123      450   450  62107   370835
9   123      548   548   6233     2058
10  456      433   433   5403     4447
11  456     4696  4696   3745     7376
12  456      244   244   8613   302237
13  456     9735  9735   6302     6833
14  456     4263  4263    557    48348
15  456      642   642   4256     3545
16  456      255   255   9874     4263
17  456     2813  2813   3013      642
18  456      496   496   9352      255
19  456      401   401   4522     2813
20  456  4078952  4952   3232  4088920
21  456  7279544  4453  58830  6323521
&quot;&quot;&quot;
df = df.set_index(&#39;id&#39;).stack().reset_index().drop(columns = &#39;level_1&#39;).rename(columns = {0:&#39;V1_new&#39;})
print(df)
&quot;&quot;&quot;
    id   V1_new
0   123     2552
1   123     3434
2   123      382
3   123    32628
4   123      813
..  ...      ...
83  456  4088920
84  456  7279544
85  456     4453
86  456    58830
87  456  6323521
&quot;&quot;&quot;

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Is there a way to reshape a single index pandas DataFrame into a multi index to adapt to time series?

问题

答案1

电子邮件 Outlook 获取正文

正在使用VSCode时遇到文件目录错误。

搜索字符串，使用列表元素进行匹配，返回找到的匹配项。

Find all the texts which is 'Normal' style and font size is NOT 11 in a docx file using python-docx

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。