将pandas数据帧多重索引展平

huangapple go评论101阅读模式
英文:

Flatten pandas dataframe multiindex

问题

我有一个使用以下数据创建的数据框:

data = {
    'Date': ['2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00'],
    'Distance': ['50', '100', '150', '50', '100', '150'],
    'WS': [10, 20, 30, 40, 50, 60],
    'DIR': [11, 21, 31, 41, 51, 61]
}

df = pd.DataFrame(data)

print(df)

我想将它展开,使其看起来像这样:

            WS_50 DIR_50  WS_100 DIR_100 WS_150  DIR_50
Date
2021-06-15 00:10:00  10     11      20      21      30      31
2021-06-15 00:20:00  40     41      50      51      60      61

我尝试使用pivot函数,但是它产生了以下结果:

df['Date'] = pd.to_datetime(df['Date'])

pivot_df = df.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])

                     WS         DIR
Distance            100 150  50 100 150  50
Date
2021-06-15 00:10:00  20  30  10  21  31  11
2021-06-15 00:20:00  50  60  40  51  61  41

正如您所看到的,它是按WS和DIR组织的,而我希望它按Distance(50、100、150)组织。

英文:

I have a dataframe created with the following:

data = {
    'Date': ['2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00'],
    'Distance': ['50', '100', '150', '50', '100', '150'],
    'WS': [10, 20, 30, 40, 50, 60],
    'DIR': [11, 21, 31, 41, 51, 61]
}

df = pd.DataFrame(data)

print(df)

                  Date Distance  WS  DIR
0  2021-06-15T00:10:00       50  10   11
1  2021-06-15T00:10:00      100  20   21
2  2021-06-15T00:10:00      150  30   31
3  2021-06-15T00:20:00       50  40   41
4  2021-06-15T00:20:00      100  50   51
5  2021-06-15T00:20:00      150  60   61

And I want to flatten it so it looks like this:

                    WS_50 DIR_50  WS_100 DIR_100 WS_150  DIR_50
Date
2021-06-15 00:10:00  10     11      20      21      30      31
2021-06-15 00:20:00  40     41      50      51      60      61

I have tried using the pivot function but that produces the following:

df['Date'] = pd.to_datetime(df['Date'])

pivot_df = df.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])

                     WS         DIR
Distance            100 150  50 100 150  50
Date
2021-06-15 00:10:00  20  30  10  21  31  11
2021-06-15 00:20:00  50  60  40  51  61  41

As you can see, it's organized by WS and DIR, and I want it organized by Distance (50, 100, 150).

Thanks

答案1

得分: 2

请尝试以下代码:

df_out = df.set_index(['Date', 'Distance'], append=True).unstack()
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
df_out = df_out.groupby('Date').first()
df_out = df_out.sort_index(key=lambda x: df_out.columns.str.split('_').str[1].astype(int), axis=1)
df_out

输出结果:

                         WS_50  DIR_50  WS_100  DIR_100  WS_150  DIR_150
Date                                                                
2021-06-15T00:10:00   10.0    11.0    20.0     21.0    30.0     31.0
2021-06-15T00:20:00   40.0    41.0    50.0     51.0    60.0     61.0
英文:

Try this:

df_out = df.set_index(['Date','Distance'], append=True).unstack()
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
df_out = df_out.groupby('Date').first()
df_out = df_out.sort_index(key=lambda x: df_out.columns.str.split('_').str[1].astype(int), axis=1)
df_out

Output:

                     WS_50  DIR_50  WS_100  DIR_100  WS_150  DIR_150
Date                                                                
2021-06-15T00:10:00   10.0    11.0    20.0     21.0    30.0     31.0
2021-06-15T00:20:00   40.0    41.0    50.0     51.0    60.0     61.0

答案2

得分: 1

你可以使用整数类型,对索引进行排序,然后展开:

df['Date'] = pd.to_datetime(df['Date'])

pivot_df = (df.astype({'Distance': int})
              .pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
              .sort_index(level='Distance', sort_remaining=False, axis=1)
           )
pivot_df.columns = pivot_df.columns.map(lambda x: f'{x[0]}_{x[1]}')

输出结果:

                     WS_50  DIR_50  WS_100  DIR_100  WS_150  DIR_150
Date                                                                
2021-06-15 00:10:00     10      11      20       21      30       31
2021-06-15 00:20:00     40      41      50       51      60       61

注意:这段代码是用来处理数据的,具体操作需要在合适的上下文中使用。

英文:

You can use an integer type, sort the index, then flatten:

df['Date'] = pd.to_datetime(df['Date'])

pivot_df = (df.astype({'Distance': int})
              .pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
              .sort_index(level='Distance', sort_remaining=False, axis=1)
           )
pivot_df.columns = pivot_df.columns.map(lambda x: f'{x[0]}_{x[1]}')

Output:

                     WS_50  DIR_50  WS_100  DIR_100  WS_150  DIR_150
Date                                                                
2021-06-15 00:10:00     10      11      20       21      30       31
2021-06-15 00:20:00     40      41      50       51      60       61

huangapple
  • 本文由 发表于 2023年8月10日 14:47:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/76873220.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定