英文:
Flatten pandas dataframe multiindex
问题
我有一个使用以下数据创建的数据框:
data = {
'Date': ['2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00'],
'Distance': ['50', '100', '150', '50', '100', '150'],
'WS': [10, 20, 30, 40, 50, 60],
'DIR': [11, 21, 31, 41, 51, 61]
}
df = pd.DataFrame(data)
print(df)
我想将它展开,使其看起来像这样:
WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_50
Date
2021-06-15 00:10:00 10 11 20 21 30 31
2021-06-15 00:20:00 40 41 50 51 60 61
我尝试使用pivot函数,但是它产生了以下结果:
df['Date'] = pd.to_datetime(df['Date'])
pivot_df = df.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
WS DIR
Distance 100 150 50 100 150 50
Date
2021-06-15 00:10:00 20 30 10 21 31 11
2021-06-15 00:20:00 50 60 40 51 61 41
正如您所看到的,它是按WS和DIR组织的,而我希望它按Distance(50、100、150)组织。
英文:
I have a dataframe created with the following:
data = {
'Date': ['2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00'],
'Distance': ['50', '100', '150', '50', '100', '150'],
'WS': [10, 20, 30, 40, 50, 60],
'DIR': [11, 21, 31, 41, 51, 61]
}
df = pd.DataFrame(data)
print(df)
Date Distance WS DIR
0 2021-06-15T00:10:00 50 10 11
1 2021-06-15T00:10:00 100 20 21
2 2021-06-15T00:10:00 150 30 31
3 2021-06-15T00:20:00 50 40 41
4 2021-06-15T00:20:00 100 50 51
5 2021-06-15T00:20:00 150 60 61
And I want to flatten it so it looks like this:
WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_50
Date
2021-06-15 00:10:00 10 11 20 21 30 31
2021-06-15 00:20:00 40 41 50 51 60 61
I have tried using the pivot function but that produces the following:
df['Date'] = pd.to_datetime(df['Date'])
pivot_df = df.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
WS DIR
Distance 100 150 50 100 150 50
Date
2021-06-15 00:10:00 20 30 10 21 31 11
2021-06-15 00:20:00 50 60 40 51 61 41
As you can see, it's organized by WS and DIR, and I want it organized by Distance (50, 100, 150).
Thanks
答案1
得分: 2
请尝试以下代码:
df_out = df.set_index(['Date', 'Distance'], append=True).unstack()
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
df_out = df_out.groupby('Date').first()
df_out = df_out.sort_index(key=lambda x: df_out.columns.str.split('_').str[1].astype(int), axis=1)
df_out
输出结果:
WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_150
Date
2021-06-15T00:10:00 10.0 11.0 20.0 21.0 30.0 31.0
2021-06-15T00:20:00 40.0 41.0 50.0 51.0 60.0 61.0
英文:
Try this:
df_out = df.set_index(['Date','Distance'], append=True).unstack()
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
df_out = df_out.groupby('Date').first()
df_out = df_out.sort_index(key=lambda x: df_out.columns.str.split('_').str[1].astype(int), axis=1)
df_out
Output:
WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_150
Date
2021-06-15T00:10:00 10.0 11.0 20.0 21.0 30.0 31.0
2021-06-15T00:20:00 40.0 41.0 50.0 51.0 60.0 61.0
答案2
得分: 1
你可以使用整数类型,对索引进行排序,然后展开:
df['Date'] = pd.to_datetime(df['Date'])
pivot_df = (df.astype({'Distance': int})
.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
.sort_index(level='Distance', sort_remaining=False, axis=1)
)
pivot_df.columns = pivot_df.columns.map(lambda x: f'{x[0]}_{x[1]}')
输出结果:
WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_150
Date
2021-06-15 00:10:00 10 11 20 21 30 31
2021-06-15 00:20:00 40 41 50 51 60 61
注意:这段代码是用来处理数据的,具体操作需要在合适的上下文中使用。
英文:
You can use an integer type, sort the index, then flatten:
df['Date'] = pd.to_datetime(df['Date'])
pivot_df = (df.astype({'Distance': int})
.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
.sort_index(level='Distance', sort_remaining=False, axis=1)
)
pivot_df.columns = pivot_df.columns.map(lambda x: f'{x[0]}_{x[1]}')
Output:
WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_150
Date
2021-06-15 00:10:00 10 11 20 21 30 31
2021-06-15 00:20:00 40 41 50 51 60 61
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论