将pandas数据帧多重索引展平

huangapple go评论129阅读模式
英文:

Flatten pandas dataframe multiindex

问题

我有一个使用以下数据创建的数据框:

  1. data = {
  2. 'Date': ['2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00'],
  3. 'Distance': ['50', '100', '150', '50', '100', '150'],
  4. 'WS': [10, 20, 30, 40, 50, 60],
  5. 'DIR': [11, 21, 31, 41, 51, 61]
  6. }
  7. df = pd.DataFrame(data)
  8. print(df)

我想将它展开,使其看起来像这样:

  1. WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_50
  2. Date
  3. 2021-06-15 00:10:00 10 11 20 21 30 31
  4. 2021-06-15 00:20:00 40 41 50 51 60 61

我尝试使用pivot函数,但是它产生了以下结果:

  1. df['Date'] = pd.to_datetime(df['Date'])
  2. pivot_df = df.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
  3. WS DIR
  4. Distance 100 150 50 100 150 50
  5. Date
  6. 2021-06-15 00:10:00 20 30 10 21 31 11
  7. 2021-06-15 00:20:00 50 60 40 51 61 41

正如您所看到的,它是按WS和DIR组织的,而我希望它按Distance(50、100、150)组织。

英文:

I have a dataframe created with the following:

  1. data = {
  2. 'Date': ['2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00'],
  3. 'Distance': ['50', '100', '150', '50', '100', '150'],
  4. 'WS': [10, 20, 30, 40, 50, 60],
  5. 'DIR': [11, 21, 31, 41, 51, 61]
  6. }
  7. df = pd.DataFrame(data)
  8. print(df)
  9. Date Distance WS DIR
  10. 0 2021-06-15T00:10:00 50 10 11
  11. 1 2021-06-15T00:10:00 100 20 21
  12. 2 2021-06-15T00:10:00 150 30 31
  13. 3 2021-06-15T00:20:00 50 40 41
  14. 4 2021-06-15T00:20:00 100 50 51
  15. 5 2021-06-15T00:20:00 150 60 61

And I want to flatten it so it looks like this:

  1. WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_50
  2. Date
  3. 2021-06-15 00:10:00 10 11 20 21 30 31
  4. 2021-06-15 00:20:00 40 41 50 51 60 61

I have tried using the pivot function but that produces the following:

  1. df['Date'] = pd.to_datetime(df['Date'])
  2. pivot_df = df.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
  3. WS DIR
  4. Distance 100 150 50 100 150 50
  5. Date
  6. 2021-06-15 00:10:00 20 30 10 21 31 11
  7. 2021-06-15 00:20:00 50 60 40 51 61 41

As you can see, it's organized by WS and DIR, and I want it organized by Distance (50, 100, 150).

Thanks

答案1

得分: 2

请尝试以下代码:

  1. df_out = df.set_index(['Date', 'Distance'], append=True).unstack()
  2. df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
  3. df_out = df_out.groupby('Date').first()
  4. df_out = df_out.sort_index(key=lambda x: df_out.columns.str.split('_').str[1].astype(int), axis=1)
  5. df_out

输出结果:

  1. WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_150
  2. Date
  3. 2021-06-15T00:10:00 10.0 11.0 20.0 21.0 30.0 31.0
  4. 2021-06-15T00:20:00 40.0 41.0 50.0 51.0 60.0 61.0
英文:

Try this:

  1. df_out = df.set_index(['Date','Distance'], append=True).unstack()
  2. df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
  3. df_out = df_out.groupby('Date').first()
  4. df_out = df_out.sort_index(key=lambda x: df_out.columns.str.split('_').str[1].astype(int), axis=1)
  5. df_out

Output:

  1. WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_150
  2. Date
  3. 2021-06-15T00:10:00 10.0 11.0 20.0 21.0 30.0 31.0
  4. 2021-06-15T00:20:00 40.0 41.0 50.0 51.0 60.0 61.0

答案2

得分: 1

你可以使用整数类型,对索引进行排序,然后展开:

  1. df['Date'] = pd.to_datetime(df['Date'])
  2. pivot_df = (df.astype({'Distance': int})
  3. .pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
  4. .sort_index(level='Distance', sort_remaining=False, axis=1)
  5. )
  6. pivot_df.columns = pivot_df.columns.map(lambda x: f'{x[0]}_{x[1]}')

输出结果:

  1. WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_150
  2. Date
  3. 2021-06-15 00:10:00 10 11 20 21 30 31
  4. 2021-06-15 00:20:00 40 41 50 51 60 61

注意:这段代码是用来处理数据的,具体操作需要在合适的上下文中使用。

英文:

You can use an integer type, sort the index, then flatten:

  1. df['Date'] = pd.to_datetime(df['Date'])
  2. pivot_df = (df.astype({'Distance': int})
  3. .pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
  4. .sort_index(level='Distance', sort_remaining=False, axis=1)
  5. )
  6. pivot_df.columns = pivot_df.columns.map(lambda x: f'{x[0]}_{x[1]}')

Output:

  1. WS_50 DIR_50 WS_100 DIR_100 WS_150 DIR_150
  2. Date
  3. 2021-06-15 00:10:00 10 11 20 21 30 31
  4. 2021-06-15 00:20:00 40 41 50 51 60 61

huangapple
  • 本文由 发表于 2023年8月10日 14:47:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/76873220.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定