2023-08-10 14:47:40

Flatten pandas dataframe multiindex

Question

I have a dataframe created with the following:

import pandas as pd

data = {
    'Date': ['2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:10:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00', '2021-06-15T00:20:00'],
    'Distance': ['50', '100', '150', '50', '100', '150'],
    'WS': [10, 20, 30, 40, 50, 60],
    'DIR': [11, 21, 31, 41, 51, 61]
}
df = pd.DataFrame(data)
print(df)
                  Date Distance  WS  DIR
0  2021-06-15T00:10:00       50  10   11
1  2021-06-15T00:10:00      100  20   21
2  2021-06-15T00:10:00      150  30   31
3  2021-06-15T00:20:00       50  40   41
4  2021-06-15T00:20:00      100  50   51
5  2021-06-15T00:20:00      150  60   61

And I want to flatten it so it looks like this:

                     WS_50  DIR_50  WS_100  DIR_100  WS_150  DIR_150
Date
2021-06-15 00:10:00     10      11      20       21      30       31
2021-06-15 00:20:00     40      41      50       51      60       61

I have tried using the pivot function but that produces the following:

df['Date'] = pd.to_datetime(df['Date'])
pivot_df = df.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
                     WS         DIR
Distance            100 150  50 100 150  50
Date
2021-06-15 00:10:00  20  30  10  21  31  11
2021-06-15 00:20:00  50  60  40  51  61  41

As you can see, it's organized by WS and DIR, and I want it organized by Distance (50, 100, 150).

Thanks

Answer 1

Score: 2

Try this:

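# Move Date and Distance into the index, then unstack Distance into the columns
df_out = df.set_index(['Date', 'Distance'], append=True).unstack()
# Flatten the (variable, distance) column MultiIndex into names like 'WS_50'
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
# Collapse to one row per Date, taking the first non-null value in each column
df_out = df_out.groupby('Date').first()
# Order the columns by the numeric distance suffix
df_out = df_out.sort_index(key=lambda x: x.str.split('_').str[1].astype(int), axis=1)
df_out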

Output:

                     WS_50  DIR_50  WS_100  DIR_100  WS_150  DIR_150
Date                                                                
2021-06-15T00:10:00   10.0    11.0    20.0     21.0    30.0     31.0
2021-06-15T00:20:00   40.0    41.0    50.0     51.0    60.0     61.0
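
The final sort above relies on WS staying ahead of DIR for columns that share a distance. As a rough alternative sketch (not the answerer's method), the column order can be spelled out explicitly before flattening. It assumes the sample data from the question; the names `wide`, `distances`, and `ordered` are made up for illustration.

```python
import pandas as pd

data = {
    'Date': ['2021-06-15T00:10:00'] * 3 + ['2021-06-15T00:20:00'] * 3,
    'Distance': ['50', '100', '150', '50', '100', '150'],
    'WS': [10, 20, 30, 40, 50, 60],
    'DIR': [11, 21, 31, 41, 51, 61],
}
df = pd.DataFrame(data)

# Pivot to a (variable, distance) column MultiIndex.
wide = df.pivot(index='Date', columns='Distance', values=['WS', 'DIR'])

# Distances in numeric order, even though they are stored as strings.
distances = sorted(df['Distance'].unique(), key=int)

# Desired order: for each distance, WS first and then DIR.
ordered = [(var, dist) for dist in distances for var in ('WS', 'DIR')]
wide = wide[ordered]

# Flatten the two column levels into names such as 'WS_50'.
wide.columns = [f'{var}_{dist}' for var, dist in wide.columns]
print(wide)
```

Because no rows are missing for any Date/Distance pair here, the values stay as integers instead of becoming floats.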

Answer 2

Score: 1

You can convert Distance to an integer type, sort the column index by Distance, then flatten the column names:

df['Date'] = pd.to_datetime(df['Date'])
# Cast Distance to int so that 50 sorts before 100 and 150
pivot_df = (df.astype({'Distance': int})
              .pivot(index='Date', columns='Distance', values=['WS', 'DIR'])
              # Sort the columns on the Distance level only
              .sort_index(level='Distance', sort_remaining=False, axis=1)
           )
# Join the two column levels into names like 'WS_50'
pivot_df.columns = pivot_df.columns.map(lambda x: f'{x[0]}_{x[1]}')

Output:

                     WS_50  DIR_50  WS_100  DIR_100  WS_150  DIR_150
Date                                                                
2021-06-15 00:10:00     10      11      20       21      30       31
2021-06-15 00:20:00     40      41      50       51      60       61
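
The integer cast matters because the Distance labels in the question are strings, and strings sort character by character. A small sketch of the difference, using two illustrative names (`as_text`, `as_int`) that are not part of the answer:

```python
import pandas as pd

# Distance labels as they appear in the question (strings).
as_text = pd.Index(['50', '100', '150'])

# The same labels after casting to integers.
as_int = as_text.astype(int)

print(as_text.sort_values().tolist())  # ['100', '150', '50']
print(as_int.sort_values().tolist())   # [50, 100, 150]
```

This is the same ordering seen in the pivot output from the question, where the columns come out as 100, 150, 50.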
