2023年7月3日 20:23:15go评论117阅读模式

英文:

Reshaping dataframe using wide_to_long vs melt

问题

我想重新塑造我的pandas数据框架，最近了解了wide_to_long函数。在什么情况下，您会更喜欢使用这个函数而不是melt函数？

英文:

I want to reshape my pandas dataframe, and recently came across the wide_to_long-function. In what cases would you prefer this function compared to the melt-function?

答案1

得分: 1

wide_to_long通常在你有重复的列名并带有共同后缀时非常有用。该函数会自动提取后缀（你可以用正则表达式指定确切的格式）。

假设有以下示例：

   id  value1  value2  group1  group2
0   1       5       9      13      17
1   2       6      10      14      18
2   3       7      11      15      19
3   4       8      12      16      20

你可以运行：

pd.wide_to_long(df, stubnames=['value', 'group'], i='id', j='type')

输出：

         value  group
id type              
1  1         5     13
2  1         6     14
3  1         7     15
4  1         8     16
1  2         9     17
2  2        10     18
3  2        11     19
4  2        12     20

实际上，文档中描述它为"比melt更灵活但更用户友好”。

melt因此较低级别，你需要对输出进行后处理，从：

df.melt(['id'])

输出：

    id variable  value
0    1   value1      5
1    2   value1      6
2    3   value1      7
3    4   value1      8
4    1   value2      9
5    2   value2     10
6    3   value2     11
7    4   value2     12
8    1   group1     13
9    2   group1     14
10   3   group1     15
11   4   group1     16
12   1   group2     17
13   2   group2     18
14   3   group2     19
15   4   group2     20

到：

tmp = df.melt(['id'])
tmp2 = tmp['variable'].str.extract('(\D+)(\d+)')
(tmp.assign(col=tmp2[0], type=tmp2[1])
    .pivot(index=['id', 'type'], columns='col', values='value')
    .rename_axis(columns=None)
)

输出：

         group  value
id type              
1  1        13      5
   2        17      9
2  1        14      6
   2        18     10
3  1        15      7
   2        19     11
4  1        16      8
   2        20     12

请注意，你也可以使用MultiIndex重塑：

out = (df.set_index('id')
         .pipe(lambda d: d.set_axis(pd.MultiIndex.from_frame(
                                    d.columns.str.extract('(\D+)(\d+)'),
                                    names=[None, 'type']), axis=1))
         .stack()
       )

英文:

wide_to_long is typically useful when you have repeated column names with common suffixes. The function automatically extract the suffix (you can specify the exact format with a regex).

Assuming this example:

   id  value1  value2  group1  group2
0   1       5       9      13      17
1   2       6      10      14      18
2   3       7      11      15      19
3   4       8      12      16      20

You can run:

pd.wide_to_long(df, stubnames=[&#39;value&#39;, &#39;group&#39;], i=&#39;id&#39;, j=&#39;type&#39;)

Output:

         value  group
id type              
1  1         5     13
2  1         6     14
3  1         7     15
4  1         8     16
1  2         9     17
2  2        10     18
3  2        11     19
4  2        12     20

It's actually described in the documentation to be "Less flexible but more user-friendly than melt.".

melt is thus lower level, you would need to post-process your output, from:

df.melt([&#39;id&#39;])

Output:

    id variable  value
0    1   value1      5
1    2   value1      6
2    3   value1      7
3    4   value1      8
4    1   value2      9
5    2   value2     10
6    3   value2     11
7    4   value2     12
8    1   group1     13
9    2   group1     14
10   3   group1     15
11   4   group1     16
12   1   group2     17
13   2   group2     18
14   3   group2     19
15   4   group2     20

To:

tmp = df.melt([&#39;id&#39;])
tmp2 = tmp[&#39;variable&#39;].str.extract(&#39;(\D+)(\d+)&#39;)
(tmp.assign(col=tmp2[0], type=tmp2[1])
    .pivot(index=[&#39;id&#39;, &#39;type&#39;], columns=&#39;col&#39;, values=&#39;value&#39;)
    .rename_axis(columns=None)
)

Output:

         group  value
id type              
1  1        13      5
   2        17      9
2  1        14      6
   2        18     10
3  1        15      7
   2        19     11
4  1        16      8
   2        20     12

Note that you could also use MultiIndex reshaping:

out = (df.set_index(&#39;id&#39;)
         .pipe(lambda d: d.set_axis(pd.MultiIndex.from_frame(
                                    d.columns.str.extract(&#39;(\D+)(\d+)&#39;),
                                    names=[None, &#39;type&#39;]), axis=1))
         .stack()
       )

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用wide_to_long与melt重新塑造数据框。

问题

答案1

删除Pandas数据帧中基于整行最大值的行。

删除NumPy 2D数组中的一个对象

如何运行一个在pandas-on-spark API中迭代应用正则表达式的函数？

在给定范围内分配空白。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论