使用wide_to_long与melt重新塑造数据框。

huangapple go评论117阅读模式
英文:

Reshaping dataframe using wide_to_long vs melt

问题

我想重新塑造我的pandas数据框架,最近了解了wide_to_long函数。在什么情况下,您会更喜欢使用这个函数而不是melt函数?

英文:

I want to reshape my pandas dataframe, and recently came across the wide_to_long-function. In what cases would you prefer this function compared to the melt-function?

答案1

得分: 1

wide_to_long通常在你有重复的列名并带有共同后缀时非常有用。该函数会自动提取后缀(你可以用正则表达式指定确切的格式)。

假设有以下示例:

   id  value1  value2  group1  group2
0   1       5       9      13      17
1   2       6      10      14      18
2   3       7      11      15      19
3   4       8      12      16      20

你可以运行:

pd.wide_to_long(df, stubnames=['value', 'group'], i='id', j='type')

输出:

         value  group
id type              
1  1         5     13
2  1         6     14
3  1         7     15
4  1         8     16
1  2         9     17
2  2        10     18
3  2        11     19
4  2        12     20

实际上,文档中描述它为"比melt更灵活但更用户友好”。

melt因此较低级别,你需要对输出进行后处理,从:

df.melt(['id'])

输出:

    id variable  value
0    1   value1      5
1    2   value1      6
2    3   value1      7
3    4   value1      8
4    1   value2      9
5    2   value2     10
6    3   value2     11
7    4   value2     12
8    1   group1     13
9    2   group1     14
10   3   group1     15
11   4   group1     16
12   1   group2     17
13   2   group2     18
14   3   group2     19
15   4   group2     20

到:

tmp = df.melt(['id'])
tmp2 = tmp['variable'].str.extract('(\D+)(\d+)')
(tmp.assign(col=tmp2[0], type=tmp2[1])
    .pivot(index=['id', 'type'], columns='col', values='value')
    .rename_axis(columns=None)
)

输出:

         group  value
id type              
1  1        13      5
   2        17      9
2  1        14      6
   2        18     10
3  1        15      7
   2        19     11
4  1        16      8
   2        20     12

请注意,你也可以使用MultiIndex重塑:

out = (df.set_index('id')
         .pipe(lambda d: d.set_axis(pd.MultiIndex.from_frame(
                                    d.columns.str.extract('(\D+)(\d+)'),
                                    names=[None, 'type']), axis=1))
         .stack()
       )
英文:

wide_to_long is typically useful when you have repeated column names with common suffixes. The function automatically extract the suffix (you can specify the exact format with a regex).

Assuming this example:

   id  value1  value2  group1  group2
0   1       5       9      13      17
1   2       6      10      14      18
2   3       7      11      15      19
3   4       8      12      16      20

You can run:

pd.wide_to_long(df, stubnames=['value', 'group'], i='id', j='type')

Output:

         value  group
id type              
1  1         5     13
2  1         6     14
3  1         7     15
4  1         8     16
1  2         9     17
2  2        10     18
3  2        11     19
4  2        12     20

It's actually described in the documentation to be "Less flexible but more user-friendly than melt.".

melt is thus lower level, you would need to post-process your output, from:

df.melt(['id'])

Output:

    id variable  value
0    1   value1      5
1    2   value1      6
2    3   value1      7
3    4   value1      8
4    1   value2      9
5    2   value2     10
6    3   value2     11
7    4   value2     12
8    1   group1     13
9    2   group1     14
10   3   group1     15
11   4   group1     16
12   1   group2     17
13   2   group2     18
14   3   group2     19
15   4   group2     20

To:

tmp = df.melt(['id'])
tmp2 = tmp['variable'].str.extract('(\D+)(\d+)')
(tmp.assign(col=tmp2[0], type=tmp2[1])
    .pivot(index=['id', 'type'], columns='col', values='value')
    .rename_axis(columns=None)
)

Output:

         group  value
id type              
1  1        13      5
   2        17      9
2  1        14      6
   2        18     10
3  1        15      7
   2        19     11
4  1        16      8
   2        20     12

Note that you could also use MultiIndex reshaping:

out = (df.set_index('id')
         .pipe(lambda d: d.set_axis(pd.MultiIndex.from_frame(
                                    d.columns.str.extract('(\D+)(\d+)'),
                                    names=[None, 'type']), axis=1))
         .stack()
       )

huangapple
  • 本文由 发表于 2023年7月3日 20:23:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/76604717.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定