英文:
Reshaping dataframe using wide_to_long vs melt
问题
我想重新塑造我的pandas数据框架,最近了解了wide_to_long
函数。在什么情况下,您会更喜欢使用这个函数而不是melt
函数?
英文:
I want to reshape my pandas dataframe, and recently came across the wide_to_long
-function. In what cases would you prefer this function compared to the melt
-function?
答案1
得分: 1
wide_to_long
通常在你有重复的列名并带有共同后缀时非常有用。该函数会自动提取后缀(你可以用正则表达式指定确切的格式)。
假设有以下示例:
id value1 value2 group1 group2
0 1 5 9 13 17
1 2 6 10 14 18
2 3 7 11 15 19
3 4 8 12 16 20
你可以运行:
pd.wide_to_long(df, stubnames=['value', 'group'], i='id', j='type')
输出:
value group
id type
1 1 5 13
2 1 6 14
3 1 7 15
4 1 8 16
1 2 9 17
2 2 10 18
3 2 11 19
4 2 12 20
实际上,文档中描述它为"比melt更灵活但更用户友好”。
melt
因此较低级别,你需要对输出进行后处理,从:
df.melt(['id'])
输出:
id variable value
0 1 value1 5
1 2 value1 6
2 3 value1 7
3 4 value1 8
4 1 value2 9
5 2 value2 10
6 3 value2 11
7 4 value2 12
8 1 group1 13
9 2 group1 14
10 3 group1 15
11 4 group1 16
12 1 group2 17
13 2 group2 18
14 3 group2 19
15 4 group2 20
到:
tmp = df.melt(['id'])
tmp2 = tmp['variable'].str.extract('(\D+)(\d+)')
(tmp.assign(col=tmp2[0], type=tmp2[1])
.pivot(index=['id', 'type'], columns='col', values='value')
.rename_axis(columns=None)
)
输出:
group value
id type
1 1 13 5
2 17 9
2 1 14 6
2 18 10
3 1 15 7
2 19 11
4 1 16 8
2 20 12
请注意,你也可以使用MultiIndex重塑:
out = (df.set_index('id')
.pipe(lambda d: d.set_axis(pd.MultiIndex.from_frame(
d.columns.str.extract('(\D+)(\d+)'),
names=[None, 'type']), axis=1))
.stack()
)
英文:
wide_to_long
is typically useful when you have repeated column names with common suffixes. The function automatically extract the suffix (you can specify the exact format with a regex).
Assuming this example:
id value1 value2 group1 group2
0 1 5 9 13 17
1 2 6 10 14 18
2 3 7 11 15 19
3 4 8 12 16 20
You can run:
pd.wide_to_long(df, stubnames=['value', 'group'], i='id', j='type')
Output:
value group
id type
1 1 5 13
2 1 6 14
3 1 7 15
4 1 8 16
1 2 9 17
2 2 10 18
3 2 11 19
4 2 12 20
It's actually described in the documentation to be "Less flexible but more user-friendly than melt.".
melt
is thus lower level, you would need to post-process your output, from:
df.melt(['id'])
Output:
id variable value
0 1 value1 5
1 2 value1 6
2 3 value1 7
3 4 value1 8
4 1 value2 9
5 2 value2 10
6 3 value2 11
7 4 value2 12
8 1 group1 13
9 2 group1 14
10 3 group1 15
11 4 group1 16
12 1 group2 17
13 2 group2 18
14 3 group2 19
15 4 group2 20
To:
tmp = df.melt(['id'])
tmp2 = tmp['variable'].str.extract('(\D+)(\d+)')
(tmp.assign(col=tmp2[0], type=tmp2[1])
.pivot(index=['id', 'type'], columns='col', values='value')
.rename_axis(columns=None)
)
Output:
group value
id type
1 1 13 5
2 17 9
2 1 14 6
2 18 10
3 1 15 7
2 19 11
4 1 16 8
2 20 12
Note that you could also use MultiIndex reshaping:
out = (df.set_index('id')
.pipe(lambda d: d.set_axis(pd.MultiIndex.from_frame(
d.columns.str.extract('(\D+)(\d+)'),
names=[None, 'type']), axis=1))
.stack()
)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论