How can I transform a DataFrame using pivot and melt in Python Pandas?
Question
I have a DataFrame
date price num_floors house
1 2023-01-01 94.30076 3 A
2 2023-01-01 95.58771 2 B
3 2023-01-02 102.78559 1 C
4 2023-01-03 93.29053 3 D
and I want to change it, so that each column contains the prices and num_floors for all houses for a given date. For one column, the first two rows of a column refer to the first house, the next two to the second house. The remaining entries without data are filled with the missing value NaN, like this:
2023-01-01 2023-01-02 2023-01-03
1 94.30076 102.78559 93.29053
2 3 1 3
3 95.58771 NA NA
4 2 NA NA
I succeed using R:
df_trans <- df %>%
pivot_longer(-date) %>%
mutate(index=row_number(), .by = date) %>%
pivot_wider(id_cols = index, names_from = date, values_from = value) %>%
select(-index)
but when I try with python, it does not work as I want:
df_trans = df.melt(id_vars='date')
df_trans['n'] = df_trans.groupby('date').cumcount() + 1
df_trans = df_trans.pivot(index='n', columns='date', values='value')
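For context on why the attempt above diverges from the R result: `melt` stacks column by column, so all prices come before all floor counts and the `house` strings get mixed in as values. A sketch of a fix that mirrors the R pipeline (assuming pandas ≥1.1 for `ignore_index=False`; the DataFrame construction is reconstructed from the table above):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-03'],
    'price': [94.30076, 95.58771, 102.78559, 93.29053],
    'num_floors': [3, 2, 1, 3],
    'house': ['A', 'B', 'C', 'D'],
})

# keep the original row index through melt, then restore row order so each
# house's price and num_floors stay adjacent (mergesort is a stable sort)
long_df = (df.drop(columns='house')
             .melt(id_vars='date', ignore_index=False)
             .sort_index(kind='mergesort'))
long_df['n'] = long_df.groupby('date').cumcount()
df_trans = long_df.pivot(index='n', columns='date', values='value')
print(df_trans)
```

Dropping `house` before melting and re-sorting by the preserved row index is what restores the per-house interleaving that the R `pivot_longer` gives for free.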
Answer 1
Score: 1
Try:
df = df.drop(columns='house')
df['tmp'] = df.groupby('date').cumcount()
df = df.set_index(['date', 'tmp']).stack().unstack('date').reset_index(drop=True)
df.columns.name = None
print(df)
Prints:
2023-01-01 2023-01-02 2023-01-03
0 94.30076 102.78559 93.29053
1 3.00000 1.00000 3.00000
2 95.58771 NaN NaN
3 2.00000 NaN NaN
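For completeness, the same stack/unstack approach end to end (the DataFrame construction is an assumption based on the table in the question):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-03'],
    'price': [94.30076, 95.58771, 102.78559, 93.29053],
    'num_floors': [3, 2, 1, 3],
    'house': ['A', 'B', 'C', 'D'],
})

df = df.drop(columns='house')              # only price/num_floors are values
df['tmp'] = df.groupby('date').cumcount()  # per-date house counter: 0, 1, ...
# stack interleaves price/num_floors per row; unstack spreads dates to columns
out = df.set_index(['date', 'tmp']).stack().unstack('date').reset_index(drop=True)
out.columns.name = None
print(out)
```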
Answer 2
Score: 0
pd.concat([j.set_index("date")[["price", "num_floors"]].T \
for i, j in df.groupby(df.groupby("date").cumcount())])
Step by step:
- Compute a per-date cumulative count (`df.groupby("date").cumcount()`) and group the frame by it, so group 0 holds each date's first house, group 1 the second, and so on
- For each group, set `date` as the index and transpose `price`/`num_floors` into rows
- Concatenate the groups; dates missing from a group become NaN
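A runnable version of the steps above, with the sample frame reconstructed from the question. Note the result keeps `price`/`num_floors` as row labels rather than a fresh 0..3 index:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-03'],
    'price': [94.30076, 95.58771, 102.78559, 93.29053],
    'num_floors': [3, 2, 1, 3],
    'house': ['A', 'B', 'C', 'D'],
})

# group rows by their rank within each date (0 = first house of the day, ...),
# transpose each group so dates become columns, then stack the groups
out = pd.concat([j.set_index('date')[['price', 'num_floors']].T
                 for i, j in df.groupby(df.groupby('date').cumcount())])
print(out)
```

A `reset_index(drop=True)` on the result would reproduce the 0..3 row labels shown in the other answers.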
Answer 3
Score: 0
Another possible solution (note: this one also needs `import numpy as np`):
(pd.DataFrame(np.vstack([np.vstack([[x, y], [x, z]]) for x, y, z in
zip(df['date'], df['price'], df['num_floors'])]))
.pivot(columns=0, values=1).rename_axis(None, axis=1)
.apply(lambda x: x.dropna(ignore_index=True)))
Alternatively,
(df.assign(
price = [[x,y] for x,y in zip(df['price'], df['num_floors'])])
.pivot(columns='date', values='price')
.apply(lambda x: x.explode(ignore_index=True))
.rename_axis(None, axis=1)
.apply(lambda x: x.dropna(ignore_index=True)))
Output:
2023-01-01 2023-01-02 2023-01-03
0 94.30076 102.78559 93.29053
1 3 1 3
2 95.58771 NaN NaN
3 2 NaN NaN
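Both snippets assume `pandas` is imported and `df` is in scope; here is the explode-based variant end to end (sample frame reconstructed from the question; `dropna(ignore_index=True)` assumes pandas ≥2.0):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-03'],
    'price': [94.30076, 95.58771, 102.78559, 93.29053],
    'num_floors': [3, 2, 1, 3],
    'house': ['A', 'B', 'C', 'D'],
})

out = (df.assign(
         # pack [price, num_floors] into a single cell per row
         price=[[x, y] for x, y in zip(df['price'], df['num_floors'])])
       .pivot(columns='date', values='price')
       .apply(lambda x: x.explode(ignore_index=True))  # unpack pairs downward
       .rename_axis(None, axis=1)
       .apply(lambda x: x.dropna(ignore_index=True)))  # push NaN to the bottom
print(out)
```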