英文:
How can I manipulate this pandas dataframe with time series data in order to be more easier to use?
问题
I understand your request. Here's the translated code portion for manipulating the pandas DataFrame:
# 假设你的数据框名字是 df
# 使用melt函数将时间列转换为行
melted_df = df.melt(id_vars=['Customer', 'Item', 'Date'], var_name='Time', value_name='Value')
# 将时间列和日期列合并
melted_df['DateTime'] = melted_df['Date'] + ' ' + melted_df['Time']
# 删除不再需要的列
melted_df.drop(columns=['Date', 'Time'], inplace=True)
# 使用pivot函数重新排列数据框
final_df = melted_df.pivot(index=['Customer', 'DateTime'], columns='Item', values='Value').reset_index()
final_df.columns.name = None # 删除列名
This code snippet should help you transform your DataFrame as desired.
英文:
I have a pandas dataframe with time series data, where the columns are looking like this:
Customer | Item | Date | 00:00 | 00:30 | 01:00 | ... | 23:30 |
---|---|---|---|---|---|---|---|
XYZ | A | 2020-01-01 | 0 | 1 | 2 | ... | 3 |
XYZ | B | 2020-01-02 | 0 | 2 | 2 | ... | 5 |
ABC | A | 2020-01-01 | 0 | 1 | 5 | ... | 3 |
ABC | B | 2020-01-02 | 0 | 2 | 2 | ... | 1 |
So the hours are in the columns, instead of the rows. I want to manipulate this dataframe, concatenate the time columns into the date column, and make them a separate rows, like this:
Customer | Date | Item A | Item B |
---|---|---|---|
XYZ | 2020-01-01 00:00 | 1 | 2 |
XYZ | 2020-01-01 00:30 | 1 | 2 |
XYZ | 2020-01-01 01:00 | 1 | 2 |
XYZ | 2020-01-02 00:00 | 1 | 2 |
XYZ | 2020-01-02 00:30 | 1 | 2 |
XYZ | 2020-01-02 01:00 | 1 | 2 |
ABC | 2020-01-01 00:00 | 2 | 3 |
ABC | 2020-01-01 00:30 | 2 | 2 |
ABC | 2020-01-01 01:00 | 4 | 2 |
ABC | 2020-01-02 00:00 | 2 | 3 |
ABC | 2020-01-02 00:30 | 2 | 2 |
ABC | 2020-01-02 01:00 | 4 | 2 |
How can I do this? I tried a method using cross join, but that is very uneffective, because I have a lot of rows. (~100000)
答案1
得分: 1
以下是要翻译的内容:
你可以尝试以下操作(其中 df
是你的数据框架):
df["Date"] = pd.to_datetime(df["Date"])
df = (
df.rename(columns={"Item": "物品"})
.melt(id_vars=["Customer", "物品", "Date"], var_name="时间", value_name="物品")
.assign(Date=lambda df: df["Date"] + pd.to_timedelta(df["时间"] + ":00"))
.drop(columns="时间")
.pivot(index=["Customer", "Date"], columns="物品")
.reset_index()
)
df.columns = [a if not b else f"{a} {b}" for a, b in df.columns]
英文:
You could try the following (with df
your dataframe):
df["Date"] = pd.to_datetime(df["Date"])
df = (
df.rename(columns={"Item": "Items"})
.melt(id_vars=["Customer", "Items", "Date"], var_name="Time", value_name="Item")
.assign(Date=lambda df: df["Date"] + pd.to_timedelta(df["Time"] + ":00"))
.drop(columns="Time")
.pivot(index=["Customer", "Date"], columns="Items")
.reset_index()
)
df.columns = [a if not b else f"{a} {b}" for a, b in df.columns]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论