如何操作这个带有时间序列数据的Pandas数据框,以使其更容易使用?

huangapple go评论58阅读模式
英文:

How can I manipulate this pandas dataframe with time series data in order to be more easier to use?

问题

I understand your request. Here's the translated code portion for manipulating the pandas DataFrame:

# 假设你的数据框名字是 df
# 使用melt函数将时间列转换为行
melted_df = df.melt(id_vars=['Customer', 'Item', 'Date'], var_name='Time', value_name='Value')

# 将时间列和日期列合并
melted_df['DateTime'] = melted_df['Date'] + ' ' + melted_df['Time']

# 删除不再需要的列
melted_df.drop(columns=['Date', 'Time'], inplace=True)

# 使用pivot函数重新排列数据框
final_df = melted_df.pivot(index=['Customer', 'DateTime'], columns='Item', values='Value').reset_index()
final_df.columns.name = None  # 删除列名

This code snippet should help you transform your DataFrame as desired.

英文:

I have a pandas dataframe with time series data, where the columns are looking like this:

Customer Item Date 00:00 00:30 01:00 ... 23:30
XYZ A 2020-01-01 0 1 2 ... 3
XYZ B 2020-01-02 0 2 2 ... 5
ABC A 2020-01-01 0 1 5 ... 3
ABC B 2020-01-02 0 2 2 ... 1

So the hours are in the columns, instead of the rows. I want to manipulate this dataframe, concatenate the time columns into the date column, and make them a separate rows, like this:

Customer Date Item A Item B
XYZ 2020-01-01 00:00 1 2
XYZ 2020-01-01 00:30 1 2
XYZ 2020-01-01 01:00 1 2
XYZ 2020-01-02 00:00 1 2
XYZ 2020-01-02 00:30 1 2
XYZ 2020-01-02 01:00 1 2
ABC 2020-01-01 00:00 2 3
ABC 2020-01-01 00:30 2 2
ABC 2020-01-01 01:00 4 2
ABC 2020-01-02 00:00 2 3
ABC 2020-01-02 00:30 2 2
ABC 2020-01-02 01:00 4 2

How can I do this? I tried a method using cross join, but that is very uneffective, because I have a lot of rows. (~100000)

答案1

得分: 1

以下是要翻译的内容:

你可以尝试以下操作(其中 df 是你的数据框架):

df["Date"] = pd.to_datetime(df["Date"])
df = (
    df.rename(columns={"Item": "物品"})
    .melt(id_vars=["Customer", "物品", "Date"], var_name="时间", value_name="物品")
    .assign(Date=lambda df: df["Date"] + pd.to_timedelta(df["时间"] + ":00"))
    .drop(columns="时间")
    .pivot(index=["Customer", "Date"], columns="物品")
    .reset_index()
)
df.columns = [a if not b else f"{a} {b}" for a, b in df.columns]
英文:

You could try the following (with df your dataframe):

df["Date"] = pd.to_datetime(df["Date"])
df = (
    df.rename(columns={"Item": "Items"})
    .melt(id_vars=["Customer", "Items", "Date"], var_name="Time", value_name="Item")
    .assign(Date=lambda df: df["Date"] + pd.to_timedelta(df["Time"] + ":00"))
    .drop(columns="Time")
    .pivot(index=["Customer", "Date"], columns="Items")
    .reset_index()
)
df.columns = [a if not b else f"{a} {b}" for a, b in df.columns]

huangapple
  • 本文由 发表于 2023年4月17日 08:43:27
  • 转载请务必保留本文链接:https://go.coder-hub.com/76031017.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定