如何用常规数据框中的值替换多级索引数据框中的第二级?

huangapple go评论78阅读模式
英文:

How to replace level 2 in a multiIndex dataframe with values from a regular dataframe?

问题

我正在编写一个用于处理工作中客户数据分析的算法,我正在按用户ID作为我的第一级,然后按月份作为我的第二级来对数据进行分组。

然后,我对数据进行逐个用户的处理,其中包括一个月度时间序列。

我下面的代码完全做到了我需要的,除了最后一步。

如何将原始数据中的数据与输出中的处理后时间序列数据合并?

    tempDF1 = pd.read_csv('data.csv', index_col=[0,1], parse_dates=[1], thousands=',')
    
    tempDF1["Average"] = 0
    tempDF1["Score"] = 0
    
    for id, df in tempDF1.groupby(level=0):
        for date in df.loc[id].index:
            df.loc[(id,date),"Average"] = df.loc[(id,date)].Purchased/df.loc[(id,date)].Count
            df.loc[(id,date),"Score"] = df.loc[(id,date)].Count/10*log(df.loc[(id,date)].Average, 10)
        try:
            list(df.loc[(id,"2021-01-01",),:])
        except:
            df.loc[(id, "2021-01-01",),:] = 0
        try:
            list(df.loc[(id,"2023-06-01",),:])
        except:
            df.loc[(id, "2023-06-01",),:] = 0

        resampleDF = df.loc[id].resample('M', closed="left").mean().fillna(0)
        print(resampleDF)
        tempDF1.loc[id].update(resampleDF)
        print(tempDF1)

以下是上述代码的示例输出。resampleDF 是用于 ID 1785 的,我正在尝试替换整个第二级索引,列中的数据,以及所有其他内容。我需要将 resampleDF 放入 tempDF1 中,替换掉 1785(以及其他的,但应该由上面的循环完成)。

resampleDF

月份 数量 购买 平均值 分数
2021-01-31 0.0 0.0 0.0 0.000000
2021-02-28 0.0 0.0 0.0 0.000000
2021-03-31 0.0 0.0 0.0 0.000000
2021-04-30 0.0 0.0 0.0 0.000000
2021-05-31 0.0 0.0 0.0 0.000000
2021-06-30 1.0 9999.0 9999.0 0.399996
2021-07-31 0.0 0.0 0.0 0.000000
2021-08-31 0.0 0.0 0.0 0.000000
2021-09-30 0.0 0.0 0.0 0.000000
2021-10-31 2.0 20.0 10.0 0.200000
2021-11-30 0.0 0.0 0.0 0.000000

tempDF1

用户ID 月份 数量 购买 平均值 分数
1785 2021-06-01 1 9999 9999.0 0.399996
2021-10-01 2 20 10.0 0.200000
1857 2021-06-01 1 100 100.0 0.333333
1953 2021-03-01 5 250 50.0 0.398639
2023 2021-09-01 5 1100 220.0 0.599340
... ... ... ... ...
18149 2023-03-01 1 300 300.0 0.477121
18169 2023-03-01 2 40 20.0 0.301030
18177 2023-03-01 2 1100 550.0 0.740363
18210 2023-03-01 1 10 10.0 0.301030
18215 2023-03-01 2 110 55.0 0.397940
英文:

I am writing an algorithm for processing customers at work for some data analysis, and I am grouping the data by user ID as my level 1, then by month for my level 2.

I am then doing some processing work on the data user by user, which involves a monthly time series.

The code I have below does exactly what I need it to do, save for the final step.

How can I overload the data in my original with the processed time series data in the output?

    tempDF1 = pd.read_csv('data.csv', index_col=[0,1], parse_dates=[1], thousands=',')
    
    tempDF1["Average"] = 0
    tempDF1["Score"] = 0
    
    for id, df in tempDF1.groupby(level=0):
        for date in df.loc[id].index:
            df.loc[(id,date),"Average"] = df.loc[(id,date)].Purchased/df.loc[(id,date)].Count
            df.loc[(id,date),"Score"] = df.loc[(id,date)].Count/10*log(df.loc[(id,date)].Average, 10)
        try:
            list(df.loc[(id,"2021-01-01",),:])
        except:
            df.loc[(id, "2021-01-01",),:] = 0
        try:
            list(df.loc[(id,"2023-06-01",),:])
        except:
            df.loc[(id, "2023-06-01",),:] = 0

        resampleDF = df.loc[id].resample('M', closed="left").mean().fillna(0)
        print(resampleDF)
        tempDF1.loc[id].replace(resampleDF, inplace=True)
        print(tempDF1)

Below is the sample output from the code above. The resampleDF is for ID 1785 and I am trying to replace the whole level; level 2 indexing, the data in the columns, everything wholesale. I need to put resampleDF into tempDF1 in place of what is there for 1785 (and the rest, but that should be accomplished by the above loop.

resampleDF

Month Count Purchased Average Score
2021-01-31 0.0 0.0 0.0 0.000000
2021-02-28 0.0 0.0 0.0 0.000000
2021-03-31 0.0 0.0 0.0 0.000000
2021-04-30 0.0 0.0 0.0 0.000000
2021-05-31 0.0 0.0 0.0 0.000000
2021-06-30 1.0 9999.0 9999.0 0.399996
2021-07-31 0.0 0.0 0.0 0.000000
2021-08-31 0.0 0.0 0.0 0.000000
2021-09-30 0.0 0.0 0.0 0.000000
2021-10-31 2.0 20.0 10.0 0.200000
2021-11-30 0.0 0.0 0.0 0.000000

tempDF1

user_id Month Count Purchased Average Score
1785 2021-06-01 1 9999 0 0
2021-10-01 2 20 0 0
1857 2021-06-01 1 100 0 0
1953 2021-03-01 5 250 0 0
2023 2021-09-01 5 1100 0 0
... ... ... ... ...
18149 2023-03-01 1 300 0 0
18169 2023-03-01 2 40 0 0
18177 2023-03-01 2 1100 0 0
18210 2023-03-01 1 10 0 0
18215 2023-03-01 2 110 0 0

答案1

得分: 1

以下是翻译好的代码部分:

import pandas as pd

tempDF1 = pd.DataFrame(
    {
        "user_id": [1785, 1785, 1857],
        "Month": ["2021-06-01", "2021-10-01", "2021-06-01"],
        "Count": [1, 2, 1],
        "Purchased": [9999, 20, 100],
        "Average": [0, 0, 0],
        "Score": [0, 0, 0],
    },
).set_index(["user_id", "Month"])

print(tempDF1)
# 输出

                    Count  Purchased  Average  Score
user_id Month
1785    2021-06-01      1       9999        0      0
        2021-10-01      2         20        0      0
1857    2021-06-01      1        100        0      0

resampleDF = pd.DataFrame(
    {
        "Month": [
            "2021-01-31",
            "2021-02-28",
            "2021-03-31",
            "2021-04-30",
            "2021-05-31",
            "2021-06-30",
            "2021-07-31",
            "2021-08-31",
            "2021-09-30",
            "2021-10-31",
            "2021-11-30",
        ],
        "Count": [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0],
        "Purchased": [0.0, 0.0, 0.0, 0.0, 0.0, 9999.0, 0.0, 0.0, 0.0, 20.0, 0.0],
        "Average": [0.0, 0.0, 0.0, 0.0, 0.0, 9999.0, 0.0, 0.0, 0.0, 10.0, 0.0],
        "Score": [0.0, 0.0, 0.0, 0.0, 0.0, 0.399996, 0.0, 0.0, 0.0, 0.2, 0.0],
    }
)

以下是另一种使用 Pandas 的 dropassignconcat 方法的代码:

tempDF1 = pd.concat(
    [
        resampleDF.assign(user_id=1785).set_index(["user_id", "Month"]),
        tempDF1.drop(index=1785),
    ]
)

print(tempDF1)
# 输出

                    Count  Purchased  Average     Score
user_id Month
1785    2021-01-31    0.0        0.0      0.0  0.000000
        2021-02-28    0.0        0.0      0.0  0.000000
        2021-03-31    0.0        0.0      0.0  0.000000
        2021-04-30    0.0        0.0      0.0  0.000000
        2021-05-31    0.0        0.0      0.0  0.000000
        2021-06-30    1.0     9999.0   9999.0  0.399996
        2021-07-31    0.0        0.0      0.0  0.000000
        2021-08-31    0.0        0.0      0.0  0.000000
        2021-09-30    0.0        0.0      0.0  0.000000
        2021-10-31    2.0       20.0     10.0  0.200000
        2021-11-30    0.0        0.0      0.0  0.000000
1857    2021-06-01    1.0      100.0      0.0  0.000000
英文:

With the dataframes you provided:

import pandas as pd

tempDF1 = pd.DataFrame(
    {
        "user_id": [1785, 1785, 1857],
        "Month": ["2021-06-01", "2021-10-01", "2021-06-01"],
        "Count": [1, 2, 1],
        "Purchased": [9999, 20, 100],
        "Average": [0, 0, 0],
        "Score": [0, 0, 0],
    },
).set_index(["user_id", "Month"])

print(tempDF1)
# Output

                    Count  Purchased  Average  Score
user_id Month
1785    2021-06-01      1       9999        0      0
        2021-10-01      2         20        0      0
1857    2021-06-01      1        100        0      0
resampleDF = pd.DataFrame(
    {
        "Month": [
            "2021-01-31",
            "2021-02-28",
            "2021-03-31",
            "2021-04-30",
            "2021-05-31",
            "2021-06-30",
            "2021-07-31",
            "2021-08-31",
            "2021-09-30",
            "2021-10-31",
            "2021-11-30",
        ],
        "Count": [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0],
        "Purchased": [0.0, 0.0, 0.0, 0.0, 0.0, 9999.0, 0.0, 0.0, 0.0, 20.0, 0.0],
        "Average": [0.0, 0.0, 0.0, 0.0, 0.0, 9999.0, 0.0, 0.0, 0.0, 10.0, 0.0],
        "Score": [0.0, 0.0, 0.0, 0.0, 0.0, 0.399996, 0.0, 0.0, 0.0, 0.2, 0.0],
    }
)

Here is one way to do it with Pandas drop, assign, and concat:

tempDF1 = pd.concat(
    [
        resampleDF.assign(user_id=1785).set_index(["user_id", "Month"]),
        tempDF1.drop(index=1785),
    ]
)

print(tempDF1)
# Output

                    Count  Purchased  Average     Score
user_id Month
1785    2021-01-31    0.0        0.0      0.0  0.000000
        2021-02-28    0.0        0.0      0.0  0.000000
        2021-03-31    0.0        0.0      0.0  0.000000
        2021-04-30    0.0        0.0      0.0  0.000000
        2021-05-31    0.0        0.0      0.0  0.000000
        2021-06-30    1.0     9999.0   9999.0  0.399996
        2021-07-31    0.0        0.0      0.0  0.000000
        2021-08-31    0.0        0.0      0.0  0.000000
        2021-09-30    0.0        0.0      0.0  0.000000
        2021-10-31    2.0       20.0     10.0  0.200000
        2021-11-30    0.0        0.0      0.0  0.000000
1857    2021-06-01    1.0      100.0      0.0  0.000000

huangapple
  • 本文由 发表于 2023年6月2日 04:42:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/76385577.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定