2023年6月2日 04:42:29go评论78阅读模式

英文:

How to replace level 2 in a multiIndex dataframe with values from a regular dataframe?

问题

我正在编写一个用于处理工作中客户数据分析的算法，我正在按用户ID作为我的第一级，然后按月份作为我的第二级来对数据进行分组。

然后，我对数据进行逐个用户的处理，其中包括一个月度时间序列。

我下面的代码完全做到了我需要的，除了最后一步。

如何将原始数据中的数据与输出中的处理后时间序列数据合并？

    tempDF1 = pd.read_csv('data.csv', index_col=[0,1], parse_dates=[1], thousands=',')
    
    tempDF1["Average"] = 0
    tempDF1["Score"] = 0
    
    for id, df in tempDF1.groupby(level=0):
        for date in df.loc[id].index:
            df.loc[(id,date),"Average"] = df.loc[(id,date)].Purchased/df.loc[(id,date)].Count
            df.loc[(id,date),"Score"] = df.loc[(id,date)].Count/10*log(df.loc[(id,date)].Average, 10)
        try:
            list(df.loc[(id,"2021-01-01",),:])
        except:
            df.loc[(id, "2021-01-01",),:] = 0
        try:
            list(df.loc[(id,"2023-06-01",),:])
        except:
            df.loc[(id, "2023-06-01",),:] = 0

        resampleDF = df.loc[id].resample('M', closed="left").mean().fillna(0)
        print(resampleDF)
        tempDF1.loc[id].update(resampleDF)
        print(tempDF1)

以下是上述代码的示例输出。resampleDF 是用于 ID 1785 的，我正在尝试替换整个第二级索引，列中的数据，以及所有其他内容。我需要将 resampleDF 放入 tempDF1 中，替换掉 1785（以及其他的，但应该由上面的循环完成）。

resampleDF

月份	数量	购买	平均值	分数
2021-01-31	0.0	0.0	0.0	0.000000
2021-02-28	0.0	0.0	0.0	0.000000
2021-03-31	0.0	0.0	0.0	0.000000
2021-04-30	0.0	0.0	0.0	0.000000
2021-05-31	0.0	0.0	0.0	0.000000
2021-06-30	1.0	9999.0	9999.0	0.399996
2021-07-31	0.0	0.0	0.0	0.000000
2021-08-31	0.0	0.0	0.0	0.000000
2021-09-30	0.0	0.0	0.0	0.000000
2021-10-31	2.0	20.0	10.0	0.200000
2021-11-30	0.0	0.0	0.0	0.000000

tempDF1

用户ID	月份	数量	购买	平均值	分数
1785	2021-06-01	1	9999	9999.0	0.399996
	2021-10-01	2	20	10.0	0.200000
1857	2021-06-01	1	100	100.0	0.333333
1953	2021-03-01	5	250	50.0	0.398639
2023	2021-09-01	5	1100	220.0	0.599340
...		...	...	...	...
18149	2023-03-01	1	300	300.0	0.477121
18169	2023-03-01	2	40	20.0	0.301030
18177	2023-03-01	2	1100	550.0	0.740363
18210	2023-03-01	1	10	10.0	0.301030
18215	2023-03-01	2	110	55.0	0.397940

英文:

I am writing an algorithm for processing customers at work for some data analysis, and I am grouping the data by user ID as my level 1, then by month for my level 2.

I am then doing some processing work on the data user by user, which involves a monthly time series.

The code I have below does exactly what I need it to do, save for the final step.

How can I overload the data in my original with the processed time series data in the output?

    tempDF1 = pd.read_csv(&#39;data.csv&#39;, index_col=[0,1], parse_dates=[1], thousands=&#39;,&#39;)
    
    tempDF1[&quot;Average&quot;] = 0
    tempDF1[&quot;Score&quot;] = 0
    
    for id, df in tempDF1.groupby(level=0):
        for date in df.loc[id].index:
            df.loc[(id,date),&quot;Average&quot;] = df.loc[(id,date)].Purchased/df.loc[(id,date)].Count
            df.loc[(id,date),&quot;Score&quot;] = df.loc[(id,date)].Count/10*log(df.loc[(id,date)].Average, 10)
        try:
            list(df.loc[(id,&quot;2021-01-01&quot;,),:])
        except:
            df.loc[(id, &quot;2021-01-01&quot;,),:] = 0
        try:
            list(df.loc[(id,&quot;2023-06-01&quot;,),:])
        except:
            df.loc[(id, &quot;2023-06-01&quot;,),:] = 0

        resampleDF = df.loc[id].resample(&#39;M&#39;, closed=&quot;left&quot;).mean().fillna(0)
        print(resampleDF)
        tempDF1.loc[id].replace(resampleDF, inplace=True)
        print(tempDF1)

Below is the sample output from the code above. The resampleDF is for ID 1785 and I am trying to replace the whole level; level 2 indexing, the data in the columns, everything wholesale. I need to put resampleDF into tempDF1 in place of what is there for 1785 (and the rest, but that should be accomplished by the above loop.

resampleDF

Month	Count	Purchased	Average	Score
2021-01-31	0.0	0.0	0.0	0.000000
2021-02-28	0.0	0.0	0.0	0.000000
2021-03-31	0.0	0.0	0.0	0.000000
2021-04-30	0.0	0.0	0.0	0.000000
2021-05-31	0.0	0.0	0.0	0.000000
2021-06-30	1.0	9999.0	9999.0	0.399996
2021-07-31	0.0	0.0	0.0	0.000000
2021-08-31	0.0	0.0	0.0	0.000000
2021-09-30	0.0	0.0	0.0	0.000000
2021-10-31	2.0	20.0	10.0	0.200000
2021-11-30	0.0	0.0	0.0	0.000000

tempDF1

user_id	Month	Count	Purchased	Average	Score
1785	2021-06-01	1	9999	0	0
	2021-10-01	2	20	0	0
1857	2021-06-01	1	100	0	0
1953	2021-03-01	5	250	0	0
2023	2021-09-01	5	1100	0	0
...		...	...	...	...
18149	2023-03-01	1	300	0	0
18169	2023-03-01	2	40	0	0
18177	2023-03-01	2	1100	0	0
18210	2023-03-01	1	10	0	0
18215	2023-03-01	2	110	0	0

答案1

得分: 1

以下是翻译好的代码部分：

import pandas as pd

tempDF1 = pd.DataFrame(
    {
        "user_id": [1785, 1785, 1857],
        "Month": ["2021-06-01", "2021-10-01", "2021-06-01"],
        "Count": [1, 2, 1],
        "Purchased": [9999, 20, 100],
        "Average": [0, 0, 0],
        "Score": [0, 0, 0],
    },
).set_index(["user_id", "Month"])

print(tempDF1)
# 输出

                    Count  Purchased  Average  Score
user_id Month
1785    2021-06-01      1       9999        0      0
        2021-10-01      2         20        0      0
1857    2021-06-01      1        100        0      0

resampleDF = pd.DataFrame(
    {
        "Month": [
            "2021-01-31",
            "2021-02-28",
            "2021-03-31",
            "2021-04-30",
            "2021-05-31",
            "2021-06-30",
            "2021-07-31",
            "2021-08-31",
            "2021-09-30",
            "2021-10-31",
            "2021-11-30",
        ],
        "Count": [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0],
        "Purchased": [0.0, 0.0, 0.0, 0.0, 0.0, 9999.0, 0.0, 0.0, 0.0, 20.0, 0.0],
        "Average": [0.0, 0.0, 0.0, 0.0, 0.0, 9999.0, 0.0, 0.0, 0.0, 10.0, 0.0],
        "Score": [0.0, 0.0, 0.0, 0.0, 0.0, 0.399996, 0.0, 0.0, 0.0, 0.2, 0.0],
    }
)

以下是另一种使用 Pandas 的 drop、assign 和 concat 方法的代码：

tempDF1 = pd.concat(
    [
        resampleDF.assign(user_id=1785).set_index(["user_id", "Month"]),
        tempDF1.drop(index=1785),
    ]
)

print(tempDF1)
# 输出

                    Count  Purchased  Average     Score
user_id Month
1785    2021-01-31    0.0        0.0      0.0  0.000000
        2021-02-28    0.0        0.0      0.0  0.000000
        2021-03-31    0.0        0.0      0.0  0.000000
        2021-04-30    0.0        0.0      0.0  0.000000
        2021-05-31    0.0        0.0      0.0  0.000000
        2021-06-30    1.0     9999.0   9999.0  0.399996
        2021-07-31    0.0        0.0      0.0  0.000000
        2021-08-31    0.0        0.0      0.0  0.000000
        2021-09-30    0.0        0.0      0.0  0.000000
        2021-10-31    2.0       20.0     10.0  0.200000
        2021-11-30    0.0        0.0      0.0  0.000000
1857    2021-06-01    1.0      100.0      0.0  0.000000

英文:

With the dataframes you provided:

import pandas as pd

tempDF1 = pd.DataFrame(
    {
        &quot;user_id&quot;: [1785, 1785, 1857],
        &quot;Month&quot;: [&quot;2021-06-01&quot;, &quot;2021-10-01&quot;, &quot;2021-06-01&quot;],
        &quot;Count&quot;: [1, 2, 1],
        &quot;Purchased&quot;: [9999, 20, 100],
        &quot;Average&quot;: [0, 0, 0],
        &quot;Score&quot;: [0, 0, 0],
    },
).set_index([&quot;user_id&quot;, &quot;Month&quot;])

print(tempDF1)
# Output

                    Count  Purchased  Average  Score
user_id Month
1785    2021-06-01      1       9999        0      0
        2021-10-01      2         20        0      0
1857    2021-06-01      1        100        0      0

resampleDF = pd.DataFrame(
    {
        &quot;Month&quot;: [
            &quot;2021-01-31&quot;,
            &quot;2021-02-28&quot;,
            &quot;2021-03-31&quot;,
            &quot;2021-04-30&quot;,
            &quot;2021-05-31&quot;,
            &quot;2021-06-30&quot;,
            &quot;2021-07-31&quot;,
            &quot;2021-08-31&quot;,
            &quot;2021-09-30&quot;,
            &quot;2021-10-31&quot;,
            &quot;2021-11-30&quot;,
        ],
        &quot;Count&quot;: [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0],
        &quot;Purchased&quot;: [0.0, 0.0, 0.0, 0.0, 0.0, 9999.0, 0.0, 0.0, 0.0, 20.0, 0.0],
        &quot;Average&quot;: [0.0, 0.0, 0.0, 0.0, 0.0, 9999.0, 0.0, 0.0, 0.0, 10.0, 0.0],
        &quot;Score&quot;: [0.0, 0.0, 0.0, 0.0, 0.0, 0.399996, 0.0, 0.0, 0.0, 0.2, 0.0],
    }
)

Here is one way to do it with Pandas drop, assign, and concat:

tempDF1 = pd.concat(
    [
        resampleDF.assign(user_id=1785).set_index([&quot;user_id&quot;, &quot;Month&quot;]),
        tempDF1.drop(index=1785),
    ]
)

print(tempDF1)
# Output

                    Count  Purchased  Average     Score
user_id Month
1785    2021-01-31    0.0        0.0      0.0  0.000000
        2021-02-28    0.0        0.0      0.0  0.000000
        2021-03-31    0.0        0.0      0.0  0.000000
        2021-04-30    0.0        0.0      0.0  0.000000
        2021-05-31    0.0        0.0      0.0  0.000000
        2021-06-30    1.0     9999.0   9999.0  0.399996
        2021-07-31    0.0        0.0      0.0  0.000000
        2021-08-31    0.0        0.0      0.0  0.000000
        2021-09-30    0.0        0.0      0.0  0.000000
        2021-10-31    2.0       20.0     10.0  0.200000
        2021-11-30    0.0        0.0      0.0  0.000000
1857    2021-06-01    1.0      100.0      0.0  0.000000

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何用常规数据框中的值替换多级索引数据框中的第二级？

问题

答案1

I2C设置问题

在另一张表格中查找数据，如果找不到则复制到新文件。

从对象列表中选择其中一个数值出现多次的对象。

AWS Airflow使用配置JSON触发任务。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论