Looping through combinations of subsets of data for processing

Question

I am processing sales data, subsetting across a combination of two distinct dimensions.

The first is a category, given by one of three indicators: ['RA','DS','TP']. The data contains more indicators, but these are the only ones of interest; the others can be ignored.

In combination with those indicators, I want to subset across varying time intervals: 7 days back, 30 days back, 60 days back, 90 days back, 120 days back, and no time constraint.

Without a loop, this would take 18 distinct functions, one per combination of the two dimensions (3 categories x 6 time intervals), which is what I first started to do.

For example, a function that subsets on DS and 7 days back:

from datetime import datetime
import pandas as pd

def seven_days_ds(df):
    # Keep rows from the last 7 days, then rows labelled "DS"
    subset = df[df['Status Date'] > (datetime.now() - pd.to_timedelta("7day"))]
    subset = subset[subset['Association Label']=="DS"]
    
    # Count rows per status and return them as a two-column frame
    grouped_subset = subset.groupby(['Status Labelled'])
    status_counts_seven_ds = (pd.DataFrame(grouped_subset['Status Labelled'].count()))
    status_counts_seven_ds.columns = ['Counts']
    status_counts_seven_ds = status_counts_seven_ds.reset_index()
    
    return status_counts_seven_ds #(the actual function is more complicated than this).

I would then repeat this, changing the subset criteria for each combination of category and time delta, for all 18 combinations of interest. Obviously, this is not ideal.

Is there a way to have a single function that creates those 18 objects or, ideally, a single object whose columns indicate the dimensions being subset on (e.g. counts_ds_7)?

Or is this not possible, leaving me stuck doing them all separately the long way?

Answer 1

Score: 1

If I understand correctly, you can use:

import pandas as pd

def crossubsets(df):
    labels = ["RA", "DS", "TP"]
    time_intervals = [7, 30, 60, 90, 120, None]  # days back; None = no time limit
    # Keep only the labels of interest, then split by label
    group_dfs = df.loc[
        df["Association Label"].isin(labels)
    ].groupby("Association Label")

    data = []
    for label, g in group_dfs:
        for ti in time_intervals:
            s = (
                g[g["Status Date"] > (pd.Timestamp.now() - pd.Timedelta(ti, "d"))]
                if ti is not None else g
            )
            # One column of status counts per label/interval pair
            data.append(s["Status Labelled"].value_counts().rename(f"counts_{label}_{ti}"))

    return pd.concat(data, axis=1)  # append .T for 18 rows instead of 18 columns
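
To see what the combined object looks like, here is a minimal sketch of the same loop run on made-up data. It is not the answerer's exact code: the name `cross_subsets`, the `now` parameter (added so the cutoffs are reproducible), the `all` suffix for the unconstrained window, and the sample rows are all assumptions for illustration. It also fills statuses missing from a subset with 0, which you may or may not want.

```python
import pandas as pd


def cross_subsets(df, labels=("RA", "DS", "TP"),
                  days_back=(7, 30, 60, 90, 120, None), now=None):
    """Count 'Status Labelled' values for each label/time-window pair."""
    # Fix the reference time once so every window shares the same base.
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    columns = []
    for label in labels:
        by_label = df[df["Association Label"] == label]
        for days in days_back:
            if days is None:
                window = by_label  # no time constraint
            else:
                cutoff = now - pd.Timedelta(days=days)
                window = by_label[by_label["Status Date"] > cutoff]
            suffix = "all" if days is None else days
            counts = window["Status Labelled"].value_counts()
            columns.append(counts.rename(f"counts_{label}_{suffix}"))
    # Statuses absent from a subset come out as NaN; fill them with 0.
    return pd.concat(columns, axis=1).fillna(0).astype(int)


# Made-up sample data (values are illustrative only).
now = pd.Timestamp("2023-06-01")
sample = pd.DataFrame({
    "Association Label": ["DS", "DS", "RA", "TP", "XX"],
    "Status Labelled": ["Open", "Closed", "Open", "Open", "Open"],
    "Status Date": now - pd.to_timedelta([3, 45, 10, 200, 1], unit="D"),
})

result = cross_subsets(sample, now=now)
print(result.shape)
print(result[["counts_DS_7", "counts_DS_60", "counts_TP_all"]])
```

Because the outer loop iterates over the label list rather than over the groups found in the data, all 18 columns are produced even when a label has no rows, and the ignored `XX` indicator never appears in the result.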

huangapple
  • Posted on June 2, 2023, 01:28:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76384338.html