基于Python中组内条件创建变量。

huangapple go评论71阅读模式
英文:

Creating a variable based on conditions within a group python

问题

I want to create a variable that shows me whether a customer has reallocated funds after taking out all their money from a position within one of their depots.

I define a reallocation, if customer (main_depot_id) a reduces their position in one fund by 90% (sold 90% == 'Y') and reinvests the same amount +-20% into another fund (different sub_depot_id).

My data looks like this:

df = pd.DataFrame({
'main_depot_id':[1, 1, 1, 2, 2, 2, 2], 
'sub_depot_id': [1, 1, 2, 1, 1, 2, 2],
'inflow': [100, 0, 100, 200, 0, 0, 220], 
'outflow': [0, 100, 0, 0, 0, 200, 0], 
'sold 90%': ['N', 'Y', 'N', 'N', 'N', 'Y', 'N']}) 
# I omitted a date variable above

Now I am not sure how to proceed. My idea was to group by main_depot id and then possibly use a loop that loops through each id. However, I am really at my wits end on how to define my reallocation variable.

英文:

I want to create a variable that shows me whether a customer has reallocated funds after taking out all their money from a position within one of their depots.

I define a reallocation, if customer (main_depot_id) a reduces their position in one fund by 90% (sold 90% == 'Y') and reinvests the same amount +-20% into another fund (different sub_depot_id).

My data looks like this:

df = pd.DataFrame({
'main_depot_id':[1, 1, 1, 2, 2, 2, 2], 
'sub_depot_id': [1, 1, 2, 1, 1, 2, 2],
'inflow': [100, 0, 100, 200, 0, 0, 220], 
'outflow': [0, 100, 0, 0, 0, 200, 0], 
'sold 90%': ['N', 'Y', 'N', 'N', 'N', 'Y', 'N']}) 
# I omitted a date variable above

Now I am not sure how to proceed. My idea was to group by main_depot id and then possible use a loop that loops through each id. However, I am really at my wits end on how to define my reallocation variable.

答案1

得分: 2

根据事件按时间排序的假设,您可以使用 groupby.ffill 方法对 "sub_depot_id" 和前一个 "sold 90%" 等于 "Y" 的 "outflow" 进行分组,然后识别 "sub_depot_id" 已更改且 "inflow" 在前向填充的 "outflow" 的20%内的行:

# 按组填充前一个 Y 的值
tmp = (df[['sub_depot_id', 'outflow']]
       .where(df['sold 90%'].eq('Y'))
       .groupby(df['main_depot_id']).ffill()
      )

# 车站是否已更改?
m1 = df['sub_depot_id'].ne(tmp['sub_depot_id'])
# 金额是否在 outflow 金额的20%内?
m2 = df['inflow'].sub(tmp['outflow']).div(tmp['outflow']).abs().le(0.2)

# 如果两个条件都为真,则分配 "Y",否则分配 "N"
df['reallocated'] = np.where(m1 & m2, 'Y', 'N')

输出结果如下:

   main_depot_id  sub_depot_id  inflow  outflow sold 90% reallocated
0              1             1     100        0        N           N
1              1             1       0      100        Y           N
2              1             2     100        0        N           Y
3              2             1     200        0        N           N
4              2             1       0        0        N           N
5              2             2       0      200        Y           N
6              2             2     220        0        N           N

中间结果如下:

   main_depot_id  sub_depot_id  inflow  outflow sold 90% reallocated  sub_depot_id_ffill  outflow_ffill     m1     m2
0              1             1     100        0        N           N                 NaN            NaN   True  False
1              1             1       0      100        Y           N                 1.0          100.0  False  False
2              1             2     100        0        N           Y                 1.0          100.0   True   True
3              2             1     200        0        N           N                 NaN            NaN   True  False
4              2             1       0        0        N           N                 NaN            NaN   True  False
5              2             2       0      200        Y           N                 2.0          200.0  False  False
6              2             2     220        0        N           N                 2.0          200.0  False   True

请注意,这是原始代码和输出的翻译,不包括问题或其他内容。

英文:

Assuming the events are sorted chronologically, you can groupby.ffill the "sub_depot_id" and "outflow" from the previous "sold 90%" equal to Y, then identify the rows for which "sub_depot_id" has changed and "inflow" is within 20% of the forward filled "outflow":

# fill the values from the previous Y per group
tmp = (df[['sub_depot_id', 'outflow']]
       .where(df['sold 90%'].eq('Y'))
       .groupby(df['main_depot_id']).ffill()
      )

# has the depot changed?
m1 = df['sub_depot_id'].ne(tmp['sub_depot_id'])
# is the amount within 20% of the outflow amount?
m2 = df['inflow'].sub(tmp['outflow']).div(tmp['outflow']).abs().le(0.2)

# if both conditions are true, assign "Y", else assign "N"
df['reallocated'] = np.where(m1&m2, 'Y', 'N')

Output:

   main_depot_id  sub_depot_id  inflow  outflow sold 90% reallocated
0              1             1     100        0        N           N
1              1             1       0      100        Y           N
2              1             2     100        0        N           Y
3              2             1     200        0        N           N
4              2             1       0        0        N           N
5              2             2       0      200        Y           N
6              2             2     220        0        N           N

Intermediates:

   main_depot_id  sub_depot_id  inflow  outflow sold 90% reallocated  sub_depot_id_ffill  outflow_ffill     m1     m2
0              1             1     100        0        N           N                 NaN            NaN   True  False
1              1             1       0      100        Y           N                 1.0          100.0  False  False
2              1             2     100        0        N           Y                 1.0          100.0   True   True
3              2             1     200        0        N           N                 NaN            NaN   True  False
4              2             1       0        0        N           N                 NaN            NaN   True  False
5              2             2       0      200        Y           N                 2.0          200.0  False  False
6              2             2     220        0        N           N                 2.0          200.0  False   True

huangapple
  • 本文由 发表于 2023年5月17日 22:53:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76273449.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定