基于Python中组内条件创建变量。

huangapple go评论96阅读模式
英文:

Creating a variable based on conditions within a group python

问题

I want to create a variable that shows me whether a customer has reallocated funds after taking out all their money from a position within one of their depots.

I define a reallocation, if customer (main_depot_id) a reduces their position in one fund by 90% (sold 90% == 'Y') and reinvests the same amount +-20% into another fund (different sub_depot_id).

My data looks like this:

  1. df = pd.DataFrame({
  2. 'main_depot_id':[1, 1, 1, 2, 2, 2, 2],
  3. 'sub_depot_id': [1, 1, 2, 1, 1, 2, 2],
  4. 'inflow': [100, 0, 100, 200, 0, 0, 220],
  5. 'outflow': [0, 100, 0, 0, 0, 200, 0],
  6. 'sold 90%': ['N', 'Y', 'N', 'N', 'N', 'Y', 'N']})
  7. # I omitted a date variable above

Now I am not sure how to proceed. My idea was to group by main_depot id and then possibly use a loop that loops through each id. However, I am really at my wits end on how to define my reallocation variable.

英文:

I want to create a variable that shows me whether a customer has reallocated funds after taking out all their money from a position within one of their depots.

I define a reallocation, if customer (main_depot_id) a reduces their position in one fund by 90% (sold 90% == 'Y') and reinvests the same amount +-20% into another fund (different sub_depot_id).

My data looks like this:

  1. df = pd.DataFrame({
  2. 'main_depot_id':[1, 1, 1, 2, 2, 2, 2],
  3. 'sub_depot_id': [1, 1, 2, 1, 1, 2, 2],
  4. 'inflow': [100, 0, 100, 200, 0, 0, 220],
  5. 'outflow': [0, 100, 0, 0, 0, 200, 0],
  6. 'sold 90%': ['N', 'Y', 'N', 'N', 'N', 'Y', 'N']})
  7. # I omitted a date variable above

Now I am not sure how to proceed. My idea was to group by main_depot id and then possible use a loop that loops through each id. However, I am really at my wits end on how to define my reallocation variable.

答案1

得分: 2

根据事件按时间排序的假设,您可以使用 groupby.ffill 方法对 "sub_depot_id" 和前一个 "sold 90%" 等于 "Y" 的 "outflow" 进行分组,然后识别 "sub_depot_id" 已更改且 "inflow" 在前向填充的 "outflow" 的20%内的行:

  1. # 按组填充前一个 Y 的值
  2. tmp = (df[['sub_depot_id', 'outflow']]
  3. .where(df['sold 90%'].eq('Y'))
  4. .groupby(df['main_depot_id']).ffill()
  5. )
  6. # 车站是否已更改?
  7. m1 = df['sub_depot_id'].ne(tmp['sub_depot_id'])
  8. # 金额是否在 outflow 金额的20%内?
  9. m2 = df['inflow'].sub(tmp['outflow']).div(tmp['outflow']).abs().le(0.2)
  10. # 如果两个条件都为真,则分配 "Y",否则分配 "N"
  11. df['reallocated'] = np.where(m1 & m2, 'Y', 'N')

输出结果如下:

  1. main_depot_id sub_depot_id inflow outflow sold 90% reallocated
  2. 0 1 1 100 0 N N
  3. 1 1 1 0 100 Y N
  4. 2 1 2 100 0 N Y
  5. 3 2 1 200 0 N N
  6. 4 2 1 0 0 N N
  7. 5 2 2 0 200 Y N
  8. 6 2 2 220 0 N N

中间结果如下:

  1. main_depot_id sub_depot_id inflow outflow sold 90% reallocated sub_depot_id_ffill outflow_ffill m1 m2
  2. 0 1 1 100 0 N N NaN NaN True False
  3. 1 1 1 0 100 Y N 1.0 100.0 False False
  4. 2 1 2 100 0 N Y 1.0 100.0 True True
  5. 3 2 1 200 0 N N NaN NaN True False
  6. 4 2 1 0 0 N N NaN NaN True False
  7. 5 2 2 0 200 Y N 2.0 200.0 False False
  8. 6 2 2 220 0 N N 2.0 200.0 False True

请注意,这是原始代码和输出的翻译,不包括问题或其他内容。

英文:

Assuming the events are sorted chronologically, you can groupby.ffill the "sub_depot_id" and "outflow" from the previous "sold 90%" equal to Y, then identify the rows for which "sub_depot_id" has changed and "inflow" is within 20% of the forward filled "outflow":

  1. # fill the values from the previous Y per group
  2. tmp = (df[['sub_depot_id', 'outflow']]
  3. .where(df['sold 90%'].eq('Y'))
  4. .groupby(df['main_depot_id']).ffill()
  5. )
  6. # has the depot changed?
  7. m1 = df['sub_depot_id'].ne(tmp['sub_depot_id'])
  8. # is the amount within 20% of the outflow amount?
  9. m2 = df['inflow'].sub(tmp['outflow']).div(tmp['outflow']).abs().le(0.2)
  10. # if both conditions are true, assign "Y", else assign "N"
  11. df['reallocated'] = np.where(m1&m2, 'Y', 'N')

Output:

  1. main_depot_id sub_depot_id inflow outflow sold 90% reallocated
  2. 0 1 1 100 0 N N
  3. 1 1 1 0 100 Y N
  4. 2 1 2 100 0 N Y
  5. 3 2 1 200 0 N N
  6. 4 2 1 0 0 N N
  7. 5 2 2 0 200 Y N
  8. 6 2 2 220 0 N N

Intermediates:

  1. main_depot_id sub_depot_id inflow outflow sold 90% reallocated sub_depot_id_ffill outflow_ffill m1 m2
  2. 0 1 1 100 0 N N NaN NaN True False
  3. 1 1 1 0 100 Y N 1.0 100.0 False False
  4. 2 1 2 100 0 N Y 1.0 100.0 True True
  5. 3 2 1 200 0 N N NaN NaN True False
  6. 4 2 1 0 0 N N NaN NaN True False
  7. 5 2 2 0 200 Y N 2.0 200.0 False False
  8. 6 2 2 220 0 N N 2.0 200.0 False True

huangapple
  • 本文由 发表于 2023年5月17日 22:53:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76273449.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定