英文:
Creating a variable based on conditions within a group python
问题
I want to create a variable that shows me whether a customer has reallocated funds after taking out all their money from a position within one of their depots.
I define a reallocation, if customer (main_depot_id
) a reduces their position in one fund by 90% (sold 90% == 'Y'
) and reinvests the same amount +-20% into another fund (different sub_depot_id
).
My data looks like this:
df = pd.DataFrame({
'main_depot_id':[1, 1, 1, 2, 2, 2, 2],
'sub_depot_id': [1, 1, 2, 1, 1, 2, 2],
'inflow': [100, 0, 100, 200, 0, 0, 220],
'outflow': [0, 100, 0, 0, 0, 200, 0],
'sold 90%': ['N', 'Y', 'N', 'N', 'N', 'Y', 'N']})
# I omitted a date variable above
Now I am not sure how to proceed. My idea was to group by main_depot id and then possibly use a loop that loops through each id. However, I am really at my wits end on how to define my reallocation variable.
英文:
I want to create a variable that shows me whether a customer has reallocated funds after taking out all their money from a position within one of their depots.
I define a reallocation, if customer (main_depot_id
) a reduces their position in one fund by 90% (sold 90% == 'Y'
) and reinvests the same amount +-20% into another fund (different sub_depot_id
).
My data looks like this:
df = pd.DataFrame({
'main_depot_id':[1, 1, 1, 2, 2, 2, 2],
'sub_depot_id': [1, 1, 2, 1, 1, 2, 2],
'inflow': [100, 0, 100, 200, 0, 0, 220],
'outflow': [0, 100, 0, 0, 0, 200, 0],
'sold 90%': ['N', 'Y', 'N', 'N', 'N', 'Y', 'N']})
# I omitted a date variable above
Now I am not sure how to proceed. My idea was to group by main_depot id and then possible use a loop that loops through each id. However, I am really at my wits end on how to define my reallocation variable.
答案1
得分: 2
根据事件按时间排序的假设,您可以使用 groupby.ffill
方法对 "sub_depot_id" 和前一个 "sold 90%" 等于 "Y" 的 "outflow" 进行分组,然后识别 "sub_depot_id" 已更改且 "inflow" 在前向填充的 "outflow" 的20%内的行:
# 按组填充前一个 Y 的值
tmp = (df[['sub_depot_id', 'outflow']]
.where(df['sold 90%'].eq('Y'))
.groupby(df['main_depot_id']).ffill()
)
# 车站是否已更改?
m1 = df['sub_depot_id'].ne(tmp['sub_depot_id'])
# 金额是否在 outflow 金额的20%内?
m2 = df['inflow'].sub(tmp['outflow']).div(tmp['outflow']).abs().le(0.2)
# 如果两个条件都为真,则分配 "Y",否则分配 "N"
df['reallocated'] = np.where(m1 & m2, 'Y', 'N')
输出结果如下:
main_depot_id sub_depot_id inflow outflow sold 90% reallocated
0 1 1 100 0 N N
1 1 1 0 100 Y N
2 1 2 100 0 N Y
3 2 1 200 0 N N
4 2 1 0 0 N N
5 2 2 0 200 Y N
6 2 2 220 0 N N
中间结果如下:
main_depot_id sub_depot_id inflow outflow sold 90% reallocated sub_depot_id_ffill outflow_ffill m1 m2
0 1 1 100 0 N N NaN NaN True False
1 1 1 0 100 Y N 1.0 100.0 False False
2 1 2 100 0 N Y 1.0 100.0 True True
3 2 1 200 0 N N NaN NaN True False
4 2 1 0 0 N N NaN NaN True False
5 2 2 0 200 Y N 2.0 200.0 False False
6 2 2 220 0 N N 2.0 200.0 False True
请注意,这是原始代码和输出的翻译,不包括问题或其他内容。
英文:
Assuming the events are sorted chronologically, you can groupby.ffill
the "sub_depot_id" and "outflow" from the previous "sold 90%" equal to Y, then identify the rows for which "sub_depot_id" has changed and "inflow" is within 20% of the forward filled "outflow":
# fill the values from the previous Y per group
tmp = (df[['sub_depot_id', 'outflow']]
.where(df['sold 90%'].eq('Y'))
.groupby(df['main_depot_id']).ffill()
)
# has the depot changed?
m1 = df['sub_depot_id'].ne(tmp['sub_depot_id'])
# is the amount within 20% of the outflow amount?
m2 = df['inflow'].sub(tmp['outflow']).div(tmp['outflow']).abs().le(0.2)
# if both conditions are true, assign "Y", else assign "N"
df['reallocated'] = np.where(m1&m2, 'Y', 'N')
Output:
main_depot_id sub_depot_id inflow outflow sold 90% reallocated
0 1 1 100 0 N N
1 1 1 0 100 Y N
2 1 2 100 0 N Y
3 2 1 200 0 N N
4 2 1 0 0 N N
5 2 2 0 200 Y N
6 2 2 220 0 N N
Intermediates:
main_depot_id sub_depot_id inflow outflow sold 90% reallocated sub_depot_id_ffill outflow_ffill m1 m2
0 1 1 100 0 N N NaN NaN True False
1 1 1 0 100 Y N 1.0 100.0 False False
2 1 2 100 0 N Y 1.0 100.0 True True
3 2 1 200 0 N N NaN NaN True False
4 2 1 0 0 N N NaN NaN True False
5 2 2 0 200 Y N 2.0 200.0 False False
6 2 2 220 0 N N 2.0 200.0 False True
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论