英文:
search for start date and ID in one dataframe and zero out data previous to the start date in another dataframe
问题
我想使用查找表来查找开始日期,然后将查找表中开始日期之前的df数据归零。
查找开始日期与唯一ID
MNI_ID 开始日期
0 MNWIS-13 10/1/1967
1 MNWIS-12 12/1/1967
2 MDWR-1 1/1/1968
3 MW-15 2/1/1968
df
MNWIS-13 MNWIS-12 MDWR-1 MW-15
10/1/1967 12.30 22.00 1.00 17.00
11/1/1967 11.00 33.00 23.00 80.00
12/1/1967 45.00 555.00 43.00 45.00
1/1/1968 56.60 405.00 69.00 67.00
2/1/1968 45.00 33.00 20.00 29.00
希望数据框看起来像这样:
MNWIS-13 MNWIS-12 MDWR-1 MW-15
10/1/1967 12.30 0 0 0
11/1/1967 11.00 0 0 0
12/1/1967 45.00 555.00 0 0
1/1/1968 56.60 405.00 69.00 0
2/1/1968 45.00 33.00 20.00 29.00
英文:
I'd like to use a lookup table to search for a start date, then zero our the data from df prior to the start date in the lookup table
lookup start date with unique ID
MNI_ID StartDate
0 MNWIS-13 10/1/1967
1 MNWIS-12 12/1/1967
2 MDWR-1 1/1/1968
3 MW-15 2/1/1968
df
MNWIS-13 MNWIS-12 MDWR-1 MW-15
10/1/1967 12.30 22.00 1.00 17.00
11/1/1967 11.00 33.00 23.00 80.00
12/1/1967 45.00 555.00 43.00 45.00
1/1/1968 56.60 405.00 69.00 67.00
2/1/1968 45.00 33.00 20.00 29.00
Want dataframe to look like this:
MNWIS-13 MNWIS-12 MDWR-1 MW-15
10/1/1967 12.30 0 0 0
11/1/1967 11.00 0 0 0
12/1/1967 45.00 555.00 0 0
1/1/1968 56.60 405.00 69.00 0
2/1/1968 45.00 33.00 20.00 29.00
答案1
得分: 1
解决方案
mask = pd.crosstab(lookup['StartDate'], lookup['MNI_ID'])
mask = mask.reindex_like(df).fillna(0)
mask = mask.cummax()
result = df * mask
解释
使用交叉表(crosstab)将lookup
数据框重塑成宽格式。
pd.crosstab(lookup['StartDate'], lookup['MNI_ID'])
MNI_ID MDWR-1 MNWIS-12 MNWIS-13 MW-15
StartDate
1/1/1968 1 0 0 0
10/1/1967 0 0 1 0
12/1/1967 0 1 0 0
2/1/1968 0 0 0 1
确保重塑后的lookup
数据框的索引与给定的df
匹配。
mask.reindex_like(df).fillna(0)
MNWIS-13 MNWIS-12 MDWR-1 MW-15
10/1/1967 1.0 0.0 0.0 0.0
11/1/1967 0.0 0.0 0.0 0.0
12/1/1967 0.0 1.0 0.0 0.0
1/1/1968 0.0 0.0 1.0 0.0
2/1/1968 0.0 0.0 0.0 1.0
沿着索引轴计算累积最大值,创建一个布尔掩码,该掩码可用于将匹配的StartDate
之前的值置零。
mask.cummax()
MNWIS-13 MNWIS-12 MDWR-1 MW-15
10/1/1967 1.0 0.0 0.0 0.0
11/1/1967 1.0 0.0 0.0 0.0
12/1/1967 1.0 1.0 0.0 0.0
1/1/1968 1.0 1.0 1.0 0.0
2/1/1968 1.0 1.0 1.0 1.0
将布尔mask
与给定的数据框相乘,以获得所需的结果。
df * mask
MNWIS-13 MNWIS-12 MDWR-1 MW-15
10/1/1967 12.3 0.0 0.0 0.0
11/1/1967 11.0 0.0 0.0 0.0
12/1/1967 45.0 555.0 0.0 0.0
1/1/1968 56.6 405.0 69.0 0.0
2/1/1968 45.0 33.0 20.0 29.0
<details>
<summary>英文:</summary>
### Solution
mask = pd.crosstab(lookup['StartDate'], lookup['MNI_ID'])
mask = mask.reindex_like(df).fillna(0)
mask = mask.cummax()
result = df * mask
#### Explained
Crosstab to reshape the `lookup` dataframe into wide format.
pd.crosstab(lookup['StartDate'], lookup['MNI_ID'])
MNI_ID MDWR-1 MNWIS-12 MNWIS-13 MW-15
StartDate
1/1/1968 1 0 0 0
10/1/1967 0 0 1 0
12/1/1967 0 1 0 0
2/1/1968 0 0 0 1
Ensure the index of reshaped `lookup` dataframe matches the given `df`
mask.reindex_like(df).fillna(0)
MNWIS-13 MNWIS-12 MDWR-1 MW-15
10/1/1967 1.0 0.0 0.0 0.0
11/1/1967 0.0 0.0 0.0 0.0
12/1/1967 0.0 1.0 0.0 0.0
1/1/1968 0.0 0.0 1.0 0.0
2/1/1968 0.0 0.0 0.0 1.0
Calculate the cumulative maximum along index axis to create a boolean mask which can be used to zero the previous values before the matching `StartDate`
mask.cummax()
MNWIS-13 MNWIS-12 MDWR-1 MW-15
10/1/1967 1.0 0.0 0.0 0.0
11/1/1967 1.0 0.0 0.0 0.0
12/1/1967 1.0 1.0 0.0 0.0
1/1/1968 1.0 1.0 1.0 0.0
2/1/1968 1.0 1.0 1.0 1.0
Multiply the boolean `mask` with the given dataframe to get the desired result
df * mask
MNWIS-13 MNWIS-12 MDWR-1 MW-15
10/1/1967 12.3 0.0 0.0 0.0
11/1/1967 11.0 0.0 0.0 0.0
12/1/1967 45.0 555.0 0.0 0.0
1/1/1968 56.6 405.0 69.0 0.0
2/1/1968 45.0 33.0 20.0 29.0
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论