search for start date and ID in one dataframe and zero out data previous to the start date in another dataframe

huangapple go评论70阅读模式
英文:

search for start date and ID in one dataframe and zero out data previous to the start date in another dataframe

问题

我想使用查找表来查找开始日期,然后将查找表中开始日期之前的df数据归零。

查找开始日期与唯一ID

MNI_ID 开始日期
0 MNWIS-13 10/1/1967
1 MNWIS-12 12/1/1967
2 MDWR-1 1/1/1968
3 MW-15 2/1/1968

df

MNWIS-13	MNWIS-12	MDWR-1	MW-15

10/1/1967 12.30 22.00 1.00 17.00
11/1/1967 11.00 33.00 23.00 80.00
12/1/1967 45.00 555.00 43.00 45.00
1/1/1968 56.60 405.00 69.00 67.00
2/1/1968 45.00 33.00 20.00 29.00

希望数据框看起来像这样:

MNWIS-13	MNWIS-12	MDWR-1	MW-15

10/1/1967 12.30 0 0 0
11/1/1967 11.00 0 0 0
12/1/1967 45.00 555.00 0 0
1/1/1968 56.60 405.00 69.00 0
2/1/1968 45.00 33.00 20.00 29.00

英文:

I'd like to use a lookup table to search for a start date, then zero our the data from df prior to the start date in the lookup table

lookup start date with unique ID

   MNI_ID	StartDate	
 0 MNWIS-13	10/1/1967	
 1 MNWIS-12	12/1/1967	
 2 MDWR-1	1/1/1968	
 3 MW-15	2/1/1968	

df

	MNWIS-13	MNWIS-12	MDWR-1	MW-15
10/1/1967	12.30	22.00	1.00	17.00
11/1/1967	11.00	33.00	23.00	80.00
12/1/1967	45.00	555.00	43.00	45.00
1/1/1968	56.60	405.00	69.00	67.00
2/1/1968	45.00	33.00	20.00	29.00

Want dataframe to look like this:

	    MNWIS-13	MNWIS-12	MDWR-1	MW-15
10/1/1967	12.30	0	         0	     0
11/1/1967	11.00	0	         0	     0
12/1/1967	45.00	555.00	     0    	 0
1/1/1968	56.60	405.00	     69.00	 0
2/1/1968	45.00	33.00	     20.00	 29.00

答案1

得分: 1

解决方案

mask = pd.crosstab(lookup['StartDate'], lookup['MNI_ID'])
mask = mask.reindex_like(df).fillna(0)
mask = mask.cummax()

result = df * mask

解释

使用交叉表(crosstab)将lookup数据框重塑成宽格式。

pd.crosstab(lookup['StartDate'], lookup['MNI_ID'])
MNI_ID     MDWR-1  MNWIS-12  MNWIS-13  MW-15
StartDate                                   
1/1/1968        1         0         0      0
10/1/1967       0         0         1      0
12/1/1967       0         1         0      0
2/1/1968        0         0         0      1

确保重塑后的lookup数据框的索引与给定的df匹配。

mask.reindex_like(df).fillna(0)
           MNWIS-13  MNWIS-12  MDWR-1  MW-15
10/1/1967       1.0       0.0     0.0    0.0
11/1/1967       0.0       0.0     0.0    0.0
12/1/1967       0.0       1.0     0.0    0.0
1/1/1968        0.0       0.0     1.0    0.0
2/1/1968        0.0       0.0     0.0    1.0

沿着索引轴计算累积最大值,创建一个布尔掩码,该掩码可用于将匹配的StartDate之前的值置零。

mask.cummax()
           MNWIS-13  MNWIS-12  MDWR-1  MW-15
10/1/1967       1.0       0.0     0.0    0.0
11/1/1967       1.0       0.0     0.0    0.0
12/1/1967       1.0       1.0     0.0    0.0
1/1/1968        1.0       1.0     1.0    0.0
2/1/1968        1.0       1.0     1.0    1.0

将布尔mask与给定的数据框相乘,以获得所需的结果。

df * mask
           MNWIS-13  MNWIS-12  MDWR-1  MW-15
10/1/1967      12.3       0.0     0.0    0.0
11/1/1967      11.0       0.0     0.0    0.0
12/1/1967      45.0     555.0     0.0    0.0
1/1/1968       56.6     405.0    69.0    0.0
2/1/1968       45.0      33.0    20.0   29.0

<details>
<summary>英文:</summary>

### Solution

    mask = pd.crosstab(lookup[&#39;StartDate&#39;], lookup[&#39;MNI_ID&#39;])
    mask = mask.reindex_like(df).fillna(0)
    mask = mask.cummax()
    
    result = df * mask


#### Explained

Crosstab to reshape the `lookup` dataframe into wide format.

    pd.crosstab(lookup[&#39;StartDate&#39;], lookup[&#39;MNI_ID&#39;])
    
    MNI_ID     MDWR-1  MNWIS-12  MNWIS-13  MW-15
    StartDate                                   
    1/1/1968        1         0         0      0
    10/1/1967       0         0         1      0
    12/1/1967       0         1         0      0
    2/1/1968        0         0         0      1

Ensure the index of reshaped `lookup` dataframe matches the given `df`

    mask.reindex_like(df).fillna(0)

               MNWIS-13  MNWIS-12  MDWR-1  MW-15
    10/1/1967       1.0       0.0     0.0    0.0
    11/1/1967       0.0       0.0     0.0    0.0
    12/1/1967       0.0       1.0     0.0    0.0
    1/1/1968        0.0       0.0     1.0    0.0
    2/1/1968        0.0       0.0     0.0    1.0

Calculate the cumulative maximum along index axis to create a boolean mask which can be used to zero the previous values before the matching `StartDate`

    mask.cummax()

               MNWIS-13  MNWIS-12  MDWR-1  MW-15
    10/1/1967       1.0       0.0     0.0    0.0
    11/1/1967       1.0       0.0     0.0    0.0
    12/1/1967       1.0       1.0     0.0    0.0
    1/1/1968        1.0       1.0     1.0    0.0
    2/1/1968        1.0       1.0     1.0    1.0

Multiply the boolean `mask` with the given dataframe to get the desired result

    df * mask

               MNWIS-13  MNWIS-12  MDWR-1  MW-15
    10/1/1967      12.3       0.0     0.0    0.0
    11/1/1967      11.0       0.0     0.0    0.0
    12/1/1967      45.0     555.0     0.0    0.0
    1/1/1968       56.6     405.0    69.0    0.0
    2/1/1968       45.0      33.0    20.0   29.0



</details>



huangapple
  • 本文由 发表于 2023年2月18日 07:59:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/75490261.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定