Pandas itertuples - fill in a matrix with a value based on an event

Question

I am trying to create a matrix in which, for each row, I fill in the date of the first occurrence of each event after the stamp date given in that row.

Sample dataframe:

```python
import pandas as pd

id = [1,1,1,1,1,2,2,2,2,3,3,3,3,4,5,6,7,7,7,8,8,8,9,9,9,10,10,10]
fact = ["IC", "AC","IC","AC","IC","AC", "CC", "CD","IC","CC", "CD","IC","AC", "CD","IC","AC", "CC", "CD","IC","AC", "CC", "CD","IC","AC","IC","IC","AC","IC"]
stamp = ['1979-02-22','1973-11-06','1986-03-12','1986-01-24', '2012-05-22', '2009-01-18', '1992-01-14', '1985-06-05','2001-07-05','2008-11-19','2000-10-13','2002-04-18','1987-08-17','1977-04-09','1984-03-22','1994-08-08','2005-07-09','1982-05-03','2016-01-30','2019-03-10','1981-03-23','1979-07-21','2023-01-14','2018-06-23','1995-08-27','2020-11-08','2014-02-17','1977-09-08']
s = {"ID": id, "fact": fact, "stamp": stamp}
data = pd.DataFrame(data=s)
data.sort_values(by="stamp", inplace=True)
```

What the df looks like:

[image: the sorted DataFrame]

Expected output:

[image: the expected matrix]

The code I have so far:

```python
facts = data.fact.unique()
structure = {'ID': [], 'stamp': [], 'fact': [], 'AC': [], 'CD': [], 'IC': [], 'CC': []}

for row in data.itertuples():
    structure["ID"].append(getattr(row, 'ID'))
    structure["stamp"].append(getattr(row, 'stamp'))
    structure["fact"].append(getattr(row, 'fact'))

    for fact in facts:
        if getattr(row, 'fact') == fact:
            structure[fact].append(getattr(row, 'stamp'))
        else:
            structure[fact].append('na')
```

Produces:

[image: the resulting table]

which is incorrect. Any help is appreciated, and thank you in advance.
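To make the stated goal concrete, here is a brute-force sketch of one possible reading of it: for every row and every fact, take the earliest stamp of that fact within the same ID that is strictly later than the row's own stamp. The function name `first_later_occurrence`, the "strictly later" rule, and the restriction to the same ID are assumptions, since the expected-output image is not available to check against. The sample uses four rows taken from the data above.

```python
import pandas as pd


def first_later_occurrence(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add one column per fact holding the earliest
    stamp of that fact, in the same ID, strictly after the row's stamp."""
    out = df.copy()
    out["stamp"] = pd.to_datetime(out["stamp"])
    new_columns = {}
    for fact in sorted(out["fact"].unique()):
        values = []
        for row in out.itertuples(index=False):
            # Candidate stamps: same ID, this fact, later than the current row
            later = out.loc[
                (out["ID"] == row.ID)
                & (out["fact"] == fact)
                & (out["stamp"] > row.stamp),
                "stamp",
            ]
            values.append(later.min() if not later.empty else pd.NaT)
        new_columns[fact] = values
    # Attach the new columns only after all lookups are done
    for fact, values in new_columns.items():
        out[fact] = values
    return out


# Four rows taken from the sample data (three for ID 1, one for ID 2)
sample = pd.DataFrame({
    "ID": [1, 1, 1, 2],
    "fact": ["AC", "IC", "AC", "IC"],
    "stamp": ["1973-11-06", "1979-02-22", "1986-01-24", "2001-07-05"],
})
result = first_later_occurrence(sample)
print(result)
```

This scans the whole frame once per row and per fact, so it is only meant to pin down the intended result on small inputs, not as an efficient solution.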

Answer 1

Score: 1

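Use [`merge_asof`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html) with `allow_exact_matches=False` so that a row is not matched to its own `stamp` value, then pivot with [`DataFrame.pivot_table`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot_table.html) using `aggfunc='first'`, and append the result to the original DataFrame with [`DataFrame.join`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html):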

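```python
# Parse the stamps so that merge_asof compares dates, not strings
data['stamp'] = pd.to_datetime(data['stamp'])

# merge_asof requires both frames to be sorted by the key column
df1 = data.sort_values('stamp')
# For each row, find the next strictly later row with the same ID
df = pd.merge_asof(df1.rename(columns={'stamp':'stamp1'}),
                   df1,
                   left_on='stamp1',
                   right_on='stamp',
                   allow_exact_matches=False,
                   by=['ID'],
                   direction='forward',
                   suffixes=('_','')).drop(['stamp1','fact_'],axis=1)

# One row per ID and one column per fact, keeping the first matched stamp
df1 = data.join(df.pivot_table(index='ID',
                               columns='fact',
                               values='stamp',
                               aggfunc='first'), on=['ID'])

print(df1)
```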
```
    ID fact      stamp         AC         CC         CD         IC
1    1   AC 1973-11-06 1986-01-24        NaT        NaT 1979-02-22
13   4   CD 1977-04-09        NaT        NaT        NaT        NaT
27  10   IC 1977-09-08 2014-02-17        NaT        NaT 2020-11-08
0    1   IC 1979-02-22 1986-01-24        NaT        NaT 1979-02-22
21   8   CD 1979-07-21 2019-03-10 1981-03-23        NaT        NaT
20   8   CC 1981-03-23 2019-03-10 1981-03-23        NaT        NaT
17   7   CD 1982-05-03        NaT 2005-07-09        NaT 2016-01-30
14   5   IC 1984-03-22        NaT        NaT        NaT        NaT
7    2   CD 1985-06-05 2009-01-18 1992-01-14        NaT 2001-07-05
3    1   AC 1986-01-24 1986-01-24        NaT        NaT 1979-02-22
2    1   IC 1986-03-12 1986-01-24        NaT        NaT 1979-02-22
12   3   AC 1987-08-17        NaT 2008-11-19 2000-10-13 2002-04-18
6    2   CC 1992-01-14 2009-01-18 1992-01-14        NaT 2001-07-05
15   6   AC 1994-08-08        NaT        NaT        NaT        NaT
24   9   IC 1995-08-27 2018-06-23        NaT        NaT 2023-01-14
10   3   CD 2000-10-13        NaT 2008-11-19 2000-10-13 2002-04-18
8    2   IC 2001-07-05 2009-01-18 1992-01-14        NaT 2001-07-05
11   3   IC 2002-04-18        NaT 2008-11-19 2000-10-13 2002-04-18
16   7   CC 2005-07-09        NaT 2005-07-09        NaT 2016-01-30
9    3   CC 2008-11-19        NaT 2008-11-19 2000-10-13 2002-04-18
5    2   AC 2009-01-18 2009-01-18 1992-01-14        NaT 2001-07-05
4    1   IC 2012-05-22 1986-01-24        NaT        NaT 1979-02-22
26  10   AC 2014-02-17 2014-02-17        NaT        NaT 2020-11-08
18   7   IC 2016-01-30        NaT 2005-07-09        NaT 2016-01-30
23   9   AC 2018-06-23 2018-06-23        NaT        NaT 2023-01-14
19   8   AC 2019-03-10 2019-03-10 1981-03-23        NaT        NaT
25  10   IC 2020-11-08 2014-02-17        NaT        NaT 2020-11-08
22   9   IC 2023-01-14 2018-06-23        NaT        NaT 2023-01-14
```
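To see in isolation what the `merge_asof` step contributes, here is a minimal sketch on a made-up four-row frame. The `events` frame and the `fact_left` / `stamp_left` column names are illustrative assumptions and not part of the answer. With `direction='forward'`, `by='ID'` and `allow_exact_matches=False`, each left row is paired with the next strictly later row of the same ID, or with missing values when no such row exists.

```python
import pandas as pd

# Hypothetical data: three events for ID 1 and a single event for ID 2
events = pd.DataFrame({
    "ID": [1, 1, 1, 2],
    "fact": ["AC", "IC", "AC", "CC"],
    "stamp": pd.to_datetime(
        ["2000-01-01", "2001-01-01", "2002-01-01", "2000-06-01"]
    ),
}).sort_values("stamp")  # merge_asof needs both sides sorted by the key

# Self-merge: the left side keeps the row's own values under *_left names,
# the right side supplies the next later event within the same ID.
nxt = pd.merge_asof(
    events.rename(columns={"stamp": "stamp_left", "fact": "fact_left"}),
    events,
    left_on="stamp_left",
    right_on="stamp",
    by="ID",
    direction="forward",
    allow_exact_matches=False,  # without this, every row would match itself
)
print(nxt)
```

The last event of each ID has nothing after it, so its `fact` and `stamp` come back as missing. Those rows are the ones that contribute nothing to the pivot step in the answer.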

huangapple
  • Posted on March 9, 2023 at 18:56:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75683641.html