Pandas itertuples – 根据事件在矩阵中填充数值

huangapple go评论139阅读模式
英文:

Pandas itertuples - fill in a matrix with a value based on an event

问题

  1. import pandas as pd
  2. id = [1,1,1,1,1,2,2,2,2,3,3,3,3,4,5,6,7,7,7,8,8,8,9,9,9,10,10,10]
  3. fact = ["IC", "AC","IC","AC","IC","AC", "CC", "CD","IC","CC", "CD","IC","AC", "CD","IC","AC", "CC", "CD","IC","AC", "CC", "CD","IC","AC","IC","IC","AC","IC"]
  4. stamp = ['1979-02-22','1973-11-06','1986-03-12','1986-01-24', '2012-05-22', '2009-01-18', '1992-01-14', '1985-06-05','2001-07-05','2008-11-19','2000-10-13','2002-04-18','1987-08-17','1977-04-09','1984-03-22','1994-08-08','2005-07-09','1982-05-03','2016-01-30','2019-03-10','1981-03-23','1979-07-21','2023-01-14','2018-06-23','1995-08-27','2020-11-08','2014-02-17','1977-09-08']
  5. s = {"ID": id, "fact": fact, "stamp": stamp}
  6. data = pd.DataFrame(data = s)
  7. data.sort_values(by = "stamp", inplace= True)
  8. facts = data.fact.unique()
  9. structure = {'ID': [], 'stamp':[], 'fact': [], 'AC':[], 'CD':[], 'IC':[], 'CC':[]}
  10. for row in data.itertuples():
  11. structure["ID"].append(getattr(row, 'ID'))
  12. structure["stamp"].append(getattr(row, 'stamp'))
  13. structure["fact"].append(getattr(row, 'fact'))
  14. for fact in facts:
  15. if getattr(row, 'fact') == fact:
  16. structure[fact].append(getattr(row, 'stamp'))
  17. else:
  18. structure[fact].append('na')
英文:

I am trying to create a matrix in which I fill in the date of the first occurence of an event per row after the specified stamp date in the said row.

Sample dataframe:

  1. id = [1,1,1,1,1,2,2,2,2,3,3,3,3,4,5,6,7,7,7,8,8,8,9,9,9,10,10,10]
  2. fact = ["IC", "AC","IC","AC","IC","AC", "CC", "CD","IC","CC", "CD","IC","AC", "CD","IC","AC", "CC", "CD","IC","AC", "CC", "CD","IC","AC","IC","IC","AC","IC"]
  3. stamp = ['1979-02-22','1973-11-06','1986-03-12','1986-01-24', '2012-05-22', '2009-01-18', '1992-01-14', '1985-06-05','2001-07-05','2008-11-19','2000-10-13','2002-04-18','1987-08-17','1977-04-09','1984-03-22','1994-08-08','2005-07-09','1982-05-03','2016-01-30','2019-03-10','1981-03-23','1979-07-21','2023-01-14','2018-06-23','1995-08-27','2020-11-08','2014-02-17','1977-09-08']
  4. s = {"ID": id, "fact": fact, "stamp": stamp}
  5. data = pd.DataFrame(data = s)
  6. data.sort_values(by = "stamp", inplace= True)

How the df looks like:

Pandas itertuples – 根据事件在矩阵中填充数值

Expected output:

Pandas itertuples – 根据事件在矩阵中填充数值

The code I have so far:

  1. facts = data.fact.unique()
  2. structure = {'ID': [], 'stamp':[], 'fact': [], 'AC':[], 'CD':[], 'IC':[], 'CC':[]}
  3. for row in data.itertuples():
  4. structure["ID"].append(getattr(row, 'ID'))
  5. structure["stamp"].append(getattr(row, 'stamp'))
  6. structure["fact"].append(getattr(row, 'fact'))
  7. for fact in facts:
  8. if getattr(row, 'fact') == fact:
  9. structure[fact].append(getattr(row, 'stamp'))
  10. else:
  11. structure[fact].append('na')

Produces:

Pandas itertuples – 根据事件在矩阵中填充数值

which is incorrect. Any help is appreciated and thank you in advance.

答案1

得分: 1

  1. 使用 [`merge_asof`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html) 函数使用 `allow_exact_matches` 参数避免首先匹配相同的 `on`然后使用 [`DataFrame.pivot_table`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot_table.html) 函数进行数据透视使用 `aggfunc='first'`,最后通过 [`DataFrame.join`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html) 将结果添加到原始 DataFrame 中
  2. ```python
  3. data['stamp'] = pd.to_datetime(data['stamp'])
  4. df1 = data.sort_values('stamp')
  5. df = pd.merge_asof(df1.rename(columns={'stamp':'stamp1'}),
  6. df1,
  7. left_on='stamp1',
  8. right_on='stamp',
  9. allow_exact_matches=False,
  10. by=['ID'],
  11. direction='forward',
  12. suffixes=('_','')).drop(['stamp1','fact_'],axis=1)
  13. df1 = data.join(df.pivot_table(index='ID',
  14. columns='fact',
  15. values='stamp',
  16. aggfunc='first'), on=['ID'])
英文:

Use merge_asof with allow_exact_matches parameter for avoid match the same on value first, then pivoting by DataFrame.pivot_table with aggfunc='first' and append to original DataFrame by DataFrame.join:

  1. data['stamp'] = pd.to_datetime(data['stamp'])
  2. df1 = data.sort_values('stamp')
  3. df = pd.merge_asof(df1.rename(columns={'stamp':'stamp1'}),
  4. df1,
  5. left_on='stamp1',
  6. right_on='stamp',
  7. allow_exact_matches=False,
  8. by=['ID'],
  9. direction='forward',
  10. suffixes=('_','')).drop(['stamp1','fact_'],axis=1)
  11. df1 = data.join(df.pivot_table(index='ID',
  12. columns='fact',
  13. values='stamp',
  14. aggfunc='first'), on=['ID'])

  1. print (df1)
  2. ID fact stamp AC CC CD IC
  3. 1 1 AC 1973-11-06 1986-01-24 NaT NaT 1979-02-22
  4. 13 4 CD 1977-04-09 NaT NaT NaT NaT
  5. 27 10 IC 1977-09-08 2014-02-17 NaT NaT 2020-11-08
  6. 0 1 IC 1979-02-22 1986-01-24 NaT NaT 1979-02-22
  7. 21 8 CD 1979-07-21 2019-03-10 1981-03-23 NaT NaT
  8. 20 8 CC 1981-03-23 2019-03-10 1981-03-23 NaT NaT
  9. 17 7 CD 1982-05-03 NaT 2005-07-09 NaT 2016-01-30
  10. 14 5 IC 1984-03-22 NaT NaT NaT NaT
  11. 7 2 CD 1985-06-05 2009-01-18 1992-01-14 NaT 2001-07-05
  12. 3 1 AC 1986-01-24 1986-01-24 NaT NaT 1979-02-22
  13. 2 1 IC 1986-03-12 1986-01-24 NaT NaT 1979-02-22
  14. 12 3 AC 1987-08-17 NaT 2008-11-19 2000-10-13 2002-04-18
  15. 6 2 CC 1992-01-14 2009-01-18 1992-01-14 NaT 2001-07-05
  16. 15 6 AC 1994-08-08 NaT NaT NaT NaT
  17. 24 9 IC 1995-08-27 2018-06-23 NaT NaT 2023-01-14
  18. 10 3 CD 2000-10-13 NaT 2008-11-19 2000-10-13 2002-04-18
  19. 8 2 IC 2001-07-05 2009-01-18 1992-01-14 NaT 2001-07-05
  20. 11 3 IC 2002-04-18 NaT 2008-11-19 2000-10-13 2002-04-18
  21. 16 7 CC 2005-07-09 NaT 2005-07-09 NaT 2016-01-30
  22. 9 3 CC 2008-11-19 NaT 2008-11-19 2000-10-13 2002-04-18
  23. 5 2 AC 2009-01-18 2009-01-18 1992-01-14 NaT 2001-07-05
  24. 4 1 IC 2012-05-22 1986-01-24 NaT NaT 1979-02-22
  25. 26 10 AC 2014-02-17 2014-02-17 NaT NaT 2020-11-08
  26. 18 7 IC 2016-01-30 NaT 2005-07-09 NaT 2016-01-30
  27. 23 9 AC 2018-06-23 2018-06-23 NaT NaT 2023-01-14
  28. 19 8 AC 2019-03-10 2019-03-10 1981-03-23 NaT NaT
  29. 25 10 IC 2020-11-08 2014-02-17 NaT NaT 2020-11-08
  30. 22 9 IC 2023-01-14 2018-06-23 NaT NaT 2023-01-14

huangapple
  • 本文由 发表于 2023年3月9日 18:56:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/75683641.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定