Posted 2023-03-09 18:56:23

Pandas itertuples - fill in a matrix with a value based on an event

Question

I am trying to build a matrix in which each row holds, for every event type, the date of that event's first occurrence after the stamp date of that row.

Sample dataframe:

import pandas as pd

id = [1,1,1,1,1,2,2,2,2,3,3,3,3,4,5,6,7,7,7,8,8,8,9,9,9,10,10,10]
fact = ["IC", "AC","IC","AC","IC","AC", "CC", "CD","IC","CC", "CD","IC","AC", "CD","IC","AC", "CC", "CD","IC","AC", "CC", "CD","IC","AC","IC","IC","AC","IC"]
stamp = ['1979-02-22','1973-11-06','1986-03-12','1986-01-24', '2012-05-22', '2009-01-18', '1992-01-14', '1985-06-05','2001-07-05','2008-11-19','2000-10-13','2002-04-18','1987-08-17','1977-04-09','1984-03-22','1994-08-08','2005-07-09','1982-05-03','2016-01-30','2019-03-10','1981-03-23','1979-07-21','2023-01-14','2018-06-23','1995-08-27','2020-11-08','2014-02-17','1977-09-08']
s = {"ID": id, "fact": fact, "stamp": stamp}
data = pd.DataFrame(data=s)
data.sort_values(by="stamp", inplace=True)

What the df looks like:

Expected output:

The code I have so far:

facts = data.fact.unique()
structure = {'ID': [], 'stamp': [], 'fact': [], 'AC': [], 'CD': [], 'IC': [], 'CC': []}
for row in data.itertuples():
    structure["ID"].append(getattr(row, 'ID'))
    structure["stamp"].append(getattr(row, 'stamp'))
    structure["fact"].append(getattr(row, 'fact'))
    for fact in facts:
        if getattr(row, 'fact') == fact:
            structure[fact].append(getattr(row, 'stamp'))
        else:
            structure[fact].append('na')

This produces an incorrect result. Any help is appreciated, and thank you in advance.
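
One way to see why the loop cannot fill the matrix as intended: it only compares each row's own fact with the column name, so a row never sees the stamps of other rows. Below is a minimal check on two ID 1 rows from the sample data. The names `sample` and `result` are illustrative only, and the loop is a condensed restatement of the one above.

```python
import pandas as pd

# Two events of ID 1 from the sample data, already sorted by stamp.
sample = pd.DataFrame({
    "ID": [1, 1],
    "fact": ["AC", "IC"],
    "stamp": ["1973-11-06", "1979-02-22"],
})

facts = sample["fact"].unique()
structure = {"ID": [], "stamp": [], "fact": [], **{f: [] for f in facts}}
for row in sample.itertuples(index=False):
    structure["ID"].append(row.ID)
    structure["stamp"].append(row.stamp)
    structure["fact"].append(row.fact)
    for f in facts:
        # Same logic as the loop above: only the row's own fact gets a date.
        structure[f].append(row.stamp if row.fact == f else "na")

result = pd.DataFrame(structure)
print(result)
```

The AC row ends up with 'na' under IC, even though an IC event for the same ID occurs later (1979-02-22).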

Answer 1

Score: 1

Use [`merge_asof`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html) with `allow_exact_matches=False` so that a row is not matched with itself on the same `on` value, then pivot with [`DataFrame.pivot_table`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot_table.html) using `aggfunc='first'`, and append the result to the original DataFrame with [`DataFrame.join`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html):

data['stamp'] = pd.to_datetime(data['stamp'])
df1 = data.sort_values('stamp')
df = pd.merge_asof(df1.rename(columns={'stamp':'stamp1'}),
                   df1,
                   left_on='stamp1',
                   right_on='stamp',
                   allow_exact_matches=False,
                   by=['ID'],
                   direction='forward',
                   suffixes=('_','')).drop(['stamp1','fact_'], axis=1)
df1 = data.join(df.pivot_table(index='ID',
                               columns='fact',
                               values='stamp',
                               aggfunc='first'), on=['ID'])

print (df1)
    ID fact      stamp         AC         CC         CD         IC
1    1   AC 1973-11-06 1986-01-24        NaT        NaT 1979-02-22
13   4   CD 1977-04-09        NaT        NaT        NaT        NaT
27  10   IC 1977-09-08 2014-02-17        NaT        NaT 2020-11-08
0    1   IC 1979-02-22 1986-01-24        NaT        NaT 1979-02-22
21   8   CD 1979-07-21 2019-03-10 1981-03-23        NaT        NaT
20   8   CC 1981-03-23 2019-03-10 1981-03-23        NaT        NaT
17   7   CD 1982-05-03        NaT 2005-07-09        NaT 2016-01-30
14   5   IC 1984-03-22        NaT        NaT        NaT        NaT
7    2   CD 1985-06-05 2009-01-18 1992-01-14        NaT 2001-07-05
3    1   AC 1986-01-24 1986-01-24        NaT        NaT 1979-02-22
2    1   IC 1986-03-12 1986-01-24        NaT        NaT 1979-02-22
12   3   AC 1987-08-17        NaT 2008-11-19 2000-10-13 2002-04-18
6    2   CC 1992-01-14 2009-01-18 1992-01-14        NaT 2001-07-05
15   6   AC 1994-08-08        NaT        NaT        NaT        NaT
24   9   IC 1995-08-27 2018-06-23        NaT        NaT 2023-01-14
10   3   CD 2000-10-13        NaT 2008-11-19 2000-10-13 2002-04-18
8    2   IC 2001-07-05 2009-01-18 1992-01-14        NaT 2001-07-05
11   3   IC 2002-04-18        NaT 2008-11-19 2000-10-13 2002-04-18
16   7   CC 2005-07-09        NaT 2005-07-09        NaT 2016-01-30
9    3   CC 2008-11-19        NaT 2008-11-19 2000-10-13 2002-04-18
5    2   AC 2009-01-18 2009-01-18 1992-01-14        NaT 2001-07-05
4    1   IC 2012-05-22 1986-01-24        NaT        NaT 1979-02-22
26  10   AC 2014-02-17 2014-02-17        NaT        NaT 2020-11-08
18   7   IC 2016-01-30        NaT 2005-07-09        NaT 2016-01-30
23   9   AC 2018-06-23 2018-06-23        NaT        NaT 2023-01-14
19   8   AC 2019-03-10 2019-03-10 1981-03-23        NaT        NaT
25  10   IC 2020-11-08 2014-02-17        NaT        NaT 2020-11-08
22   9   IC 2023-01-14 2018-06-23        NaT        NaT 2023-01-14
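
Note that the pivot above is computed once per ID, so all rows of an ID share the same values. For example, the ID 1 row stamped 2012-05-22 still shows AC 1986-01-24. If the question is read literally (the first occurrence of each fact strictly after the stamp of each individual row), one possible per-row variation is to run `merge_asof` once per fact. This is only a sketch and not part of the original answer: the helper name `first_later_occurrence` is made up for illustration, and the sample is a six-row subset of the question's data.

```python
import pandas as pd


def first_later_occurrence(data: pd.DataFrame) -> pd.DataFrame:
    """For every row, add one column per fact holding the earliest stamp of
    that fact, within the same ID, that is strictly later than the row's stamp.
    (Hypothetical helper; assumes columns named ID, fact and stamp.)"""
    df = data.copy()
    df["stamp"] = pd.to_datetime(df["stamp"])
    df = df.sort_values("stamp")  # merge_asof needs sorted keys
    out = df.copy()
    for f in sorted(df["fact"].unique()):
        # Right side: only the events of this fact, stamp renamed to the fact.
        right = df.loc[df["fact"] == f, ["ID", "stamp"]].rename(columns={"stamp": f})
        merged = pd.merge_asof(
            df[["ID", "stamp"]],
            right,
            left_on="stamp",
            right_on=f,
            by="ID",
            direction="forward",        # look at later events only
            allow_exact_matches=False,  # never match a row with itself
        )
        # merge_asof keeps the left row order, so assign by position.
        out[f] = merged[f].to_numpy()
    return out


# Six rows taken from the question's sample data (IDs 1 and 4).
events = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 4],
    "fact": ["IC", "AC", "IC", "AC", "IC", "CD"],
    "stamp": ["1979-02-22", "1973-11-06", "1986-03-12",
              "1986-01-24", "2012-05-22", "1977-04-09"],
})
result = first_later_occurrence(events)
print(result)
```

In `result`, the ID 1 row stamped 1979-02-22 gets AC 1986-01-24 and IC 1986-03-12, while the last row of each ID has NaT in every fact column because nothing follows it.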
