How to recursively iterate over a DataFrame and calculate row values

Question

I have the following df:

```python
import pandas as pd

list_columns = ['id', 'start', 'end', 'duration']
list_data = [
    [1, '2023-01-01', '2023-04-02', 0],
    [2, '2023-01-10', 0, 2],
    [3, '0', 0, 3],
    [4, '0', 0, 4],
]
df = pd.DataFrame(columns=list_columns, data=list_data)
```

For a specific id, I want to calculate start and end when they are 0, like this:

For example, if id=3, we check whether start is 0; if so, then

start = end of the previous id in the df
end = start + duration

But if id=4, the end of the previous row (id=3) is itself 0, so how can we check each row above it in a simple way and calculate start and end?

Answer 1

Score: 0

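No recursion is required. You need to convert all your dates to datetime (which means first replacing '0' with a placeholder date), then fill in each blank date from the previous row's end, or from start plus duration. Because the rows are processed top to bottom, each row's end is already filled in by the time the next row needs it. There is probably a neater way, but you can use: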

```python
import pandas as pd

# Placeholder date standing in for the '0' entries
dummy = '1800-01-01'
dummy_date = pd.to_datetime(dummy)

list_columns = ['id', 'start', 'end', 'duration']
list_data = [
    [1, '2023-01-01', '2023-04-02', 0],
    [2, '2023-01-10', '0', 2],
    [3, '0', '0', 3],
    [4, '0', '0', 4],
]
df = pd.DataFrame(columns=list_columns, data=list_data)

# Replace '0' dates with the dummy value, then convert both columns to datetime
df['start'] = df['start'].where(df['start'] != '0', dummy)
df['start'] = pd.to_datetime(df['start'], format='%Y-%m-%d')
df['end'] = df['end'].where(df['end'] != '0', dummy)
df['end'] = pd.to_datetime(df['end'], format='%Y-%m-%d')

# Walk the rows in order; assumes the default 0..n-1 index and that
# the first row already has a real start date
for row in df.itertuples():
    if row.start == dummy_date:
        # Missing start: take the (already filled) end of the previous row
        df.loc[row.Index, 'start'] = df.loc[row.Index - 1, 'end']
    if row.end == dummy_date:
        # Missing end: start plus duration in days
        df.loc[row.Index, 'end'] = df.loc[row.Index, 'start'] + pd.Timedelta(row.duration, unit='days')

print(df)
```

which gives:

```
   id      start        end  duration
0   1 2023-01-01 2023-04-02         0
1   2 2023-01-10 2023-01-12         2
2   3 2023-01-12 2023-01-15         3
3   4 2023-01-15 2023-01-19         4
```
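
As one possible "neater way", here is a minimal sketch that uses `pd.NaT` for missing dates instead of a dummy date. It is not the answerer's code: the function name `fill_missing_dates` is made up for illustration, and it assumes that both the string `'0'` and the integer `0` mean "missing" and that rows are already in the order they should be filled.

```python
import pandas as pd


def fill_missing_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with '0' / 0 placeholders in start and end filled in.

    Hypothetical helper: assumes columns 'start', 'end', 'duration' (days).
    """
    out = df.copy()
    for col in ("start", "end"):
        # Treat both the string '0' and the integer 0 as missing (assumption)
        missing = out[col].astype(str) == "0"
        out[col] = pd.to_datetime(out[col].mask(missing), format="%Y-%m-%d")

    starts, ends = [], []
    prev_end = pd.NaT  # there is no previous row before the first one
    for start, end, duration in zip(out["start"], out["end"], out["duration"]):
        if pd.isna(start):
            start = prev_end  # missing start: end of the previous row
        if pd.isna(end):
            end = start + pd.Timedelta(days=int(duration))  # start + duration
        starts.append(start)
        ends.append(end)
        prev_end = end  # the next row may need this value

    out["start"] = starts
    out["end"] = ends
    return out


# The data from the question, including its mix of '0' and 0
df = pd.DataFrame(
    columns=["id", "start", "end", "duration"],
    data=[
        [1, "2023-01-01", "2023-04-02", 0],
        [2, "2023-01-10", 0, 2],
        [3, "0", 0, 3],
        [4, "0", 0, 4],
    ],
)
result = fill_missing_dates(df)
print(result)
```

Carrying `prev_end` forward in the loop is what lets id=4 work without looking back through every earlier row: by the time id=4 is reached, the end for id=3 has already been computed. The values for one id can then be read with `result.loc[result["id"] == 4, ["start", "end"]]`. If the first row has no start, this sketch leaves its dates as `NaT` rather than raising an error.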

huangapple
  • Published on March 8, 2023, 18:14:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75671734.html