How to iterate recursively over a DataFrame and calculate row values
Question
I have the following DataFrame:

import pandas as pd
list_columns = ['id', 'start', 'end', 'duration']
list_data = [
    [1, '2023-01-01', '2023-04-02', 0], [2, '2023-01-10', 0, 2],
    [3, '0', 0, 3], [4, '0', 0, 4],
]
df = pd.DataFrame(columns=list_columns, data=list_data)

For a specific id, I want to calculate start and end when they are 0, like this:
For example, if id=3, we check whether start is 0. If so:
start = end of the previous id in the df
end = start + duration
But for id=4, how can we check each row above it in the df in a simple way and calculate start and end?
Answer 1
Score: 0
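No recursion is required. First convert all the dates to datetime (which means replacing '0' with a placeholder date), then fill each blank start from the previous row's end and each blank end from start plus duration. There is probably a neater way, but you can use: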
```python
import pandas as pd

# Placeholder for missing dates; assumes no real date in the data equals it.
dummy = '1800-01-01'
dummy_date = pd.to_datetime(dummy)

list_columns = ['id', 'start', 'end', 'duration']
# Missing dates are written as the string '0' here (the question mixes 0 and '0').
list_data = [
    [1, '2023-01-01', '2023-04-02', 0], [2, '2023-01-10', '0', 2],
    [3, '0', '0', 3], [4, '0', '0', 4],
]
df = pd.DataFrame(columns=list_columns, data=list_data)

# Replace '0' dates with the dummy value, then convert everything to datetime.
df['start'] = df['start'].where(df['start'] != '0', dummy)
df['start'] = pd.to_datetime(df['start'], format='%Y-%m-%d')
df['end'] = df['end'].where(df['end'] != '0', dummy)
df['end'] = pd.to_datetime(df['end'], format='%Y-%m-%d')

# Walk the rows top to bottom so each row can use the already-filled row above.
# Assumes the default 0..n-1 index and that the first row has a real start.
for row in df.itertuples():
    if row.start == dummy_date:
        df.loc[row.Index, 'start'] = df.loc[row.Index - 1, 'end']
    if row.end == dummy_date:
        df.loc[row.Index, 'end'] = df.loc[row.Index, 'start'] + pd.Timedelta(row.duration, unit='days')

print(df)
```

which gives:

   id      start        end  duration
0   1 2023-01-01 2023-04-02         0
1   2 2023-01-10 2023-01-12         2
2   3 2023-01-12 2023-01-15         3
3   4 2023-01-15 2023-01-19         4
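
One possible tidier variant, offered as a sketch rather than as part of the original answer: instead of a placeholder date, `pd.to_datetime(..., errors="coerce")` can turn the `'0'` entries into `NaT`, and a single pass over the rows can carry the previous `end` forward. The function name `fill_missing_dates` is hypothetical, and the sketch assumes the rows are already in the order in which they should chain.

```python
import pandas as pd


def fill_missing_dates(df):
    """Fill missing start/end dates row by row (hypothetical helper)."""
    out = df.copy()
    # 0 or '0' is not a valid date, so errors="coerce" turns it into NaT.
    for col in ("start", "end"):
        out[col] = pd.to_datetime(
            out[col].astype(str), format="%Y-%m-%d", errors="coerce"
        )

    starts, ends = [], []
    prev_end = pd.NaT  # end of the row above; NaT before the first row
    for start, end, duration in zip(out["start"], out["end"], out["duration"]):
        if pd.isna(start):
            start = prev_end  # missing start: take the previous row's end
        if pd.isna(end):
            end = start + pd.Timedelta(days=int(duration))
        starts.append(start)
        ends.append(end)
        prev_end = end

    out["start"] = starts
    out["end"] = ends
    return out


# Data from the question, including its mix of 0 and '0'.
list_columns = ["id", "start", "end", "duration"]
list_data = [
    [1, "2023-01-01", "2023-04-02", 0], [2, "2023-01-10", 0, 2],
    [3, "0", 0, 3], [4, "0", 0, 4],
]
df = pd.DataFrame(columns=list_columns, data=list_data)
result = fill_missing_dates(df)
print(result)
```

Because the loop tracks `prev_end` itself, this version does not depend on the index being 0..n-1, and a missing start in the first row simply stays `NaT` instead of raising a `KeyError`.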