How to add a column to a dataset based on certain criteria
Question
I need to do the following: I have a dataset containing the times at which specific vehicles pass a specific point. I need to insert a column indicating how many times each vehicle has passed there. Moreover, I need to reset the count whenever the time between two consecutive passes of the same vehicle exceeds a certain threshold.
For example:
Vehicle || Time || Times passed
A || 00:15 || 1
B || 00:20 || 1
C || 00:25 || 1
C || 00:45 || 2
A || 00:59 || 2
A || 01:56 || 3
B || 22:55 || 1 (time delta above the threshold, so the count resets)
A || 23:49 || 1 (time delta above the threshold, so the count resets)
import pandas as pd
df['period'] = pd.to_datetime(df['date_time'])
df['Number'] = df.groupby('Vehicle').cumcount().add(1)
I think this just counts the passes without resetting when the gap exceeds the threshold, and I have absolutely no idea how to do that part.
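To make the problem concrete, here is a minimal sketch that rebuilds the example table as a DataFrame and applies the plain per-vehicle cumcount. The column names Vehicle and Time are assumed from the table. The last two rows show the issue: the count keeps growing across the long gaps instead of resetting to 1.

```python
import pandas as pd

# Example data from the table above (column names assumed)
df = pd.DataFrame({
    "Vehicle": ["A", "B", "C", "C", "A", "A", "B", "A"],
    "Time": ["00:15", "00:20", "00:25", "00:45", "00:59", "01:56", "22:55", "23:49"],
})

# Plain per-vehicle running count: it never resets
df["Number"] = df.groupby("Vehicle").cumcount().add(1)
print(df["Number"].tolist())  # [1, 1, 1, 2, 2, 3, 2, 4]
```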
Answer 1
Score: 0
My first idea is to simply split df into parts and compute the result for each part separately.
This is not perfect, but it seems to work:
import pandas as pd

# add "epoch" for calculations
# for each epoch we will compute the result separately
# epoch = how many time diffs so far exceeded the threshold
df['epoch'] = (
    pd.to_datetime(df['Time']).diff() >
    pd.Timedelta('01:00:00')  # your threshold
).cumsum()

# from your code
def get_cumcount(part):
    return part.groupby('Vehicle').cumcount().add(1).values

# for each epoch:
# compute the result separately
df['result'] = 0
for i in df['epoch'].unique():
    mask = df['epoch'] == i
    df.loc[mask, 'result'] = get_cumcount(df[mask])
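The loop over epochs can also be written as a single groupby on both keys, since a cumcount over (epoch, Vehicle) restarts for every vehicle in every epoch. This is a sketch under a few assumptions: a one-hour threshold, times stored as "HH:MM" strings (hence the explicit format), and rows already sorted by time.

```python
import pandas as pd

# Example data from the question (column names assumed)
df = pd.DataFrame({
    "Vehicle": ["A", "B", "C", "C", "A", "A", "B", "A"],
    "Time": ["00:15", "00:20", "00:25", "00:45", "00:59", "01:56", "22:55", "23:49"],
})
threshold = pd.Timedelta(hours=1)  # assumed threshold

# Same epoch definition as in the code above: a new epoch starts whenever the
# gap to the previous row (of any vehicle) exceeds the threshold
times = pd.to_datetime(df["Time"], format="%H:%M")
df["epoch"] = (times.diff() > threshold).cumsum()

# Count passes per vehicle within each epoch in one step, no loop needed
df["result"] = df.groupby(["epoch", "Vehicle"]).cumcount().add(1)
print(df["result"].tolist())  # [1, 1, 1, 2, 2, 3, 1, 1]
```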
I also tried doing it using groupby and transform, but got errors.
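The epoch approach measures the gap between consecutive rows regardless of vehicle, whereas the question asks about the gap between two passes of the same vehicle. A per-vehicle variant could measure each vehicle's own gap with a grouped diff, mark a new run at the first pass or after a gap above the threshold, and number the runs with a grouped cumsum. This is a sketch, not the answerer's method; the one-hour threshold, the "HH:MM" format, chronological row order, and the column name "Times passed" are all assumptions.

```python
import pandas as pd

# Example data from the question (column names assumed)
df = pd.DataFrame({
    "Vehicle": ["A", "B", "C", "C", "A", "A", "B", "A"],
    "Time": ["00:15", "00:20", "00:25", "00:45", "00:59", "01:56", "22:55", "23:49"],
})
threshold = pd.Timedelta(hours=1)  # assumed threshold

# Assumes rows are in chronological order and times are "HH:MM" strings
times = pd.to_datetime(df["Time"], format="%H:%M")

# Gap since the same vehicle's previous pass (NaT on its first pass)
gap = times.groupby(df["Vehicle"]).diff()

# A new run starts on a vehicle's first pass or after a gap above the threshold
new_run = (gap.isna() | (gap > threshold)).astype(int)

# Number each vehicle's runs: increments by 1 at every run start
run_id = new_run.groupby(df["Vehicle"]).cumsum()

# Count passes within each (vehicle, run) pair
df["Times passed"] = df.groupby([df["Vehicle"], run_id]).cumcount().add(1)
print(df["Times passed"].tolist())  # [1, 1, 1, 2, 2, 3, 1, 1]
```

On the example data this matches the epoch approach. The two differ when one vehicle has a long gap while other vehicles keep passing in between, because the global diff never sees that vehicle's own gap.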