May 13, 2023 15:52:30

How to add a column in a dataset based on a certain criteria

Question

I have a dataset containing the times at which specific vehicles pass a specific point. I need to add a column indicating how many times each vehicle has passed there so far. In addition, the count must reset whenever the time between two consecutive passes of the same vehicle exceeds a certain threshold.

For example:

Vehicle || Time  || Number of passes
A          00:15    1
B          00:20    1
C          00:25    1
C          00:45    2
A          00:59    2
A          01:56    3
B          22:55    1   (time delta above the threshold, so reset)
A          23:49    1   (time delta above the threshold, so reset)

import pandas as pd

df['period'] = pd.to_datetime(df['date_time'])
df['Number'] = df.groupby(['Vehicle']).cumcount().add(1)

I think this just counts all the passes without resetting when the gap exceeds the threshold, and I have no idea how to do that part.
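
To make the problem concrete, here is a minimal sketch that rebuilds the example table and applies the plain cumulative count. The column names `Vehicle` and `Time` are assumptions taken from the table header, not from the real dataset.

```python
import pandas as pd

# Example rows from the table above (column names are assumed)
df = pd.DataFrame({
    "Vehicle": ["A", "B", "C", "C", "A", "A", "B", "A"],
    "Time": ["00:15", "00:20", "00:25", "00:45",
             "00:59", "01:56", "22:55", "23:49"],
})

# Plain running count per vehicle: it never resets
df["Number"] = df.groupby("Vehicle").cumcount().add(1)
print(df)
```

The last two rows come out as 2 (for B) and 4 (for A) instead of the desired 1, because nothing in the grouping knows about the time gap.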

Answer 1

Score: 0

My first idea is to simply split df into parts and then compute the result for each part separately.

This is not perfect, but it seems to work:

import pandas as pd

# Add an "epoch" column: the number of time gaps larger than the
# threshold seen so far. The result is computed separately per epoch.
# Note: diff() compares consecutive rows, regardless of vehicle.
df['epoch'] = (
    pd.to_datetime(df['Time']).diff()
    > pd.Timedelta('01:00:00')  # your threshold
).cumsum()

# From your code: running count of passes per vehicle
def get_cumcount(part):
    return part.groupby('Vehicle').cumcount().add(1).values

# For each epoch, compute the result separately
df.loc[:, 'result'] = None
for i in df['epoch'].unique():
    cumcount = get_cumcount(df[df['epoch'] == i])
    df.loc[df['epoch'] == i, 'result'] = cumcount

I also tried doing this with groupby and transform, but got errors.
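
For comparison, a possible loop-free variant that measures the gap per vehicle rather than between consecutive rows, which is what the question asks for. This is only a sketch under assumptions: the rows are already sorted by time, the `Time` column holds `HH:MM` strings as in the example, the threshold is one hour, and the helper column `run` is a name introduced here for illustration.

```python
import pandas as pd

# Example rows from the question (column names are assumed)
df = pd.DataFrame({
    "Vehicle": ["A", "B", "C", "C", "A", "A", "B", "A"],
    "Time": ["00:15", "00:20", "00:25", "00:45",
             "00:59", "01:56", "22:55", "23:49"],
})
threshold = pd.Timedelta(hours=1)  # assumed threshold

# Parse the clock times; only differences matter, so the default date is irrelevant
times = pd.to_datetime(df["Time"], format="%H:%M")

# Gap to the previous pass of the same vehicle (NaT for a vehicle's first pass)
gap = times.groupby(df["Vehicle"]).diff()

# A new run starts whenever the gap exceeds the threshold; number the runs per vehicle
df["run"] = (gap > threshold).astype(int).groupby(df["Vehicle"]).cumsum()

# Count passes within each (vehicle, run) pair, starting from 1
df["result"] = df.groupby(["Vehicle", "run"]).cumcount().add(1)
print(df)
```

On the example data this reproduces the expected column 1, 1, 1, 2, 2, 3, 1, 1. Unlike the epoch approach, a long gap for one vehicle does not reset the count of another vehicle that passed recently.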
