2023-05-25 02:34:15

Pandas speed calculation based on travelled distance and time

Question

I have the following dataframe:

import numpy as np
import pandas as pd

data = [
    ['ID', '2022-04-23T03:36:26Z', 60, 10, 83],
    ['ID', '2022-04-23T03:37:30Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:37:48Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:38:24Z', 61, 11, 72],
    ['ID', '2022-04-23T03:44:20Z', 63, 13, 75],
    ['ID', '2022-04-23T03:45:02Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:45:06Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:45:08Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:45:12Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:45:48Z', 69, 15, 61]
]
df = pd.DataFrame(data=data,
                  columns=['ID', 'time', 'latitude', 'longitude', 'speed'])

The problem is that some rows, e.g. rows 2 and 3, have only the time value. For these rows, I want to calculate the average speed from the time, latitude and longitude of the rows immediately before (row 1) and after (row 4) the NaN rows.

For example, the speed value in rows 2 and 3 should be an average speed: the travelled distance (maybe using the Haversine formula) divided by the total elapsed time ('2022-04-23T03:38:24Z' - '2022-04-23T03:36:26Z').

How can I write this in Python?
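
The calculation described in the question, worked through for the first gap only (row 1 to row 4), might look like the sketch below. It assumes a mean Earth radius of 6371 km and a result in km/h, since the question does not state the unit of the speed column. The name haversine_km is illustrative and not part of any library.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Row 1 and row 4 of the example data bracket the first gap.
# "+00:00" is used instead of "Z" so fromisoformat also works before Python 3.11.
t_start = datetime.fromisoformat("2022-04-23T03:36:26+00:00")
t_end = datetime.fromisoformat("2022-04-23T03:38:24+00:00")

distance_km = haversine_km(60, 10, 61, 11)
elapsed_h = (t_end - t_start).total_seconds() / 3600  # 118 seconds, in hours
avg_speed_kmh = distance_km / elapsed_h  # the value rows 2 and 3 would receive
```

The sample coordinates are a full degree apart over about two minutes, so the computed figure is unrealistically large. With real GPS fixes the same steps should give plausible speeds.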

Answer 1

Score: 1

pandas.DataFrame.interpolate may be what you're looking for if a naive approach is enough (there are other options if you need something more specific; see the docs):

# Interpolate only the numeric columns; 'ID' and 'time' are strings.
cols = ["latitude", "longitude", "speed"]
df[cols] = df[cols].interpolate().round().astype(int)

Result:

   ID                  time  latitude  longitude  speed
0  ID  2022-04-23T03:36:26Z        60         10     83
1  ID  2022-04-23T03:37:30Z        60         10     79
2  ID  2022-04-23T03:37:48Z        61         11     76
3  ID  2022-04-23T03:38:24Z        61         11     72
4  ID  2022-04-23T03:44:20Z        63         13     75
5  ID  2022-04-23T03:45:02Z        64         13     72
6  ID  2022-04-23T03:45:06Z        65         14     69
7  ID  2022-04-23T03:45:08Z        67         14     67
8  ID  2022-04-23T03:45:12Z        68         15     64
9  ID  2022-04-23T03:45:48Z        69         15     61
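
The default linear interpolation above fills values evenly by row position, so it does not use the timestamps or the distance travelled. A rough sketch of the distance-over-time calculation from the question follows. It assumes each run of missing rows is bracketed by rows with known coordinates, that a mean Earth radius of 6371 km is acceptable, and that the result should be in km/h (the question does not state the unit of the speed column). The names haversine_km, prev and nxt are illustrative, not pandas API.

```python
import numpy as np
import pandas as pd

nan = np.nan
data = [
    ["ID", "2022-04-23T03:36:26Z", 60, 10, 83],
    ["ID", "2022-04-23T03:37:30Z", nan, nan, nan],
    ["ID", "2022-04-23T03:37:48Z", nan, nan, nan],
    ["ID", "2022-04-23T03:38:24Z", 61, 11, 72],
    ["ID", "2022-04-23T03:44:20Z", 63, 13, 75],
    ["ID", "2022-04-23T03:45:02Z", nan, nan, nan],
    ["ID", "2022-04-23T03:45:06Z", nan, nan, nan],
    ["ID", "2022-04-23T03:45:08Z", nan, nan, nan],
    ["ID", "2022-04-23T03:45:12Z", nan, nan, nan],
    ["ID", "2022-04-23T03:45:48Z", 69, 15, 61],
]
df = pd.DataFrame(data, columns=["ID", "time", "latitude", "longitude", "speed"])
df["time"] = pd.to_datetime(df["time"])


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Vectorised great-circle distance in km; inputs are in degrees."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))


# For every row, find the nearest known fix before it (prev) and after it (nxt).
known = df["latitude"].notna() & df["longitude"].notna()
cols = ["time", "latitude", "longitude"]
prev = pd.DataFrame({c: df[c].where(known).ffill() for c in cols})
nxt = pd.DataFrame({c: df[c].where(known).bfill() for c in cols})

dist_km = pd.Series(
    haversine_km(prev["latitude"], prev["longitude"],
                 nxt["latitude"], nxt["longitude"]),
    index=df.index,
)
hours = (nxt["time"] - prev["time"]).dt.total_seconds() / 3600

# Fill only the missing speeds; recorded speeds stay untouched.
gap = df["speed"].isna()
df.loc[gap, "speed"] = (dist_km / hours)[gap]
```

With the sample data, rows 1 and 2 receive one shared value and rows 5 to 8 another. Because the sample coordinates jump by whole degrees within a minute or two, those values are far larger than the recorded speeds; real GPS fixes should give plausible figures.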
