Pandas speed calculation based on travelled distance and time
Question
I have the following dataframe:
import numpy as np
import pandas as pd

data = [
    ['ID', '2022-04-23T03:36:26Z', 60, 10, 83],
    ['ID', '2022-04-23T03:37:30Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:37:48Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:38:24Z', 61, 11, 72],
    ['ID', '2022-04-23T03:44:20Z', 63, 13, 75],
    ['ID', '2022-04-23T03:45:02Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:45:06Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:45:08Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:45:12Z', np.nan, np.nan, np.nan],
    ['ID', '2022-04-23T03:45:48Z', 69, 15, 61]
]
df = pd.DataFrame(data=data,
                  columns=['ID', 'time', 'latitude', 'longitude', 'speed'])
The problem is that some rows contain only the time value, e.g. rows 2 and 3. For these rows, I want to calculate the average speed from the time, latitude and longitude of the rows immediately before (row 1) and after (row 4) the NaN rows.
For example, the speed in rows 2 and 3 should be the average speed: the travelled distance (perhaps computed with the Haversine formula) divided by the elapsed time ('2022-04-23T03:38:24Z' - '2022-04-23T03:36:26Z').
How can I write this in Python?
Answer 1
Score: 1
pandas.DataFrame.interpolate may be what you're looking for if a naive approach is enough: by default it fills NaN values linearly between the neighbouring known values, treating the rows as equally spaced. There are other options if you need something more specific; see the docs:
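cols = ["latitude", "longitude", "speed"]
# Select the numeric columns first so the string columns are not passed to interpolate().
# astype(int) truncates the fractional part of the interpolated values.
df[cols] = df[cols].interpolate().astype(int)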
Result:
   ID                  time  latitude  longitude  speed
0  ID  2022-04-23T03:36:26Z        60         10     83
1  ID  2022-04-23T03:37:30Z        60         10     79
2  ID  2022-04-23T03:37:48Z        60         10     75
3  ID  2022-04-23T03:38:24Z        61         11     72
4  ID  2022-04-23T03:44:20Z        63         13     75
5  ID  2022-04-23T03:45:02Z        64         13     72
6  ID  2022-04-23T03:45:06Z        65         13     69
7  ID  2022-04-23T03:45:08Z        66         14     66
8  ID  2022-04-23T03:45:12Z        67         14     63
9  ID  2022-04-23T03:45:48Z        69         15     61
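Note that this interpolation works on row position only and does not use the timestamps or the distance travelled. If the distance-over-time average from the question is what is needed, the following is a minimal sketch rather than a definitive implementation. It assumes a single vehicle with rows sorted by time, a spherical Earth with a mean radius of 6371 km, and a speed column in km/h (the question does not state the unit). The helper names haversine_km and fill_speed_from_fixes are made up for this illustration.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def fill_speed_from_fixes(df):
    """Fill NaN speeds with distance / elapsed time between the
    nearest known positions before and after each row (hypothetical helper)."""
    out = df.copy()
    out["time"] = pd.to_datetime(out["time"])
    known = out["latitude"].notna() & out["longitude"].notna()

    # Keep only rows with a known position, then spread them over all rows:
    # ffill gives the previous known fix, bfill the next known fix.
    fixes = out.loc[known, ["time", "latitude", "longitude"]].reindex(out.index)
    prev, nxt = fixes.ffill(), fixes.bfill()

    dist_km = haversine_km(prev["latitude"], prev["longitude"],
                           nxt["latitude"], nxt["longitude"])
    hours = (nxt["time"] - prev["time"]).dt.total_seconds() / 3600

    # Only rows whose speed is NaN are filled; measured speeds are untouched.
    out["speed"] = out["speed"].fillna(dist_km / hours)
    return out


# Usage with the first four rows of the question's data.
sample = pd.DataFrame(
    [
        ["ID", "2022-04-23T03:36:26Z", 60, 10, 83],
        ["ID", "2022-04-23T03:37:30Z", np.nan, np.nan, np.nan],
        ["ID", "2022-04-23T03:37:48Z", np.nan, np.nan, np.nan],
        ["ID", "2022-04-23T03:38:24Z", 61, 11, 72],
    ],
    columns=["ID", "time", "latitude", "longitude", "speed"],
)
filled = fill_speed_from_fixes(sample)
print(filled[["time", "speed"]])
```

Rows 2 and 3 receive the same value because they share the same bracketing positions. Leading or trailing NaN rows have no fix on one side and stay NaN. With several vehicles in one frame, the same function could be applied per ID via groupby.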