pandas: loss of precision when specifying seconds as floating point


Question

I need to create a datetime index with 5000 elements, an unknown offset, and an unknown delta between the elements. The delta value and the offset are parameters, and the only certainty is that they will be expressed in seconds as an integer or a floating-point number.

I use pd.Timedelta(value, "s") to compute this delta (since np.timedelta64() does not accept floating-point values).

pd.to_datetime(1687957943.122, unit="s") + np.arange(0, 5000) * pd.Timedelta(0.002, "s")

Unfortunately, the floating-point arithmetic causes a loss of precision (the following values are not exactly 0.002 seconds apart):

> array(['2023-06-28T13:12:23.121999872', '2023-06-28T13:12:23.123999872',
'2023-06-28T13:12:23.125999872', ...,
'2023-06-28T13:12:33.115999872', '2023-06-28T13:12:33.117999872',
'2023-06-28T13:12:33.119999872'], dtype='datetime64[ns]')

Compare:

# offset manually upgraded to integer number and unit specified as ms
pd.to_datetime(1687957943122, unit="ms") + np.arange(0, 5000) * pd.Timedelta(0.002, "s")

This gets me the desired result:

> array(['2023-06-28T13:12:23.122000000', '2023-06-28T13:12:23.124000000',
'2023-06-28T13:12:23.126000000', ...,
'2023-06-28T13:12:33.116000000', '2023-06-28T13:12:33.118000000',
'2023-06-28T13:12:33.120000000'], dtype='datetime64[ns]')

However, since I don't know the time precision of the offset, I cannot simply do this.

I could probably write some code to determine the correct unit, but it feels like this should already be built-in functionality. Any clues? +1 if I don't need pandas at all.
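
One way to make the "determine the correct unit" idea concrete (a minimal sketch, not built-in functionality; the microsecond-precision assumption and the variable names are illustrative): round the float parameters to integers at a fixed resolution once, then stay in exact integer arithmetic, which also needs only numpy:

import numpy as np

offset_s, delta_s, n = 1687957943.122, 0.002, 5000   # example parameters

# assumption: the inputs carry at most microsecond precision, so rounding is safe
offset_us = round(offset_s * 1e6)   # offset as integer microseconds
delta_ns = round(delta_s * 1e9)     # delta as integer nanoseconds

index = (np.datetime64(offset_us, "us").astype("datetime64[ns]")
         + np.arange(n) * np.timedelta64(delta_ns, "ns"))
# index[:2] should give '2023-06-28T13:12:23.122000000', '2023-06-28T13:12:23.124000000'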

Answer 1

Score: 1

So the problem already starts at:

g = pd.to_datetime(1687957943.122, unit="s")
g.microsecond  # == 121999

You need to use the pd.Timestamp.fromtimestamp() function to avoid such behaviour:

g = pd.Timestamp.fromtimestamp(1687957943.122)
g.microsecond   # == 122000

As for the solution that goes through the standard-library datetime instead (pandas is only needed for the final conversion back):

from datetime import datetime

g = datetime.fromtimestamp(1687957943.122)
g = pd.to_datetime(g)
g.microsecond  # == 122000

I do wonder how it is done behind the scenes, but this answers the main question.
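
As for the behind-the-scenes part, my reading (not stated in the answer) is that 1687957943.122 has no exact binary float representation, and pd.to_datetime(..., unit="s") scales that float to nanoseconds, where the representation error becomes visible, whereas fromtimestamp() rounds to whole microseconds first. A quick check:

import pandas as pd

f"{1687957943.122:.9f}"  # '1687957943.121999979', the float itself already sits just below .122

pd.to_datetime(1687957943.122, unit="s").nanosecond    # 872, the error shows up at ns resolution
pd.Timestamp.fromtimestamp(1687957943.122).nanosecond  # 0, rounded to whole microseconds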
