  • I currently have a UNIX timestamp as a 64-bit float: seconds, with the fraction of a second as decimals, such as 1687976937.597064.
  • I need to convert it to nanoseconds. But there are 1 billion nanoseconds in 1 second, and doing a straight multiplication by 1 billion <s>would overflow the 64-bit float.</s>

Let's first consider the limits:

  • 1_687_976_937_597_064_000 is the integer result of the above timestamp multiplied by 1 billion. The goal is figuring out a way to safely reach this number.
  • 9_223_372_036_854_775_807 is the maximum number storable in a 64-bit signed integer.
  • 9_007_199_254_740_992.0 is the maximum number storable in a 64-bit float. And at that scale, there aren't enough bits to store any decimals at all (it's permanently .0). Edit: This claim is not correct. See the end of this post...
  • So 64-bit signed integer can hold the result. But a 64-bit float cannot hold the result and would overflow.
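
The overflow worry in the list above can be checked by asking Python for the real float limits. This is only a small sketch (the names ns_float and INT64_MAX are mine, not from the question), but it suggests that the product is nowhere near the largest finite double and still fits in a signed 64-bit integer:

```python
import math
import sys

ts = 1687976937.597064          # seconds, as a 64-bit float
ns_float = ts * 1_000_000_000   # float * int is computed as float * float

# The largest finite double is about 1.8e308, so 1.69e18 is far from overflow.
print(sys.float_info.max)
print(math.isfinite(ns_float))

# The integer result also fits in a signed 64-bit integer.
INT64_MAX = 2**63 - 1
print(int(ns_float) <= INT64_MAX)
```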

So I was thinking:

  • Since an integer is able to easily represent the result, I thought I could first convert the integer portion to an integer, and multiply by 1 billion.
  • And then extract just the decimals so that I get a new 0.XXXXXX float, and then multiply that by 1 billion. By leading with a zero, I ensure that the integer portion of the float will never overflow. But perhaps the decimals could still overflow somehow? Hopefully floats will just safely truncate the trailing decimals instead of overflowing. By multiplying a 0.X number by 1 billion, the resulting value should never be able to be higher than 1_999_999_999.XXXXX so it seems like this multiplication should be safe...
  • After that, I truncate the "decimals float" into an integer to ensure that the result will be an integer.
  • Lastly, I add together the two integers.

It seems to work, but this technique looks so hacky. Is it safe?

Here's a Python repl showing the process:

>>> num = 1687976937.597064
>>> whole = int(num)
>>> whole
>>> decimals = num - whole
>>> decimals
>>> (whole * 1_000_000_000)
>>> (decimals * 1_000_000_000)
>>> int(decimals * 1_000_000_000)
>>> (whole * 1_000_000_000) + int(decimals * 1_000_000_000)
>>> type((whole * 1_000_000_000) + int(decimals * 1_000_000_000))
<class 'int'>

So here's the comparison:

  • 1_687_976_937_597_064_018 was the result of the above algorithm. And yes, there's a slight, insignificant float rounding error but I don't mind.
  • 1_687_976_937_597_064_000 is the scientifically correct answer given by Wolfram Alpha's calculator.

It certainly looks like a success, but is there any risk that my algorithm would be dangerous and break?

I am not brave enough to put it into production without confirmation that it's safe.

Concerning the 64-bit float limits: Here are the results in Python 3's repl (pay attention to the 993 input and the 992 in the output):

>>> 9_007_199_254_740_993.0
9007199254740992.0

But perhaps I am reading that "limit" incorrectly... Perhaps this is just a float rounding error.
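
The 993 to 992 behaviour can be probed a little further. The following is a minimal sketch, assuming Python 3.9+ for math.ulp; it suggests that 2**53 is where consecutive integers stop being exactly representable, rather than the largest storable value:

```python
import math

limit = 2**53  # 9_007_199_254_740_992

# 2**53 + 1 has no exact double, so it rounds to a neighbour (here 2**53).
print(float(limit + 1) == float(limit))

# From 2**53 upward the spacing between adjacent doubles is 2.0,
# so even integers in that region are still stored exactly.
print(math.ulp(float(limit)))
print(int(float(limit + 2)) == limit + 2)

# Much larger values are still storable, just with coarser spacing.
print(float(limit) * 1000.0 > float(limit))
```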


Score: 1



With 64-bit floats, only about 15 significant decimal digits can be trusted. The 16th digit may also be right, but you can't rely on it.

So when you take (or print) the decimal part of a float, anything beyond the 16th significant digit is garbage.
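
A short sketch of where those digit counts come from (this uses only the standard sys module; the variable ts is just the timestamp from the question):

```python
import sys

# Number of decimal digits that always survive a str -> float -> str round trip.
print(sys.float_info.dig)

ts = 1687976937.597064

# 17 significant digits are always enough to reproduce the exact same float.
print(float(f"{ts:.17g}") == ts)

# Asking for more decimals than the float can carry exposes the binary
# approximation behind the literal: the last three digits are not zeros.
print(f"{ts:.9f}")
```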

You say you need C POSIX utimensat. It requires an array of timespec structures, which has two members to hold the number of seconds and the number of nanoseconds.

The first member (seconds, tv_sec) has the type time_t, an integer type that is likely a "long". The type used for nanoseconds (tv_nsec) is a "long", whose size is platform-dependent: it can be a 32-bit or a 64-bit signed integer.
So the available range must be watched.

You can see the limits of the types in the C docs.

Fortunately, a 32-bit long can hold any 9-digit number (its maximum is 2_147_483_647). This is enough for the nanoseconds part (at most 9 digits).

So far, so good. 32-bit integers suffice for the nanoseconds part, no 64-bit needed.

The key now is how to split the float into two integers.
One way is to transform the number into a string and pad the decimal part with the required zeros. Then take the parts before and after the decimal point, convert them into integers, and you are done.
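
A minimal sketch of the string approach, under a few assumptions of mine: the helper name split_via_string is hypothetical, the timestamp is non-negative, and repr() prints it in plain (non-exponent) notation, which holds for present-day UNIX timestamps:

```python
def split_via_string(ts: float) -> tuple:
    """Split a non-negative float timestamp into (seconds, nanoseconds) via its repr."""
    text = repr(ts)  # shortest decimal string that round-trips to the same float
    if ts < 0 or "e" in text or "n" in text:
        # Negative values, exponent notation, inf and nan are outside this sketch.
        raise ValueError(f"unsupported timestamp: {text}")
    sec_text, _, frac_text = text.partition(".")
    # Pad the fraction with zeros to exactly 9 digits (nanoseconds).
    nsec_text = frac_text[:9].ljust(9, "0")
    return int(sec_text), int(nsec_text)


print(split_via_string(1687976937.597064))
```

Because it works on the printed digits, this variant returns the nanoseconds as typed (597064000) rather than the binary approximation stored in the float.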

Without strings, the whole part is easy: a simple int(number).
For the decimal part you can subtract: num - whole. Then multiply by 10^9, convert to an integer and truncate at the right position. This position is exactly after the 9th digit.
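
The arithmetic variant could look roughly like this. It is a sketch, not a definitive implementation: split_via_arithmetic is a made-up name, negative timestamps are rejected for simplicity, and the final min() is an extra guard I added so the nanoseconds can never reach a full second:

```python
def split_via_arithmetic(ts: float) -> tuple:
    """Split a non-negative float timestamp into (seconds, nanoseconds) without strings."""
    if not (0 <= ts < 2**63):
        raise ValueError("timestamp outside the range handled by this sketch")
    whole = int(ts)
    # The fraction lies in [0, 1), so the product stays below one billion.
    nsec = int((ts - whole) * 1e9)
    # Guard against the product rounding up to a full second.
    return whole, min(nsec, 999_999_999)


print(split_via_arithmetic(1687976937.597064))
```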


Score: 1



For files, Unix [time stamps][1] are only as accurate as the [OS underlying time stamp granularity][2]. This granularity can be 2 seconds, 1 second, milliseconds, microseconds and nanoseconds. If you convert the digits beyond the granularity of the OS (if any) you will get an incorrect nanosecond representation. 

Further, a 64-bit float can exactly represent every integer only up to 2^53. A nanosecond value may need up to 64 bits, so not all values can be represented. The usual assumption is that the decimal representation of a 64-bit float is as accurate as an integer for 15 decimal digits (it can be more, but 15 decimal digits will hold all combinations of digits, whether or not the value is exactly representable in binary).
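
To make the 2^53 point concrete, here is a small sketch (assuming Python 3.9+ for math.ulp; the variable names are mine) that pushes the intended nanosecond count through a float and back:

```python
import math

ns_exact = 1_687_976_937_597_064_000  # intended nanosecond count, as an int

# Python ints are arbitrary precision, but this value is above 2**53 ...
print(ns_exact > 2**53)

# ... so converting it to a float snaps it to the nearest representable double.
ns_float = float(ns_exact)
print(int(ns_float))        # differs from ns_exact in the last digits
print(math.ulp(ns_float))   # spacing between adjacent doubles at this magnitude
```

At this magnitude adjacent doubles are 256 apart, so a nanosecond count held in a float can only land on multiples of 256 ns.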

There are two ways to get around these issues. Both assume millisecond file stamps, i.e. a granularity of `1e-3` seconds.

You can use strings to get a rounded value as you expect it to be:

    def ts2its(ts):
        sec,_,dec=f'{ts:.3f}'.partition('.') # use locale to get the decimal...
        return int(sec)*1_000_000_000+int(dec)*1_000_000

    >>> ts2its(1687976937.597064)

Alternatively, for a purely math approach, you can use [math.modf][3] in the following manner to drop the digits after milliseconds:

    import math

    def ts2its2(ts):
        # reliable in all locales and platforms
        # math.modf returns (fractional part, integer part), in that order
        frac, whole = math.modf(ts)
        _, millis = math.modf(frac*1e3)
        return int(whole*1e9)+int(millis*1e6)

    >>> ts2its2(1687976937.597064)


You propose this in your own answer as the solution:

    >>> ts=1687976937.597064
    >>> int(ts*1_000_000_000)

Granted, 99% of the time this is a fine solution. But let's talk about the 1% of the time when it is not.

Let's start by looking at your example time:

     SSSSSSSSSS          Significant under 1 second granularity
     SSSSSSSSSS SSS      Significant under millisecond granularity
     SSSSSSSSSS SSSSS?   The 16th digit is questionable as accurate

     S=significant under 

Now look at the result of `int(ts*1_000_000_000)`:


     ?=Questionably significant
     X=Likely inaccurate.

So the issue is really the inverse of your problem statement. 

With a 64 bit float:

 1. ANY time stamp can be represented to millisecond accuracy;
 1. MOST time stamps can be represented to microsecond accuracy;
 1. Digit by digit accuracy is lost between microsecond and nanosecond time stamps.
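
One way to see where these thresholds come from is to look at the gap between adjacent floats near a present-day timestamp. A minimal sketch, assuming Python 3.9+ for math.ulp:

```python
import math

ts = 1687976937.597064

# Gap between this float and the next representable one, in seconds.
step = math.ulp(ts)
print(step)  # roughly 2.4e-07 s, i.e. a couple of hundred nanoseconds

# The gap is well below a microsecond but far above a nanosecond.
print(step < 1e-6, step > 1e-9)
```

With a step of about 238 ns, millisecond digits are always safe, microsecond digits usually survive rounding, and individual nanosecond digits cannot be stored.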

A 64-bit float time stamp is really just a convenience data structure for a time stamp. The [actual representation][4] of nanosecond time stamps is OS dependent, but it is usually a compound data structure with two ints: one for the seconds and another for the fraction.

  [1]: https://en.wikipedia.org/wiki/Unix_time
  [2]: https://docs.python.org/3/library/os.html#os.stat_result.st_ctime_ns
  [3]: https://docs.python.org/3/library/math.html#math.modf
  [4]: https://en.wikipedia.org/wiki/Unix_time#Representing_the_number


# Answer 3
**Score**: 0




Floats can have a really large exponent without losing their significant precision. Turns out that floats allow really large multiplication without any issues, as such:

>>> 9_000_000_000_000_000_000_000_000_000.0 * 1_000_000_000
>>> 9_123_456_789_012_345_678_901_234_567.0 * 1_000_000_000
>>> int(9_123_456_789_012_345_678_901_234_567.0 * 1_000_000_000)

So basically, the float is keeping as many "significant digits" as it can fit internally, rounding away the rest (in the left-hand operand in the examples above), and then just scaling the exponent. It's able to roughly represent Unix nanosecond timestamps that are far larger than the age of the universe.

When it's time to convert it to an integer, you can also see that the float keeps as much precision as it could and does a good job with the conversion. All of the significant digits are there. There's a lot of "random float rounding errors/noise" at the end of the output number, but those digits don't matter.

In other words, I've had a fundamental misunderstanding about the size of numbers that a float can store. It's not limited per se. It just stores a fixed amount of significant digits and then it uses an exponent to reach the desired scale. So a float would suffice here!
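
The "fixed number of significant digits plus an exponent" idea can be inspected from Python itself. This is just an illustrative sketch using sys.float_info and math.frexp, with variable names of my own choosing:

```python
import math
import sys

print(sys.float_info.mant_dig)   # bits of significand in a double
print(sys.float_info.max_exp)    # upper limit of the base-2 exponent

big = 9_000_000_000_000_000_000_000_000_000.0 * 1e9
print(math.isfinite(big))

# frexp splits a float into a significand in [0.5, 1) and a power-of-two exponent.
mantissa, exponent = math.frexp(big)
print(mantissa, exponent)
```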

The answer is that I can just do the multiplication directly, and it will be totally safe from overflow. Since my multiplier is a straight 1 billion, which a float represents exactly, the product is simply rounded to the nearest representable float: the magnitude scales up by 1 billion while roughly the same number of significant digits is kept. Fantastic.

Just like this!

>>> int(1687976937.597064 * 1_000_000_000)

Although when we use an integer like above, Python actually internally converts it into a float (1_000_000_000 (int) -> 1e9 (float)), since the other operand is a float.

So it's actually 6% faster to do that multiplication with a float directly (avoiding the need for int -> float conversion of the multiplier):

>>> int(1687976937.597064 * 1e9)

As you can see, the result is identical, since both cases are doing float * float math. The integer just required an extra conversion step first, which the latter method avoids.

Let's recap:

  • 1_687_976_937_597_064_018 was the result of my "split" algorithm earlier (in my original question).
  • 1_687_976_937_597_063_936 is the result given by the suggestion to "just trust the float and do the multiply directly".
  • 1_687_976_937_597_064_000 is the mathematically correct answer given by Wolfram Alpha's calculator.

So my "split" technique had a smaller rounding error. The reason why my method was more accurate is because I had "split" my number into "whole" (int) and "decimals/fractions" (float). Which means that my method has full devotion of all significant digits to the decimals, since I had removed "the whole number" before the decimals/fractions. This means that my "decimals" float was able to devote all significant digits to properly representing the decimals with much greater precision.
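
To compare the two results against the value the float actually stores, exact rational arithmetic can serve as a reference. A minimal sketch using fractions.Fraction (the variable names are mine):

```python
from fractions import Fraction

ts = 1687976937.597064

# Exact value of the stored double times 10**9, truncated toward zero.
exact_ns = int(Fraction(ts) * 10**9)

direct_ns = int(ts * 1e9)
whole = int(ts)
split_ns = whole * 1_000_000_000 + int((ts - whole) * 1e9)

print(exact_ns, direct_ns - exact_ns, split_ns - exact_ns)
```

For this timestamp the split result matches the exact reference, which suggests the trailing 018 comes from the float literal itself (the nearest double to 1687976937.597064) rather than from the split arithmetic, while the direct multiplication lands 82 ns lower.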

But these are UNIX timestamps represented as nanoseconds, and nobody really cares about the "fractions of a second" precision that much. What matters are the first few, important digits of the fraction, and those are all correct. That's all that matters in the end. I'll be using this result to set timestamps on disk via the utimensat API, and all that really matters is that I get roughly the correct fractions of a second.

I use the Python os.utime() wrapper for that API, which takes the nanoseconds as a signed integer: "If ns is specified, it must be a 2-tuple of the form (atime_ns, mtime_ns) where each member is an int expressing nanoseconds."

I'm going to do the straight multiplication and then convert the result to an int. That does the math in one simple step, gets sufficient precision for the decimals (fractions of a second), and solves the issue in a satisfactory way!

Here's the Python code I'll be using. It preserves the current "access time" as nanoseconds by fetching that value from disk, and takes the self.unix_mtime float (a UNIX timestamp with fractions of a second as decimals) and converts that to a signed 64-bit integer nanosecond representation, and then applies the change to the target file/directory:

# Good enough precision for practically anybody. Fast.
file_meta = target_path.lstat()
st_mtime_ns = int(self.unix_mtime * 1e9)
os.utime(
    target_path, ns=(file_meta.st_atime_ns, st_mtime_ns), follow_symlinks=False
)

If anyone else wants to do this, beware that I am using lstat() to get the status of symlinks rather than their target, and using follow_symlinks=False to ensure that if the final target_path component is a symlink then I affect the link itself rather than the target. Other people may want to change these calls to stat() and follow_symlinks=True if you prefer affecting the target rather than the symlink itself. But I would guess that most people prefer my method of affecting the symlink itself if the target_path points at a symlink.
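
For anyone who wants to try this outside of a class, here is a self-contained sketch of the same idea on a temporary file. The helper name set_mtime_from_float is hypothetical, and the os.supports_follow_symlinks check is an addition of mine because follow_symlinks=False is not available for os.utime on every platform:

```python
import os
import tempfile
from pathlib import Path


def set_mtime_from_float(target_path: Path, unix_mtime: float) -> int:
    """Set mtime from a float timestamp, keep atime, return the requested ns value."""
    file_meta = target_path.lstat()
    st_mtime_ns = int(unix_mtime * 1e9)
    times_ns = (file_meta.st_atime_ns, st_mtime_ns)
    # follow_symlinks=False is not supported everywhere, so check first.
    if os.utime in os.supports_follow_symlinks:
        os.utime(target_path, ns=times_ns, follow_symlinks=False)
    else:
        os.utime(target_path, ns=times_ns)
    return st_mtime_ns


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt"
    demo.write_text("x")
    requested = set_mtime_from_float(demo, 1687976937.597064)
    stored = demo.lstat().st_mtime_ns
    print(requested, stored)
```

The stored value may differ from the requested one if the filesystem has a coarser timestamp granularity than nanoseconds.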

If you care about doing this "seconds-float to nanoseconds int" conversion with the highest achievable precision (by devoting maximum float precision to all the decimal digits to minimize rounding errors), then you can do my "split" variant as follows instead (I added type hints for clarity):

# Great conversion precision. Slower.
file_meta = target_path.lstat()
whole: int = int(self.unix_mtime)
frac: float = self.unix_mtime - whole
st_mtime_ns: int = whole * 1_000_000_000 + int(frac * 1e9)
os.utime(
    target_path, ns=(file_meta.st_atime_ns, st_mtime_ns), follow_symlinks=False
)

As you can see, it uses int * int math for the "whole seconds" and uses float * float math for the "fractions of a second". And then combines the result into an integer. This gives the best of both worlds in terms of accuracy and speed.

I did some benchmarks:

  • 50 million iterations on a Ryzen 3900x CPU.
  • The "simplified, less accurate" version took 11.728529000014532 seconds.
  • The more accurate version took 26.941824199981056 seconds. That's 2.3x the time.
  • Considering that I did 50 million iterations, you can be sure that you can safely use the more accurate version without having to worry about the performance. So if you want more accurate timestamps, feel free to use the last method.
  • As a bonus, I benchmarked @dawg's answer, which is the exact same idea as "the more accurate method", but is done via two calls to math.modf() instead of directly calculating the whole/fraction manually. Their answer is the slowest at 33.54755139999557 seconds. I wouldn't recommend it. Besides, the primary idea behind their technique was just to discard everything after the first three float decimals, which doesn't even matter for any practical purposes, and if their removal is truly desired then it can be achieved without slow math.modf() calls by simply changing my "more accurate" variant's final line to say whole * 1_000_000_000 + (int(frac * 1e3) * 1_000_000) instead, which achieves that decimal truncation technique in 27.95227960000746 seconds instead.
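
A rough way to reproduce this kind of comparison is sketched below with timeit. This is not the benchmark script used for the numbers above; the function names are mine, the iteration count is much smaller, and the lambda wrapper adds overhead, so only the relative ordering is meaningful:

```python
import math
import timeit

TS = 1687976937.597064


def direct(ts):
    return int(ts * 1e9)


def split(ts):
    whole = int(ts)
    return whole * 1_000_000_000 + int((ts - whole) * 1e9)


def split_ms(ts):
    # Split variant that discards everything after the milliseconds.
    whole = int(ts)
    return whole * 1_000_000_000 + int((ts - whole) * 1e3) * 1_000_000


def modf_ms(ts):
    # math.modf returns (fractional part, integer part).
    frac, whole = math.modf(ts)
    _, millis = math.modf(frac * 1e3)
    return int(whole * 1e9) + int(millis * 1e6)


for func in (direct, split, split_ms, modf_ms):
    seconds = timeit.timeit(lambda: func(TS), number=200_000)
    print(f"{func.__name__:>8}: {seconds:.3f} s -> {func(TS)}")
```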

There's also a third method via the discussed decimal library which would have perfect mathematical precision (it doesn't use floats), but it's very slow, so I didn't include it.
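
For completeness, a decimal-based variant might look something like the sketch below. It is my guess at the approach, not code from the discussion: the function name is hypothetical, and it assumes the digits printed by repr() are the value you actually intended:

```python
from decimal import Decimal


def seconds_to_ns_decimal(ts: float) -> int:
    # repr() gives the shortest decimal string that round-trips to the same float,
    # so the multiplication below is done on those decimal digits, not in binary.
    return int(Decimal(repr(ts)) * 1_000_000_000)


print(seconds_to_ns_decimal(1687976937.597064))
```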

