How to normalize elements of an array in a time range?

Question


I'm trying to normalize an array of elements over a time range. Say you have 20 bank transactions that occurred on Jan 1st, 2022:

transaction  1 - 2022/01/01
transaction  2 - 2022/01/01
...
transaction 20 - 2022/01/01

We have no data other than the day they occurred, but we still want to assign each one a time of day, so that they end up as:

transaction  1 - 2022/01/01 00:00
transaction  2 - 2022/01/01 ??:??
...
transaction 20 - 2022/01/01 23:59

In Go, I have this function that tries to calculate the normalized time of day for a given index in an array of elements:

```go
func normal(start, end time.Time, arraySize, index float64) time.Time {
	delta := end.Sub(start)
	minutes := delta.Minutes()

	duration := minutes * ((index + 1) / arraySize)

	return start.Add(time.Duration(duration) * time.Minute)
}
```

However, for index 0 in an array of 4 elements, with a time range of 2022/1/1 00:00 to 2022/1/1 23:59, I get an unexpected result of 2022/1/1 05:59 instead of the 2022/1/1 00:00 I would expect. Under these conditions, the only index that works correctly is index 3.

So, what am I doing wrong with my normalization?

EDIT:

Here is the fixed function, thanks to @icza:

```go
func timeIndex(min, max time.Time, entries, position float64) time.Time {
	delta := max.Sub(min)
	minutes := delta.Minutes()

	if position < 0 {
		position = 0
	}

	duration := (minutes * (position / (entries - 1)))

	return min.Add(time.Duration(duration) * time.Minute)
}
```

Here is an example. Let's say our start and end times are 2022/01/01 00:00 and 2022/01/01 00:03, we have 3 entries in our array of bank transactions, and we want to get the normalized time for transaction no. 3 (index 2 in the array):

```go
result := timeIndex(time.Date(2022, time.January, 1, 0, 0, 0, 0, time.UTC), time.Date(2022, time.January, 1, 0, 3, 0, 0, time.UTC), 3, 2)
```

Since the range spans only 3 minutes (from 00:00 to 00:03) and we want the normalized time for the last entry (index 2) in the array (size 3), the result should be:

```go
fmt.Printf("%t", result.Equal(time.Date(2022, time.January, 1, 0, 3, 0, 0, time.UTC)))
// prints "true"
```

That is, the last minute in the range, 00:03.

Here is a reproducible example: https://go.dev/play/p/EzwkqaNV1at

Answer 1

Score: 1


Between n points there are n-1 segments. This means that if you want to include both start and end in the interpolation, the interval (delta) has to be divided into arraySize - 1 time periods.

Also, if you add 1 to the index, you can never get start as the result (you'll skip 00:00).

So the correct algorithm is this:

```go
func normal(start, end time.Time, arraySize, index float64) time.Time {
	minutes := end.Sub(start).Minutes()

	duration := minutes * (index / (arraySize - 1))

	return start.Add(time.Duration(duration) * time.Minute)
}
```

Try it on the Go Playground.

Also note that if you have many transactions (on the order of the number of minutes in a day, which is 1440), you may easily end up with multiple transactions sharing the same timestamp (same hour and minute). If you want to avoid this, use a finer resolution than minutes, e.g. seconds or milliseconds:

```go
func normal(start, end time.Time, arraySize, index float64) time.Time {
	sec := end.Sub(start).Seconds()

	duration := sec * (index / (arraySize - 1))

	return start.Add(time.Duration(duration) * time.Second)
}
```

Yes, this will produce timestamps whose seconds are not necessarily zero, but it keeps timestamps distinct for much larger numbers of transactions.

If the number of transactions is close to the number of seconds in a day (86400), you can drop this "unit" completely and use time.Duration itself (which is a number of nanoseconds). This keeps timestamps unique as long as consecutive entries are at least one nanosecond apart, which covers any realistic number of transactions:

```go
func normal(start, end time.Time, arraySize, index float64) time.Time {
	delta := float64(end.Sub(start))

	duration := delta * (index / (arraySize - 1))

	return start.Add(time.Duration(duration))
}
```

Testing this with 1 million transactions, here are the time parts of the first 20 entries (they differ only in their seconds and fractional seconds):

```
0 - 00:00:00.00000
1 - 00:00:00.08634
2 - 00:00:00.17268
3 - 00:00:00.25902
4 - 00:00:00.34536
5 - 00:00:00.43170
6 - 00:00:00.51804
7 - 00:00:00.60438
8 - 00:00:00.69072
9 - 00:00:00.77706
10 - 00:00:00.86340
11 - 00:00:00.94974
12 - 00:00:01.03608
13 - 00:00:01.12242
14 - 00:00:01.20876
15 - 00:00:01.29510
16 - 00:00:01.38144
17 - 00:00:01.46778
18 - 00:00:01.55412
19 - 00:00:01.64046
```

Try this one on the Go Playground.

huangapple
  • This article was posted on 2022-11-21 14:27:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74515004.html