你应该使用什么数据结构,以便我的“日历”API具有最佳性能?

huangapple go评论88阅读模式
英文:

What data structure should I use so that my "calendar" api has maximum performance?

问题

我有一个微服务,用于提供员工的工作日历。原始数据具有以下文件结构:

calendars / [year] / [employee].json

[employee].json的内容:

{
  // [date]: [hours]
  "2023-02-01": 7,
  "2023-02-02": 7,
}

使用Go语言,我将所有数据转换为两个具有类型CalendarFastCalendarSlow的变量:

type Username = string
type Date = string
type Year = string
type Hours = float64

type CalendarFast map[Username]map[Year][]struct {
	Time  time.Time // "2022-01-01"
	Hours Hours // 7
}

type CalendarSlow map[Username]map[Year]map[Date]Hours

例如,如果我想从我的API返回给定日期范围内的所有小时数,我可以编写以下两个基准测试来测试哪种数据结构更好:

func BenchmarkNew(b *testing.B) {
	c := make(CalendarFast)
	c.Update(context.TODO())

	from, _ := time.Parse("2006-01-02", "2023-01-01")
	to, _ := time.Parse("2006-01-02", "2023-02-01")

	isInRange := func(t, from, to time.Time) bool {
		return (t.After(from) && t.Before(to)) || t.Equal(from) || t.Equal(to)
	}

	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		// result is map[username]map[date]hours
		result := make(map[string]map[string]float64)
		for username, years := range c {
			if _, ok := result[username]; !ok {
				result[username] = make(map[string]float64)
			}

			for _, dates := range years {
				for i, n := 0, len(dates); i < n; i++ {
					if isInRange(dates[i].Time, from, to) {
						result[username][dates[i].Time.String()] = dates[i].Hours
					}
				}
			}
		}
	}
}

func BenchmarkOld(b *testing.B) {
	c := make(CalendarSlow)
	c.Update(context.TODO())

	from, _ := time.Parse("2006-01-02", "2023-01-01")
	to, _ := time.Parse("2006-01-02", "2023-02-01")

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		result := make(map[string]map[string]float64)
		for username, years := range c {
			if _, ok := result[username]; !ok {
				result[username] = make(map[string]float64)
			}

			for _, items := range years {
				for date, hours := range items {
					t, err := time.Parse("2006-01-02", date)
					if err != nil {
						continue
					}

					if (t.After(from) && t.Before(to)) || t.Equal(from) || t.Equal(to) {
						result[username][date] = hours
					}
				}
			}
		}
	}
}

我有以下基准测试结果,显示在数组中搜索范围比在映射中搜索每个元素要快得多:

BenchmarkNew-4  285    4208045 ns/op  692965 B/op  6303 allocs/op
BenchmarkOld-4   39   29672935 ns/op  621877 B/op  1407 allocs/op

我的问题是:我如何进一步提高性能?我应该将数据结构更改为其他结构吗?

我需要速度,因为我当前的PHP API接收到很多调用,并发现Go中的微服务可以轻松处理每秒1,000个请求,只需10m的CPU和64Mib的内存即可在k8s中运行。

我已经对于"time.Time"的数组方法感到满意,但对其他实现方式很好奇。

更新:

测试 迭代次数 时间
我的数组方法 229 5259548 ns/op
erik258 414 2892932 ns/op
英文:

I have a micro service, which serves employees work calendars. The original data has the following file structure:

calendars / [year] / [employee].json

Contents of [employee].json:

{
// [date]: [hours]
&quot;2023-02-01&quot;: 7,
&quot;2023-02-02&quot;: 7,
}

Using Go, I convert all data to these two variables with types CalendarFast and CalendarSlow:

type Username = string
type Date = string
type Year = string
type Hours = float64

type CalendarFast map[Username]map[Year][]struct {
	Time  time.Time // &quot;2022-01-01&quot;
	Hours Hours // 7
}

type CalendarSlow map[Username]map[Year]map[Date]Hours

If i want, for example, from my API to return all hours for given date range I can write these two benchmarks to test which data structure is better:

func BenchmarkNew(b *testing.B) {
	c := make(CalendarFast)
	c.Update(context.TODO())

	from, _ := time.Parse(&quot;2006-01-02&quot;, &quot;2023-01-01&quot;)
	to, _ := time.Parse(&quot;2006-01-02&quot;, &quot;2023-02-01&quot;)

	isInRange := func(t, from, to time.Time) bool {
		return (t.After(from) &amp;&amp; t.Before(to)) || t.Equal(from) || t.Equal(to)
	}

	b.ResetTimer()

	for i := 0; i &lt; b.N; i++ {
		// result is map[username]map[date]hours
		result := make(map[string]map[string]float64)
		for username, years := range c {
			if _, ok := result[username]; !ok {
				result[username] = make(map[string]float64)
			}

			for _, dates := range years {
				for i, n := 0, len(dates); i &lt; n; i++ {
					if isInRange(dates[i].Time, from, to) {
						result[username][dates[i].Time.String()] = dates[i].Hours
					}
				}
			}
		}
	}
}

func BenchmarkOld(b *testing.B) {
	c := make(CalendarSlow)
	c.Update(context.TODO())

	from, _ := time.Parse(&quot;2006-01-02&quot;, &quot;2023-01-01&quot;)
	to, _ := time.Parse(&quot;2006-01-02&quot;, &quot;2023-02-01&quot;)

	b.ResetTimer()
	for i := 0; i &lt; b.N; i++ {
		result := make(map[string]map[string]float64)
		for username, years := range c {
			if _, ok := result[username]; !ok {
				result[username] = make(map[string]float64)
			}

			for _, items := range years {
				for date, hours := range items {
					t, err := time.Parse(&quot;2006-01-02&quot;, date)
					if err != nil {
						continue
					}

					if (t.After(from) &amp;&amp; t.Before(to)) || t.Equal(from) || t.Equal(to) {
						result[username][date] = hours
					}
				}
			}
		}
	}
}

I has following benchmarks, which shows that searching range in array is much faster, that search each element in map:

BenchmarkNew-4  285    4208045 ns/op  692965 B/op  6303 allocs/op
BenchmarkOld-4   39   29672935 ns/op  621877 B/op  1407 allocs/op

My Question is: how can I push performance even further? Should i change data structure to another?

I need speed because i receive a lot of calls to my current PHP API and find out that micro service in Go can easily handle 1k rps with only 10m cpu and 64Mib of memory in k8s

I already satisfied with array for "time.Time" approach, but curious of another implementations.

UPD:

test iterations time
Mine with array 229 5259548 ns/op
erik258 414 2892932 ns/op

答案1

得分: 1

根据您的描述,数据是静态的,并且您只在程序启动时读取一次源文件。

为什么不对每个员工的日期数组进行排序,这样一旦超过结束日期,就可以停止检查剩余的日期呢?

为什么要存在year映射?这只是多余的迭代。

为什么将所有日期存储为字符串,然后每次想要按日期范围进行过滤时都要解析它们呢?为什么不使用time.Time?在内存中更高效,并且无需重复解析。

我会使用map[Username][]struct{date time.Time, hours int},在读取一次后对数组进行排序,然后按以下方式进行迭代:

for username, dates := range c {
   for _, shift := range dates {
      if shift.date.Before(from) {
         continue
      } else if shift.date.After(to) {
         break
      }
      if _, ok := result[username]; !ok {
         result[username] = make(map[string]float64)
      }
      result[username][shift.date.String()] = float64(shift.hours)
   }
}

注意:以上代码是基于Go语言的示例。

英文:

From your description the data is static and you're only reading the source files once at program startup.

Why not order each employee's date array so once you get beyond the end date you can stop checking the rest of the dates?

Why does year map exist?. Just one more thing to iterate over.

And why store all dates as strings, then have to parse them each time you want to filter by date range? Why not use time.Time? Much more efficient in memory and removes the need to parse over and over again.

I would use a map[Username][]struct{date time.Time, hours int}, order the array after I read it once, then iterate more like this:

for username, dates := range c {
for shift := range dates {
if shift.date.Before(from) {
continue
} else if shift.date.After(to) {
break
}
if _, ok := result[username]; !ok {
result[username] = make(map[string]float64)
}
result[username][shift.date] = shift.hours
}
}

huangapple
  • 本文由 发表于 2023年7月21日 20:29:08
  • 转载请务必保留本文链接:https://go.coder-hub.com/76738016.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定