Posted 2023-07-21 20:29:08, tagged go

What data structure should I use so that my "calendar" api has maximum performance?

Question

I have a microservice that serves employees' work calendars. The original data has the following file structure:

calendars / [year] / [employee].json

Contents of [employee].json:

{
  // [date]: [hours]
  "2023-02-01": 7,
  "2023-02-02": 7,
}

Using Go, I convert all data to these two variables with types CalendarFast and CalendarSlow:

type Username = string
type Date = string
type Year = string
type Hours = float64
type CalendarFast map[Username]map[Year][]struct {
	Time  time.Time // "2022-01-01"
	Hours Hours     // 7
}
type CalendarSlow map[Username]map[Year]map[Date]Hours

If, for example, I want my API to return all hours for a given date range, I can write these two benchmarks to test which data structure is better:

func BenchmarkNew(b *testing.B) {
	c := make(CalendarFast)
	c.Update(context.TODO())
	from, _ := time.Parse("2006-01-02", "2023-01-01")
	to, _ := time.Parse("2006-01-02", "2023-02-01")
	isInRange := func(t, from, to time.Time) bool {
		return (t.After(from) && t.Before(to)) || t.Equal(from) || t.Equal(to)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// result is map[username]map[date]hours
		result := make(map[string]map[string]float64)
		for username, years := range c {
			if _, ok := result[username]; !ok {
				result[username] = make(map[string]float64)
			}
			for _, dates := range years {
				for i, n := 0, len(dates); i < n; i++ {
					if isInRange(dates[i].Time, from, to) {
						result[username][dates[i].Time.String()] = dates[i].Hours
					}
				}
			}
		}
	}
}

func BenchmarkOld(b *testing.B) {
	c := make(CalendarSlow)
	c.Update(context.TODO())
	from, _ := time.Parse("2006-01-02", "2023-01-01")
	to, _ := time.Parse("2006-01-02", "2023-02-01")
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		result := make(map[string]map[string]float64)
		for username, years := range c {
			if _, ok := result[username]; !ok {
				result[username] = make(map[string]float64)
			}
			for _, items := range years {
				for date, hours := range items {
					t, err := time.Parse("2006-01-02", date)
					if err != nil {
						continue
					}
					if (t.After(from) && t.Before(to)) || t.Equal(from) || t.Equal(to) {
						result[username][date] = hours
					}
				}
			}
		}
	}
}

I got the following benchmark results, which show that searching a range in an array is much faster than checking each element in a map:

BenchmarkNew-4  285    4208045 ns/op  692965 B/op  6303 allocs/op
BenchmarkOld-4   39   29672935 ns/op  621877 B/op  1407 allocs/op

My question is: how can I push performance even further? Should I change the data structure to something else?

I need speed because my current PHP API receives a lot of calls, and I found that a Go microservice can easily handle 1k requests per second with only 10m CPU and 64 MiB of memory in k8s.

I am already satisfied with the array of time.Time approach, but I am curious about other implementations.

Update:

test	iterations	time
Mine with array	229	5259548 ns/op
erik258	414	2892932 ns/op

Answer 1

Score: 1

From your description the data is static and you're only reading the source files once at program startup.

Why not order each employee's date array so once you get beyond the end date you can stop checking the rest of the dates?

Why does the year map exist? It's just one more thing to iterate over.

And why store all dates as strings, then have to parse them each time you want to filter by date range? Why not use time.Time? Much more efficient in memory and removes the need to parse over and over again.
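
A minimal sketch of the parse-once idea, assuming each employee file is valid JSON (the sample in the question, with its comment and trailing comma, would need cleaning first). The names Shift and parseCalendar are hypothetical and not from the question; the point is only that dates are parsed and sorted a single time at load, not on every request.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// Shift is a hypothetical name for one calendar entry.
type Shift struct {
	Date  time.Time
	Hours float64
}

// parseCalendar decodes one employee file of the form {"2023-02-01": 7, ...}
// and returns its entries sorted by date. Each date string is parsed once here.
func parseCalendar(raw []byte) ([]Shift, error) {
	var byDate map[string]float64
	if err := json.Unmarshal(raw, &byDate); err != nil {
		return nil, err
	}
	shifts := make([]Shift, 0, len(byDate))
	for d, h := range byDate {
		t, err := time.Parse("2006-01-02", d)
		if err != nil {
			return nil, fmt.Errorf("bad date %q: %w", d, err)
		}
		shifts = append(shifts, Shift{Date: t, Hours: h})
	}
	// Sort ascending by date so range queries can stop early.
	sort.Slice(shifts, func(i, j int) bool {
		return shifts[i].Date.Before(shifts[j].Date)
	})
	return shifts, nil
}

func main() {
	raw := []byte(`{"2023-02-02": 7, "2023-02-01": 7.5}`)
	shifts, err := parseCalendar(raw)
	if err != nil {
		panic(err)
	}
	for _, s := range shifts {
		fmt.Println(s.Date.Format("2006-01-02"), s.Hours)
	}
}
```

With the input above, the entries come back in date order regardless of their order in the file.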

I would use a map[Username][]struct{date time.Time; hours int}, sort the array once after reading it, then iterate more like this:

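for username, dates := range c {
	for _, shift := range dates {
		if shift.date.Before(from) {
			continue // not in range yet
		} else if shift.date.After(to) {
			break // dates are sorted, so nothing later can match
		}
		if _, ok := result[username]; !ok {
			result[username] = make(map[string]float64)
		}
		// use the formatted date as the key and convert int hours to float64
		result[username][shift.date.Format("2006-01-02")] = float64(shift.hours)
	}
}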
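
One possible further step, not part of the answer above: once each slice is sorted, the start and end of the range can be located with a binary search, so the leading entries are not scanned at all. The sketch below uses sort.Search from the standard library; Shift, day, and inRange are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Shift is a hypothetical name for one calendar entry.
type Shift struct {
	Date  time.Time
	Hours float64
}

// inRange returns the sub-slice of shifts with from <= Date <= to.
// shifts must already be sorted by Date in ascending order.
func inRange(shifts []Shift, from, to time.Time) []Shift {
	// First index whose date is not before from.
	lo := sort.Search(len(shifts), func(i int) bool {
		return !shifts[i].Date.Before(from)
	})
	// First index whose date is after to.
	hi := sort.Search(len(shifts), func(i int) bool {
		return shifts[i].Date.After(to)
	})
	if lo >= hi {
		return nil // empty range, or from is later than to
	}
	return shifts[lo:hi] // shares memory with shifts, no copy
}

// day parses a YYYY-MM-DD string and panics on bad input (demo helper only).
func day(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	shifts := []Shift{
		{Date: day("2022-12-30"), Hours: 8},
		{Date: day("2023-01-01"), Hours: 7},
		{Date: day("2023-01-15"), Hours: 6},
		{Date: day("2023-02-01"), Hours: 7},
		{Date: day("2023-02-02"), Hours: 7},
	}
	for _, s := range inRange(shifts, day("2023-01-01"), day("2023-02-01")) {
		fmt.Println(s.Date.Format("2006-01-02"), s.Hours)
	}
}
```

Because inRange returns a view into the existing slice, the lookup itself allocates nothing; whether that matters in practice depends on how the result is serialized afterwards, so it is worth benchmarking against the linear scan.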
