Suggested Golang architecture for polling user accounts frequently
Question
I'm creating a small service where I poll around 100 accounts (in a Twitter-like service) frequently (every 5 seconds or so) to check for new messages, as the service doesn't yet provide a streaming API (like Twitter actually does).

In my head, I have the architecture planned as queuing a Ticker every 5 seconds for every user. Once the tick fires I make an API call to the service, check their messages, and call SELECT on my Postgres database to get the specific user's details and the date of the most recent message; if there are messages newer than that, I UPDATE the entry and notify the user. Repeat ad nauseam.

I'm not very experienced in backend things and architecture, so I want to make sure this isn't an absolutely absurd setup. Is the amount of calls to the database sensible? Am I abusing goroutines?
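For reference, a minimal sketch of this setup in Go; `pollAccount` is a hypothetical placeholder for the API call plus the SELECT/UPDATE round trip, and the user IDs are made up:

```go
package main

import (
	"log"
	"time"
)

// pollAccount is a hypothetical placeholder for the real work:
// call the service's API, SELECT the user's last-message date from
// Postgres, and UPDATE + notify if something newer arrived.
func pollAccount(userID string) {
	log.Printf("polling %s", userID)
}

func main() {
	userIDs := []string{"alice", "bob"} // ~100 in practice

	// One goroutine per user, each waking up every 5 seconds.
	for _, id := range userIDs {
		go func(id string) {
			ticker := time.NewTicker(5 * time.Second)
			defer ticker.Stop()
			for range ticker.C {
				pollAccount(id)
			}
		}(id)
	}

	select {} // block forever; real code would handle shutdown
}
```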
Answer 1
Score: 1
Let me answer given what you describe.
> I want to make sure this isn't an absolutely absurd setup.
I understand the following. For each user, you create a tick every 5 seconds in one goroutine. Another goroutine consumes those ticks, performing the polling and comparing the date of the last message with the date you have recorded in your PostgreSQL database.
The answer is: it depends. How many users do you have and how many can your application support? In my experience the best way to answer this question is to measure the performance of your application.
> Is the amount of calls to the database sensible?
It depends. To give you some reassurance, I have seen a single PostgreSQL database handle hundreds of SELECT queries per second. I don't see a design mistake, so benchmarking your application is the way to go.
> Am I abusing goroutines?
Do you mean like executing too many of them? I think it is unlikely that you are abusing goroutines that way. If there is a particular reason you think this could be the case, posting the corresponding code snippet could make your question more precise.
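To make the tick-producer/consumer split described above concrete, here is a minimal sketch; `poll` is a hypothetical stand-in for the API call and the date comparison against PostgreSQL:

```go
package main

import (
	"fmt"
	"time"
)

// poll is a hypothetical stand-in for the API call plus the
// comparison against the last-message date stored in PostgreSQL.
func poll(userID string) {
	fmt.Println("checking messages for", userID)
}

func main() {
	users := []string{"alice", "bob"}
	ticks := make(chan string)

	// One producer goroutine per user, emitting that user's ID every 5s.
	for _, u := range users {
		go func(u string) {
			ticker := time.NewTicker(5 * time.Second)
			defer ticker.Stop()
			for range ticker.C {
				ticks <- u
			}
		}(u)
	}

	// A single consumer goroutine performs the actual polling.
	for u := range ticks {
		poll(u)
	}
}
```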
Answer 2
Score: 1
- Is your architecture the most efficient way to go? No.
- Should you do something about it now? No, you should test your solution.

You can always go deeper with optimisations. In your case you need client throughput, so you can use a bunch of well-known optimisations like switching to a reactive model, adding a cache server, spreading the load over multiple DB slaves, ...

You should test your solution at scale; if it fits your needs in terms of user throughput and server cost, then your solution is the right one.
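As one way to start testing, a rough sketch that measures how fast the database answers the per-user query; the lib/pq driver, the DSN, and the table/column names (`accounts`, `last_message_at`) are all assumptions to adapt to your schema:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // assumed Postgres driver
)

func main() {
	// Hypothetical connection string.
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const n = 1000
	start := time.Now()
	for i := 0; i < n; i++ {
		var last time.Time
		// Hypothetical table and column names.
		if err := db.QueryRow(
			`SELECT last_message_at FROM accounts WHERE id = $1`, "alice",
		).Scan(&last); err != nil {
			log.Fatal(err)
		}
	}
	elapsed := time.Since(start)
	log.Printf("%d queries in %v (%.0f queries/sec)",
		n, elapsed, float64(n)/elapsed.Seconds())
}
```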
Answer 3
Score: 1
Your proposed solution is 1 query every 5 seconds for every user. With 100 users this is:
1 * 100 / 5 seconds = 20 queries / second
This is not considered a big load if the queries are fast.
But why do you need to do this for every user separately? If you need to pick up updates at a granularity of 5 seconds, you could just execute 1 query every 5 seconds which does not filter by user but checks for updates from all the users.
If the above query gives results, you can iterate over them and do the necessary work for each user that had updates in the last 5 seconds. This results in:
1 query / 5 seconds = 0.2 query / second
Which is a hundred times fewer queries, while still getting you all the updates at the same time granularity.
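A sketch of this batched check, assuming the lib/pq driver and hypothetical table and column names (`messages`, `user_id`, `created_at`); DISTINCT keeps one row per user regardless of how many new messages they have:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // assumed Postgres driver
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// One query covering all users; table/column names are assumptions.
	rows, err := db.Query(`
		SELECT DISTINCT user_id
		  FROM messages
		 WHERE created_at > now() - interval '5 seconds'`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var userID string
		if err := rows.Scan(&userID); err != nil {
			log.Fatal(err)
		}
		log.Println("user with fresh updates:", userID) // notify here
	}
}
```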
If the task to be performed for the updates is long or depends on external systems (e.g. a call to another server), you may perform those tasks in separate goroutines. You may choose to either launch a new goroutine for each task, or use a pool of worker goroutines that consume queued tasks, queuing each task via a channel.
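A minimal sketch of the worker-pool variant; `handleUpdate` is a hypothetical stand-in for the long-running per-user work, and the pool size is an arbitrary example:

```go
package main

import (
	"fmt"
	"sync"
)

// handleUpdate is a hypothetical stand-in for the long-running work
// (e.g. calling another server to notify the user).
func handleUpdate(userID string) {
	fmt.Println("handling update for", userID)
}

func main() {
	tasks := make(chan string, 100)
	var wg sync.WaitGroup

	// A fixed pool of workers consumes queued tasks from the channel.
	const workers = 8
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for userID := range tasks {
				handleUpdate(userID)
			}
		}()
	}

	// The polling loop would queue one task per user with updates.
	for _, u := range []string{"alice", "bob"} {
		tasks <- u
	}
	close(tasks)
	wg.Wait()
}
```

A bounded pool caps concurrency no matter how many users have updates in a given tick, which is the main reason to prefer it over one goroutine per task.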