Stuck with understanding how to build a scalable system

Question

I need some guidance on how to properly build out a system that will be able to scale. I will give you some information about what I am trying to do and then ask my specific question.

I have a site where I want visitors to send some data to be processed. They enter the data into a textarea or upload it as a file. Simple. The data is lightly preprocessed on the client side before a POST request is made to a REST endpoint.

What I am stuck on is this: what is a good way to take this posted data, store it, and associate an ID with it that references the user, given that I cannot process the data fast enough to return it to the user in a reasonable amount of time?

This question is a bit vague and open to opinion, I admit. I just need a push in the right direction to keep moving. What I have been considering is throwing the data into a message queue, having some workers process it elsewhere, and, once it is processed, alerting the user as to where to find it with some sort of link to an S3 bucket or just a URL to a file. The other idea was to run a request for each item against another endpoint that already processes individual records, in some sort of loop on the client side. The problem with that second idea is as follows:

Processing the data may take anywhere from 30 minutes to 2 hours depending on how much they want processed. It is not ideal for them to just sit there and wait for that to finish, so I have mostly ruled this out.
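For concreteness, here is a rough sketch of what I have in mind for the queue idea on the server side: accept the POST, stash the raw data somewhere durable, tie a generated job ID to the user, enqueue a reference, and answer immediately with the ID instead of the processed result. Everything here is a placeholder (the channel stands in for a real queue, the temp file for S3); nothing is built yet.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

// Job is the metadata kept about one submission; the raw payload itself
// lives in durable storage (S3 in production, a temp file in this sketch).
type Job struct {
	ID          string    `json:"id"`
	UserID      string    `json:"user_id"`
	PayloadKey  string    `json:"payload_key"`
	SubmittedAt time.Time `json:"submitted_at"`
}

// jobs stands in for the real message queue (SQS, RabbitMQ, ...).
var jobs = make(chan Job, 1024)

// newID returns a random hex ID; a UUID library would do the same job.
func newID() string {
	b := make([]byte, 16)
	rand.Read(b)
	return hex.EncodeToString(b)
}

// storePayload is a stand-in for "upload to S3": it writes the request body
// to a local file and returns the key under which it can be fetched later.
func storePayload(r io.Reader) (string, error) {
	key := newID() + ".dat"
	f, err := os.Create(filepath.Join(os.TempDir(), key))
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := io.Copy(f, r); err != nil {
		return "", err
	}
	return key, nil
}

func handleSubmit(w http.ResponseWriter, r *http.Request) {
	userID := r.Header.Get("X-User-ID") // however auth identifies the visitor

	key, err := storePayload(r.Body)
	if err != nil {
		http.Error(w, "could not store payload", http.StatusInternalServerError)
		return
	}

	job := Job{ID: newID(), UserID: userID, PayloadKey: key, SubmittedAt: time.Now()}
	jobs <- job // hand off to workers instead of processing inline

	// Answer "accepted" right away; the user checks back (or gets a link
	// by email) later instead of waiting the 30 minutes to 2 hours.
	w.WriteHeader(http.StatusAccepted)
	json.NewEncoder(w).Encode(map[string]string{"job_id": job.ID})
}

func main() {
	http.HandleFunc("/submit", handleSubmit)
	http.ListenAndServe(":8080", nil)
}
```

The 202 response is just a convention for "received, still working on it"; the point is that the job ID plus the user ID is enough to look the result up later.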

Any guidance would be very much appreciated, as I don't have any coworkers to bounce things off of, nor do I know many people with the domain knowledge whom I could freely ask. If this isn't the right place to ask, could you point me to where it should be asked?

Chris

Answer 1

Score: 4

If I've got you right, your pipeline is:

  1. Accept item from user
  2. Possibly preprocess/validate it (?)
  3. Put into some queue
  4. Process data
  5. Return result

You may use one or several queues at stage (3). An entity from a user gets added to one of the queues. If it is big enough, it can be stored in S3 or similar storage, with only information about it put into the queue: a link, the date it was added, and the user ID (or email, or the like). Processors can then pull items from the queue and give feedback to the users.
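A minimal sketch of that shape, assuming nothing about the concrete broker or storage: the Queue, BlobStore, and Notifier interfaces below are placeholders for SQS/RabbitMQ, S3, and whatever notification channel gets chosen, not an existing API.

```go
package pipeline

import (
	"log"
	"time"
)

// Message is what travels through the queue: metadata only. The payload
// itself stays in S3 (or similar storage), referenced by Link.
type Message struct {
	Link      string    // e.g. "s3://uploads/abc123"
	UserEmail string    // who to notify when processing finishes
	AddedAt   time.Time // when the item was accepted
}

// Queue abstracts whatever broker is chosen (SQS, RabbitMQ, Redis, ...).
type Queue interface {
	Push(Message) error
	Pull() (Message, error) // blocks until a message is available
}

// BlobStore abstracts payload/result storage (S3, GCS, a file server).
type BlobStore interface {
	Get(link string) ([]byte, error)
	Put(data []byte) (link string, err error)
}

// Notifier tells the user where to pick up the finished result
// (an email with a link, a row on a "my jobs" page, etc.).
type Notifier interface {
	Send(userEmail, message string) error
}

// ProcessLoop is one processor: pull a message, fetch the payload,
// run the long processing step, store the result, notify the user.
func ProcessLoop(q Queue, store BlobStore, notify Notifier, process func([]byte) []byte) {
	for {
		msg, err := q.Pull()
		if err != nil {
			log.Printf("pull failed: %v", err)
			continue
		}
		data, err := store.Get(msg.Link)
		if err != nil {
			log.Printf("fetch %s failed: %v", msg.Link, err)
			continue
		}
		resultLink, err := store.Put(process(data)) // the 30 min to 2 h part
		if err != nil {
			log.Printf("storing result failed: %v", err)
			continue
		}
		if err := notify.Send(msg.UserEmail, "Your data is ready: "+resultLink); err != nil {
			log.Printf("notify %s failed: %v", msg.UserEmail, err)
		}
	}
}
```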

If you have no strict ordering requirements, things get much simpler: you don't need any synchronization between them. Treat all the components (upload acceptors, queues, storage, and processors) as independent pools of processes. Monitor each pool separately. If one of them is a bottleneck, add machines to that pool.
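Continuing the same hypothetical package, a processor pool on one machine can be as small as this; scaling it out is just a larger n here, or the same binary started on more machines, because the workers only ever talk to the queue, never to each other.

```go
package pipeline

import (
	"log"
	"sync"
)

// StartPool runs n identical workers against the shared queue from the
// sketch above. Each worker is independent, so the pool grows by raising n
// or by launching more copies of this process on other machines.
func StartPool(n int, q Queue, store BlobStore, notify Notifier, process func([]byte) []byte) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			log.Printf("worker %d started", id)
			ProcessLoop(q, store, notify, process)
		}(i)
	}
	wg.Wait() // ProcessLoop never returns, so this keeps the process alive
}
```

The upload acceptors scale the same way, since they are stateless behind a load balancer; the queue and the storage are the only shared pieces.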

huangapple
  • Posted on 2017-09-15 13:55:58
  • Please keep this link when reposting: https://go.coder-hub.com/46232601.html