Stuck with understanding how to build a scalable system

Question

I need some guidance on how to properly build out a system that will be able to scale. I will give you some information about what I am trying to do and then ask my specific question.

I have a site where I want visitors to send some data to be processed. They enter the data into a textarea or upload it as a file. Simple. The data is lightly preprocessed on the client side before a POST request is made to a REST endpoint.

What I am stuck on is this: what is a good way to take this posted data, store it, and associate an ID with it that references the user, given that I cannot process the data fast enough to return it to the user in a reasonable amount of time?

This question is a bit vague and open to opinion, I admit. I just need a push in the right direction to keep moving. What I have been considering is throwing the data into a message queue, having some workers process it elsewhere, and, once it is processed, alerting the user as to where to find it with some sort of link to an S3 bucket or just a URL to a file. The other idea was to run a request for each item against another endpoint that already processes individual records, in some sort of loop on the client side. The problem with that second idea is as follows:

Processing the data may take anywhere from 30 minutes to 2 hours depending on how much they want processed. It is not ideal for them to just sit there and wait for that to finish, so I have mostly ruled this out.
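For concreteness, here is a rough sketch of what I have in mind for the queue idea on the server side: accept the POST, stash the raw data somewhere durable, tie a generated job ID to the user, enqueue a reference, and answer immediately with the ID instead of the processed result. Everything here is a placeholder (the channel stands in for a real queue, the temp file for S3); nothing is built yet.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

// Job is the metadata kept about one submission; the raw payload itself
// lives in durable storage (S3 in production, a temp file in this sketch).
type Job struct {
	ID          string    `json:"id"`
	UserID      string    `json:"user_id"`
	PayloadKey  string    `json:"payload_key"`
	SubmittedAt time.Time `json:"submitted_at"`
}

// jobs stands in for the real message queue (SQS, RabbitMQ, ...).
var jobs = make(chan Job, 1024)

// newID returns a random hex ID; a UUID library would do the same job.
func newID() string {
	b := make([]byte, 16)
	rand.Read(b)
	return hex.EncodeToString(b)
}

// storePayload is a stand-in for "upload to S3": it writes the request body
// to a local file and returns the key under which it can be fetched later.
func storePayload(r io.Reader) (string, error) {
	key := newID() + ".dat"
	f, err := os.Create(filepath.Join(os.TempDir(), key))
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := io.Copy(f, r); err != nil {
		return "", err
	}
	return key, nil
}

func handleSubmit(w http.ResponseWriter, r *http.Request) {
	userID := r.Header.Get("X-User-ID") // however auth identifies the visitor

	key, err := storePayload(r.Body)
	if err != nil {
		http.Error(w, "could not store payload", http.StatusInternalServerError)
		return
	}

	job := Job{ID: newID(), UserID: userID, PayloadKey: key, SubmittedAt: time.Now()}
	jobs <- job // hand off to workers instead of processing inline

	// Answer "accepted" right away; the user checks back (or gets a link
	// by email) later instead of waiting the 30 minutes to 2 hours.
	w.WriteHeader(http.StatusAccepted)
	json.NewEncoder(w).Encode(map[string]string{"job_id": job.ID})
}

func main() {
	http.HandleFunc("/submit", handleSubmit)
	http.ListenAndServe(":8080", nil)
}
```

The 202 response is just a convention for "received, still working on it"; the point is that the job ID plus the user ID is enough to look the result up later.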

Any guidance would be very much appreciated, as I don't have any coworkers to bounce things off of, nor do I know many people with the domain knowledge whom I could freely ask. If this isn't the right place to ask, could you point me to where it should be asked?

Chris

Answer 1

Score: 4

If I've got you right, your pipeline is:

  1. Accept item from user
  2. Possibly preprocess/validate it (?)
  3. Put into some queue
  4. Process data
  5. Return result

You may use one or several queues at stage (3). An entity from a user gets added to one of the queues. If it is big enough, it can be stored in S3 or similar storage, with only information about it put into the queue: a link, the date it was added, and the user ID (or email, or the like). Processors can then pull items from the queue and give feedback to the users.
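A minimal sketch of that shape, assuming nothing about the concrete broker or storage: the Queue, BlobStore, and Notifier interfaces below are placeholders for SQS/RabbitMQ, S3, and whatever notification channel gets chosen, not an existing API.

```go
package pipeline

import (
	"log"
	"time"
)

// Message is what travels through the queue: metadata only. The payload
// itself stays in S3 (or similar storage), referenced by Link.
type Message struct {
	Link      string    // e.g. "s3://uploads/abc123"
	UserEmail string    // who to notify when processing finishes
	AddedAt   time.Time // when the item was accepted
}

// Queue abstracts whatever broker is chosen (SQS, RabbitMQ, Redis, ...).
type Queue interface {
	Push(Message) error
	Pull() (Message, error) // blocks until a message is available
}

// BlobStore abstracts payload/result storage (S3, GCS, a file server).
type BlobStore interface {
	Get(link string) ([]byte, error)
	Put(data []byte) (link string, err error)
}

// Notifier tells the user where to pick up the finished result
// (an email with a link, a row on a "my jobs" page, etc.).
type Notifier interface {
	Send(userEmail, message string) error
}

// ProcessLoop is one processor: pull a message, fetch the payload,
// run the long processing step, store the result, notify the user.
func ProcessLoop(q Queue, store BlobStore, notify Notifier, process func([]byte) []byte) {
	for {
		msg, err := q.Pull()
		if err != nil {
			log.Printf("pull failed: %v", err)
			continue
		}
		data, err := store.Get(msg.Link)
		if err != nil {
			log.Printf("fetch %s failed: %v", msg.Link, err)
			continue
		}
		resultLink, err := store.Put(process(data)) // the 30 min to 2 h part
		if err != nil {
			log.Printf("storing result failed: %v", err)
			continue
		}
		if err := notify.Send(msg.UserEmail, "Your data is ready: "+resultLink); err != nil {
			log.Printf("notify %s failed: %v", msg.UserEmail, err)
		}
	}
}
```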

If you have no strict ordering requirements, things get much simpler: you don't need any synchronization between them. Treat all the components (upload acceptors, queues, storage, and processors) as independent pools of processes. Monitor each pool separately. If one of them is a bottleneck, add machines to that pool.
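Continuing the same hypothetical package, a processor pool on one machine can be as small as this; scaling it out is just a larger n here, or the same binary started on more machines, because the workers only ever talk to the queue, never to each other.

```go
package pipeline

import (
	"log"
	"sync"
)

// StartPool runs n identical workers against the shared queue from the
// sketch above. Each worker is independent, so the pool grows by raising n
// or by launching more copies of this process on other machines.
func StartPool(n int, q Queue, store BlobStore, notify Notifier, process func([]byte) []byte) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			log.Printf("worker %d started", id)
			ProcessLoop(q, store, notify, process)
		}(i)
	}
	wg.Wait() // ProcessLoop never returns, so this keeps the process alive
}
```

The upload acceptors scale the same way, since they are stateless behind a load balancer; the queue and the storage are the only shared pieces.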

huangapple
  • Posted on 2017-09-15 13:55:58
  • Please keep this link when reposting: https://go.coder-hub.com/46232601.html