How does Google process 600K documents in 0.33 seconds?
Question
Regardless of how fast their CPUs are, it seems impossible to process that many documents in 0.33 seconds.
So I believe it comes down to horizontal scaling. As a guess, how many servers were involved in this query to process 600K documents in under a second?
Answer 1
Score: 1
Google doesn't process that many documents that quickly. Google pre-processes the documents well before you do your search. Google maintains a "search index" that is used to produce the list of search results.
You can think of a search index like the index at the back of a paper book (search engines call this structure an inverted index). For each word, it lists the pages on the internet that use it. To answer a query, Google looks up each word of the query in the search index and builds the list of results from those entries.
For reference: What Is A Search Index And How Does It Work? - AddSearch
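To make the book-index analogy concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration under simplifying assumptions (whitespace tokenization, a tiny in-memory corpus, results must contain every query word); the names `build_index` and `search` and the sample documents are made up for this example and say nothing about how Google actually implements its index.

```python
from collections import defaultdict


def build_index(docs):
    """Pre-processing step: map each word to the set of document ids using it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Query step: look up each query word and intersect the posting sets."""
    words = query.lower().split()
    if not words:
        return set()
    # index.get avoids creating empty entries for unknown words
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


# Hypothetical sample corpus, indexed once before any query arrives
docs = {
    1: "how search engines build an index",
    2: "horizontal scaling for web crawlers",
    3: "search index lookup is fast",
}
index = build_index(docs)
matches = search(index, "search index")
```

The expensive work (reading every document) happens once in `build_index`. At query time `search` only touches the entries for the words in the query, which is why the number of matching documents can be large while the lookup stays fast.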
Google also has a lot of computers and does a ton of horizontal scaling. It has horizontal scaling for each of the stages of building the search index and displaying search results:
- Crawling (Googlebot is a horizontally distributed web crawler)
- Relevancy (Deciding how important each word is to the page)
- Indexing (Creating the search index)
- Reputation (Calculating how trusted each site and each page should be)
- Spam and fraud detection (Deciding what shouldn't be in the index)
- Queries (against the search index)
But there is no amount of horizontal scaling that would allow search engines to process documents in real time based on your search query.
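As a rough illustration of what horizontal scaling of the query stage can look like, the sketch below splits a pre-built index into shards, sends the query to every shard in parallel, and merges the partial results. Everything here is an assumption made for the example: the shard count, the `doc_id % n_shards` partitioning, the function names, and the use of threads as a stand-in for separate servers. It is not a description of Google's real architecture.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(docs, n_shards):
    """Assign each document to a shard by doc id (hypothetical scheme)."""
    parts = [{} for _ in range(n_shards)]
    for doc_id, text in docs.items():
        parts[doc_id % n_shards][doc_id] = text
    return parts


def build_shard(docs):
    """Build an inverted index over one shard's documents ahead of time."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def query_shard(shard, words):
    """Return the ids in this shard whose documents contain every word."""
    postings = [shard.get(word, set()) for word in words]
    return set.intersection(*postings) if postings else set()


def sharded_search(shards, query):
    """Fan the query out to all shards in parallel and merge the answers."""
    words = query.lower().split()
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: query_shard(shard, words), shards))
    return set().union(*partials)


# Hypothetical sample corpus, split across two shards
docs = {
    1: "google search index",
    2: "web crawler scaling",
    3: "search index lookup",
    4: "spam detection index",
}
shards = [build_shard(part) for part in partition(docs, 2)]
results = sharded_search(shards, "search index")
```

Each shard still answers from its own pre-built index, so adding shards spreads the lookup work across machines but does not replace the pre-processing step.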