StormCrawler: The URL Database Specifications

Question
I am quite new to StormCrawler. While exploring the documentation, as well as the READMEs and additional resources, I have noticed frequent references to a "URL database" that is supposed to store information about the URLs over the course of a crawl (for example here).

However, I have not found anywhere what type of database this is, nor how to customize it or replace it with custom modules. I have been following the code and got to IOOutputController, which has some quite confusing methods; given the lack of docstrings, it is challenging to even determine which class is responsible for handling this.
I would be very grateful for any guidance!
Thank you for your time, Matyáš
Answer 1

Score: 0
The most commonly used storage for the URLs in StormCrawler is Elasticsearch; this is illustrated in the tutorials. Other backends are available as well, such as SQL or SOLR (see the list of external modules); StormCrawler is not tied to a specific database.

In most cases, people simply use an existing backend implementation, such as the Elasticsearch one.
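For context, when the Elasticsearch module is used, the "URL database" is simply an Elasticsearch index (conventionally named "status") that the status-updater bolt writes to and the spouts read from. A minimal configuration sketch might look like the following; the exact property names should be checked against the es-conf.yaml shipped with your StormCrawler version, as they have varied between releases:

```yaml
# Sketch of an es-conf.yaml fragment for the "status" index,
# which serves as the URL database. Treat the keys below as
# illustrative - verify them against your StormCrawler version.
es.status.addresses: "http://localhost:9200"
es.status.index.name: "status"
# how many URLs each spout query pulls per bucket (host/domain)
es.status.max.urls.per.bucket: 10
```

Replacing the backend entirely amounts to providing your own spout (to emit URLs due for fetching) and your own status-updater bolt (the Elasticsearch module's one extends AbstractStatusUpdaterBolt from the persistence package), rather than swapping a single "database" class.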
Comments