Robots.txt file and Googlebot crawlability

Question

Will this robots.txt allow Googlebot to crawl my site or not?
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: *
Disallow:
Disallow: /cgi-bin/
Sitemap: https://koyal.pk/sitemap/sitemap.xml
Answer 1
Score: 1
If you want to know how Google will react to a robots.txt file, you should get an official answer by testing it in Google's robots.txt testing tool. Here are the results of such a test using the robots.txt that you provided:
Googlebot will be able to crawl the site; however, Google tells you that the robots.txt syntax you are using has a problem. I see several problems:
- A Disallow directive must have a User-agent directive somewhere above it.
- There should be a new line before each User-agent directive (except the one at the beginning of the file).
- The Disallow: line means "allow all crawling". It should only be used if there are no other Disallow rules (the sketch below demonstrates this pitfall).
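As an illustration (not part of the original answer), here is a minimal sketch using Python's standard urllib.robotparser module to parse the file exactly as posted. Its edge-case handling differs from Google's own parser (Google, for instance, gives precedence to the longest matching rule), so treat this as a demonstration of the pitfalls rather than an authoritative verdict:

from urllib.robotparser import RobotFileParser

# The robots.txt exactly as posted in the question.
original = """\
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: *
Disallow:
Disallow: /cgi-bin/
Sitemap: https://koyal.pk/sitemap/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(original.splitlines())

# The leading "Disallow: /" has no User-agent above it, so this parser
# simply drops it, and Googlebot may still crawl the site:
print(parser.can_fetch("Googlebot", "https://koyal.pk/"))            # True

# The empty "Disallow:" line ("allow all") matches before
# "Disallow: /cgi-bin/", so under this parser the cgi-bin rule
# never takes effect:
print(parser.can_fetch("Googlebot", "https://koyal.pk/cgi-bin/app")) # True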
A syntactically correct robots.txt that I think would do what you intend is:
User-agent: Robozilla
Disallow: /

User-agent: *
Disallow: /cgi-bin/

Sitemap: https://koyal.pk/sitemap/sitemap.xml
That would prevent the Robozilla bot from crawling while allowing all other bots (including Googlebot) to crawl everything except the /cgi-bin/ directory.
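As a quick sanity check under the same stdlib parser (again, a sketch rather than Google's official tool, and the example paths are made up for illustration), the corrected file behaves as intended:

from urllib.robotparser import RobotFileParser

# The corrected robots.txt from above.
fixed = """\
User-agent: Robozilla
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(fixed.splitlines())

print(parser.can_fetch("Robozilla", "https://koyal.pk/"))             # False: blocked entirely
print(parser.can_fetch("Googlebot", "https://koyal.pk/some-page"))    # True: allowed
print(parser.can_fetch("Googlebot", "https://koyal.pk/cgi-bin/app"))  # False: /cgi-bin/ excluded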