Robots.txt文件和Googlebot的可爬性。

huangapple go评论113阅读模式
英文:

Robots.txt file and Googlebot crawability

问题

这个robots.txt文件会允许Googlebot爬取我的网站吗?

英文:

Will this robots.txt allow Googlebot to crawl my site or not?

Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: *
Disallow:
Disallow: /cgi-bin/
Sitemap: https://koyal.pk/sitemap/sitemap.xml

答案1

得分: 1

如果您想知道Google如何对待robots.txt文件,您应该通过在Google的robots.txt测试工具中进行测试来获取官方答案。这里我有使用您提供的robots.txt进行测试的结果:

Robots.txt文件和Googlebot的可爬性。

Googlebot将能够爬取该站点,但Google告诉您,您正在使用的robots.txt语法存在问题。我看到了一些问题:

  • Disallow指令 必须 在其上方有一个User-agent指令。
  • 在每个User-agent指令之前应该有一个新行(除了文件开头的那个指令)。
  • Disallow:行意味着“允许所有爬取”。只有在没有其他Disallow规则时才应使用它。

我认为符合语法规则的robots.txt文件,可以实现您的意图如下:

User-agent: Robozilla
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Sitemap: https://koyal.pk/sitemap/sitemap.xml

这将阻止Robozilla机器人爬取,同时允许所有其他机器人(包括Googlebot)爬取除/cgi-bin/目录之外的所有内容。

英文:

If you want to know how Google will react to a robots.txt file, you should get an official answer by testing in Google's robots.txt testing tool. Here I have the results of such a test using the robots.txt that you provided:

Robots.txt文件和Googlebot的可爬性。

Googlebot will be able to crawl the site, however Google tells you that the robots.txt syntax you are using as a problem. I see several problems:

  • A Disallow directive must have a User-agent directive somewhere above it.
  • There should be a new line before each User-agent directive (except the one at the beginning of the file.)
  • The Disallow: line means "allow all crawling". That should only be used if there are no other Disallow rules.

A syntactically correct robots.txt that I think would do what you intend is:

User-agent: Robozilla
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Sitemap: https://koyal.pk/sitemap/sitemap.xml

That would provent the Robozilla bot from crawling while allowing all other bots (including Googlebot) to crawl everything except the /cgi-bin/ directory.

huangapple
  • 本文由 发表于 2023年1月9日 17:22:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/75055219.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定