May 7, 2023, 13:56:25

Using robots.txt to exclude one specific user-agent and allowing all others?

Question

It sounds like a simple task: exclude the Wayback Machine crawler (ia_archiver) and allow all other user agents.

So I set up the robots.txt as follows:

User-agent: *

Sitemap: https://www.example.com/sitemap.xml


User-agent: ia_archiver
Disallow: /

After half a year I noticed that the visitor count to my site had dropped tremendously.

After a while I realized that Googlebot had stopped indexing my site.

Their robots.txt verifier confirmed it: the Disallow: / rule is applied to Googlebot too, so not only ia_archiver is blocked.

The obvious question is:

What is wrong with this robots.txt?

Is the order of the entries the culprit?

Answer 1

Score: 1

The solution:

```plaintext
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
```

The `ia_archiver` group comes first, and the `User-agent: *` group now has a rule of its own. In the original file `User-agent: *` had no rule, and a `Sitemap` line does not end a group, so Googlebot read `User-agent: *` and `User-agent: ia_archiver` as one group sharing `Disallow: /`.

The empty `Disallow:` allows all other user agents to crawl the site.
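As a quick local sanity check, the corrected file can be fed to Python's standard-library `urllib.robotparser`. This is only a sketch: Python's parser is not Google's, so it confirms what the corrected file is meant to do, not how Googlebot will behave. The helper name `can_fetch`, the page URL, and the `SomeOtherBot` agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The corrected robots.txt from the answer above.
FIXED_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url.

    Hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/some/page.html"  # hypothetical URL
    for agent in ("ia_archiver", "Googlebot", "SomeOtherBot"):
        allowed = can_fetch(FIXED_ROBOTS_TXT, agent, page)
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

For the final word on Googlebot, the robots.txt report in Google Search Console is still the place to look, since it runs Google's own parser.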
