.htaccess: ban a bot only on URLs with parameters

Question

Googlebot is crawling pages with query parameters, and I need to block it.

Every URL with a parameter should return the 404 page, for example site.com?q=text or site.com/?q=text,

but the bare site.com should not be blocked.

I wrote this for .htaccess:

ErrorDocument 403 "Your connection was rejected"
ErrorDocument 404 /404.shtml


RewriteEngine On
#RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
RewriteCond %{REQUEST_URI} ^/q= [NC]
RewriteRule ^ - [F,L]

But I have two problems.
First: how do I match on the parameters?

Second: when the requests are blocked, my 404 page is not shown. Instead I get:

Not Found
The requested URL was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

But I did set ErrorDocument 404 /404.shtml.
Why can't Apache find 404.shtml?
If I request a page that does not exist, 404.shtml is displayed normally.

Answer 1

Score: 1

First, you need to use QUERY_STRING, not REQUEST_URI, to match the query string. REQUEST_URI holds only the path, so a pattern like ^/q= never sees the part after the "?".

Moreover, you are getting this error because the query string is still present on the internally redirected URL, i.e. /404.shtml?q=text after the 404 redirect, so your rule matches again and tries to send that same URL to a 404.

Ideally, you should return 403 Forbidden, like this:

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
RewriteCond %{QUERY_STRING} ^q= [NC]
RewriteRule ^ - [F]
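
The condition above only matches query strings that begin with q=. The question asks about "all pages with param", so here is a minimal sketch of a broader variant, assuming that every Googlebot request carrying any non-empty query string should be refused. That assumption goes beyond what the answer itself shows.

```apache
RewriteEngine On

# Match requests whose User-Agent contains "Googlebot" (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# Assumption: block on ANY parameter. A single "." matches any non-empty
# query string, so a plain request for site.com/ is left alone.
RewriteCond %{QUERY_STRING} .
# "-" means no substitution; F sends 403 Forbidden.
RewriteRule ^ - [F]
```

To target only the q parameter wherever it appears in the query string, rather than just at the start, a common pattern is (^|&)q= in place of the single dot.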

However, if you must return a 404, then use it like this:

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
RewriteCond %{QUERY_STRING} ^q= [NC]
RewriteRule !^404\.shtml$ - [R=404,NC,L]

This executes the rule for all URLs except /404.shtml, so the error document itself is never blocked.

You may also check for REDIRECT_STATUS like this:

RewriteEngine On

RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
RewriteCond %{QUERY_STRING} ^q= [NC]
RewriteRule ^ - [R=404,L]

This executes the rule for the original URL only, because REDIRECT_STATUS is empty until Apache performs an internal redirect.
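
Putting the pieces together, a hedged sketch of a complete .htaccess for the 404 case might look like the following. It reuses the ErrorDocument line from the question, and it assumes that any non-empty query string should be blocked, which is broader than the ^q= pattern used above.

```apache
ErrorDocument 404 /404.shtml

RewriteEngine On

# REDIRECT_STATUS is empty on the initial request and is set once Apache
# redirects internally (for example to the ErrorDocument), so this guard
# keeps the rule from firing a second time on /404.shtml.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# Assumption: any non-empty query string is blocked, not only q=...
RewriteCond %{QUERY_STRING} .
# A status outside the 3xx range drops the substitution and stops
# rewriting, so an explicit L flag is not required here.
RewriteRule ^ - [R=404]
```

It is worth testing this with a spoofed user agent, for instance by requesting site.com/?q=text with a User-Agent header containing Googlebot, and confirming that the custom 404 page is served while site.com/ still loads normally.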

huangapple
  • Published on July 28, 2023, 01:01:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76781981.html