.htaccess: ban a bot only on URLs with query parameters
Question
Google is crawling my pages with query parameters, and I need to block it.
I want to return a 404 page for every URL that has a parameter,
such as site.com?q=text or site.com/?q=text,
but not block the plain URL site.com.
I wrote these rules for .htaccess:
ErrorDocument 403 "Your connection was rejected"
ErrorDocument 404 /404.shtml
RewriteEngine On
#RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
RewriteCond %{REQUEST_URI} ^/q= [NC]
RewriteRule ^ - [F,L]
But I have two problems.
First: how do I match the parameters?
Second: when a request is blocked, my 404 page is not shown. Instead I get:
Not Found
The requested URL was not found on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
But I have set ErrorDocument 404 /404.shtml.
Why can't Apache find 404.shtml?
If I request a page that does not exist, 404.shtml is displayed normally.
Answer 1
Score: 1
First, you need to use QUERY_STRING, not REQUEST_URI, to match the query string.
Moreover, you are getting this error because the query string is still present on the redirected URL (i.e. /404.shtml?q=text) after the 404 redirect, so your rule matches again and tries to redirect to the same URL.
Ideally, you should return 403 Forbidden like this:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
RewriteCond %{QUERY_STRING} ^q= [NC]
RewriteRule ^ - [F]
However, if you have to use 404, then use it like this:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
RewriteCond %{QUERY_STRING} ^q= [NC]
RewriteRule !^404\.shtml$ - [R=404,NC,L]
This executes the rule for all URLs except /404.shtml.
You may also check REDIRECT_STATUS like this:
RewriteEngine On
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_USER_AGENT} (Googlebot) [NC]
RewriteCond %{QUERY_STRING} ^q= [NC]
RewriteRule ^ - [R=404,L]
This executes the rule for the original request only, not for the internal redirect to the error document.
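The rules above match only when q= is the first thing in the query string. As a minimal sketch, assuming the goal is to catch q wherever it appears among the parameters, the QUERY_STRING pattern can be loosened. The (^|&)q= pattern and the alternative in the comment are illustrative assumptions, not part of the original answer:

```apache
RewriteEngine On
# REDIRECT_STATUS is empty on the original request only, so the rule
# does not fire again on the internal redirect to /404.shtml
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# Match "q=" at the start of the query string or after an "&"
# (assumption: to block ANY non-empty query string, use the pattern "." instead)
RewriteCond %{QUERY_STRING} (^|&)q= [NC]
RewriteRule ^ - [R=404,L]
```

With this variant, a request for site.com/?page=2&q=text from Googlebot would also get the 404, while site.com with no query string is left alone.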