.htaccess redirects according to the query string contents
Question
I have an online archive of images, some of which reside on Cloud Storage. The archive is hierarchical with four levels, and the appropriate level is accessed using query strings:
a.php?level=image&collection=a&document=b&item=72
The level can be archive, collection, document, or image.
I want to prevent robots from accessing the actual images, primarily to minimise traffic on the cloud storage. So the idea is if they issue a request where the query string level parameter is image ("?level=image"), that request is diverted.
The .htaccess code below is intended to check the query string of a request from a foreign referrer and, if the request is for an image, direct the request elsewhere:
RewriteEngine On
RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1
RewriteCond %{QUERY_STRING} ^level=image$
RewriteRule (.*) https://a.co.uk/blank.htm [NC,R,L]
My code appears to have no obvious effect. Can anybody see what I am doing wrong? I do not pretend to have a lot of confidence with .htaccess code, normally relying on snippets produced by people cleverer than me.
Answer 1
Score: 0
> RewriteCond %{QUERY_STRING} ^level=image$
This checks that the query string is exactly equal to level=image, whereas in your example the level URL parameter is just one of many (the first one).
To check that the URL parameter level=image appears anywhere in the query string, modify the above condition like so:
RewriteCond %{QUERY_STRING} (^|&)level=image($|&)
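To see the difference between the two CondPatterns, here is a small Python sketch that applies both to a few query strings. It assumes Python's re module treats these simple patterns the same way Apache's PCRE engine does, and that a CondPattern is an unanchored search. Apart from the URL in the question, the sample query strings are made up for illustration.

```python
import re

# Condition from the question: the *whole* query string must equal "level=image".
EXACT = re.compile(r"^level=image$")
# Suggested condition: "level=image" as a complete parameter anywhere in the query string.
ANYWHERE = re.compile(r"(^|&)level=image($|&)")


def cond_matches(pattern: re.Pattern, query_string: str) -> bool:
    # A CondPattern is an unanchored regex search, so use re.search rather than re.match.
    return pattern.search(query_string) is not None


samples = [
    "level=image&collection=a&document=b&item=72",  # URL from the question
    "collection=a&level=image&item=72",             # level is not the first parameter
    "level=images",                                 # different value
    "mylevel=image",                                # different parameter name
]

for qs in samples:
    print(qs, "exact:", cond_matches(EXACT, qs), "anywhere:", cond_matches(ANYWHERE, qs))
```

The exact pattern matches none of these. The suggested pattern matches the first two and rejects the last two, because the (^|&) and ($|&) groups force level=image to be a whole parameter.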
> RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1
Minor issue, but this would allow referrers where the requested hostname (eg. example.com) occurs only as a subdomain of the referrer, eg. example.com.referrer.com. To resolve this, modify the CondPattern to include a trailing slash or end-of-string anchor after the \1 backreference. For example:
RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1(/|$)
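The referrer check can be sketched the same way. This Python emulation builds the "%{HTTP_HOST}@@%{HTTP_REFERER}" test string and compares the pattern as quoted from the question with a variant that adds (/|$) after the \1 backreference. As before, it assumes Python's re behaves like Apache's PCRE for these patterns. The hostnames are illustrative only.

```python
import re

# Referrer check as quoted from the question, and with "(/|$)" added after the \1 backreference.
ORIGINAL = re.compile(r"^([^@]*)@@https?://\1")
FIXED = re.compile(r"^([^@]*)@@https?://\1(/|$)")


def is_foreign(pattern: re.Pattern, http_host: str, http_referer: str) -> bool:
    # mod_rewrite tests the string "%{HTTP_HOST}@@%{HTTP_REFERER}".
    # The "!" prefix in .htaccess negates the CondPattern, so "no match" means "treated as foreign".
    test_string = f"{http_host}@@{http_referer}"
    return pattern.search(test_string) is None


host = "example.com"
for referer in ["https://example.com/page", "https://example.com.referrer.com/", ""]:
    print(repr(referer), "original:", is_foreign(ORIGINAL, host, referer),
          "fixed:", is_foreign(FIXED, host, referer))
```

With the original pattern, example.com.referrer.com counts as a same-site referrer. With the anchored version it is treated as foreign. An empty Referer never matches either pattern, so it is always treated as foreign.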
> RewriteRule (.*) https://a.co.uk/blank.htm [NC,R,L]
There's no need for the capturing subpattern. If you only need the rule to be successful for any URL-path, then use just ^ to avoid traversing the URL-path. But in your example, the request is for a.php, not "any URL"?
But why "redirect", rather than simply block the request? As you say, this is for "robots" after all. For example, to send a 403 Forbidden:
RewriteRule ^a\.php$ - [F]
In summary:
RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1(/|$)
RewriteCond %{QUERY_STRING} (^|&)level=image($|&)
RewriteRule ^a\.php$ - [F]
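As a rough check of how the three directives combine, here is a Python emulation of the summary rule set. The helper would_forbid is hypothetical. The sketch assumes the .htaccess file sits in the same directory as a.php, so the RewriteRule pattern is matched against "a.php" with no leading slash. It also assumes Python's re matches these patterns the way Apache's PCRE does.

```python
import re

# The three patterns from the summary rule set.
SAME_SITE_REFERER = re.compile(r"^([^@]*)@@https?://\1(/|$)")  # negated with "!" in .htaccess
IMAGE_LEVEL = re.compile(r"(^|&)level=image($|&)")
RULE_PATTERN = re.compile(r"^a\.php$")  # matched against the URL-path (no leading slash in .htaccess)


def would_forbid(url_path: str, query_string: str, http_host: str, http_referer: str) -> bool:
    """Hypothetical helper: True if the summary rules would respond 403 Forbidden."""
    foreign = SAME_SITE_REFERER.search(f"{http_host}@@{http_referer}") is None
    wants_image = IMAGE_LEVEL.search(query_string) is not None
    is_script = RULE_PATTERN.search(url_path) is not None
    # RewriteCond directives are ANDed together by default, along with the RewriteRule pattern.
    return is_script and foreign and wants_image


qs = "level=image&collection=a&document=b&item=72"
print(would_forbid("a.php", qs, "example.com", ""))                           # no Referer
print(would_forbid("a.php", qs, "example.com", "https://example.com/a.php"))  # same-site Referer
print(would_forbid("a.php", "level=document&collection=a", "example.com", ""))
```

The first call prints True and the other two print False. A request with no Referer at all is treated as foreign, so direct visits are blocked as well. Any client that sends a same-site Referer, like the second call, gets through.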
Note, however, that search engine "bots" generally don't send a Referer header at all. And it is trivial for arbitrary bots to fake the Referer header and circumvent your block.