.htaccess redirects according to the query string contents

Question

I have an online archive of images, some of which reside on Cloud Storage. The archive is hierarchical with four levels, and the appropriate level is accessed using query strings:

a.php?level=image&collection=a&document=b&item=72

The level can be archive, collection, document, or image.

I want to prevent robots from accessing the actual images, primarily to minimise traffic on the cloud storage. So the idea is if they issue a request where the query string level parameter is image ("?level=image"), that request is diverted.

The .htaccess code below is intended to check the query string for a request from a foreign referrer, and if the request is for an image, direct the request elsewhere:

  RewriteEngine On
  RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1
  RewriteCond %{QUERY_STRING} ^level=image$
  RewriteRule (.*) https://a.co.uk/blank.htm [NC,R,L]

My code appears to have no obvious effect. Can anybody see what I am doing wrong? I do not pretend to have a lot of confidence with .htaccess code, normally relying on snippets produced by people cleverer than me.

Answer 1

Score: 0

> RewriteCond %{QUERY_STRING} ^level=image$

This checks that the query string is exactly equal to level=image, whereas in your example the level URL parameter is just one of many (the first one).

To check that the URL parameter level=image appears anywhere in the query string, modify the above condition like so:

RewriteCond %{QUERY_STRING} (^|&)level=image($|&)
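
The difference between the two patterns can be checked outside Apache. Below is a minimal sketch in Python, assuming that Python's re module treats these simple patterns the same way as the PCRE engine used by mod_rewrite; the helper name matches is made up for this illustration.

```python
import re

# Query string taken from the example URL in the question.
query = "level=image&collection=a&document=b&item=72"

exact = re.compile(r"^level=image$")             # original condition
anywhere = re.compile(r"(^|&)level=image($|&)")  # revised condition


def matches(pattern, qs):
    """Hypothetical helper: True if the pattern is found in the query string."""
    return pattern.search(qs) is not None


print(matches(exact, query))                          # False
print(matches(anywhere, query))                       # True
print(matches(anywhere, "collection=a&level=image"))  # True
print(matches(anywhere, "xlevel=image"))              # False
print(matches(anywhere, "level=images"))              # False
```

The (^|&) and ($|&) groups keep the match aligned to whole parameters, which is why xlevel=image and level=images are not picked up.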

> RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1

Minor issue, but this would allow referrers where the requested hostname (e.g. example.com) occurs only as a subdomain of the referrer, e.g. example.com.referrer.com. To resolve this, modify the CondPattern to include a trailing slash or end-of-string anchor after the backreference. For example:

RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1(/|$)
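
As a rough illustration of the subdomain issue, the sketch below runs both versions of the pattern (with the \1 backreference to the captured host, as in the quoted condition) against a few referrers. It uses Python's re module as a stand-in for Apache's regex engine; the helper is_foreign and the hostnames are only examples.

```python
import re

# The TestString is "%{HTTP_HOST}@@%{HTTP_REFERER}"; \1 refers back to the captured host.
loose = re.compile(r"^([^@]*)@@https?://\1")
strict = re.compile(r"^([^@]*)@@https?://\1(/|$)")


def is_foreign(pattern, host, referer):
    """Hypothetical helper mirroring the negated (!) condition:
    True when the referrer does NOT belong to the requested host."""
    return pattern.search(f"{host}@@{referer}") is None


host = "example.com"
print(is_foreign(loose, host, "https://example.com/page"))            # False
print(is_foreign(loose, host, "https://example.com.referrer.com/"))   # False (slips through)
print(is_foreign(strict, host, "https://example.com.referrer.com/"))  # True
print(is_foreign(strict, host, "https://example.com/page"))           # False
print(is_foreign(strict, host, "https://example.com"))                # False
```

With the (/|$) suffix, the hostname must be followed by a slash or the end of the string, so example.com.referrer.com is no longer mistaken for the site itself.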

> RewriteRule (.*) https://a.co.uk/blank.htm [NC,R,L]

There's no need for the capturing subpattern. If you only need the rule to be successful for any URL-path then use just ^ to avoid traversing the URL-path. But in your example, the request is for a.php, not "any URL"?

But why "redirect", rather than simply block the request? As you say, this is for "robots" after all. For example, to send a 403 Forbidden:

RewriteRule ^a\.php$ - [F]

In summary:

RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1(/|$)
RewriteCond %{QUERY_STRING} (^|&)level=image($|&)
RewriteRule ^a\.php$ - [F]

Note, however, that search engine "bots" generally don't send a Referer header at all. And it is trivial for arbitrary bots to fake the Referer header and circumvent your block.
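
To see how the combined rule behaves with a missing or faked Referer, here is a small simulation in Python. It is only a sketch: it assumes the site's hostname is a.co.uk (taken from the redirect target in the question), that the .htaccess file sits in the same directory as a.php, and that an absent Referer header expands to an empty string. The function is_blocked is invented for this example.

```python
import re

REFERER_OK = re.compile(r"^([^@]*)@@https?://\1(/|$)")
LEVEL_IMAGE = re.compile(r"(^|&)level=image($|&)")
PATH = re.compile(r"^a\.php$")


def is_blocked(path, query, host, referer=""):
    """Hypothetical model of the summary rule: True when a 403 would be sent."""
    foreign = REFERER_OK.search(f"{host}@@{referer}") is None
    wants_image = LEVEL_IMAGE.search(query) is not None
    return PATH.search(path) is not None and foreign and wants_image


qs = "level=image&collection=a&document=b&item=72"
host = "a.co.uk"  # assumed hostname

# No Referer at all: treated as foreign, so the image request is blocked.
print(is_blocked("a.php", qs, host))  # True

# A Referer naming the site itself, whether genuine or faked, passes.
print(is_blocked("a.php", qs, host, "https://a.co.uk/a.php?level=document"))  # False

# Other levels are never blocked by this rule.
print(is_blocked("a.php", "level=document&collection=a", host))  # False
```

Under these assumptions, any request without a Referer is blocked, including a direct visit or bookmark, while a bot that sends a Referer pointing at the site gets through.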

huangapple
  • Posted on 7 February 2023 at 00:38:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75364129.html