AH10411 error: Managing spaces and %20 in apache mod_rewrite

Question

I have updated Apache today (to 2.4.56-1) and a load of .htaccess rewrites that used to work are now getting AH10411 errors, relating to spaces in the query. I'm struggling for a 'proper' solution.

The user clicks on a link such as <a href='FISH%20J12345.6-78919'>clickme</a> - as you can see, the space in the link URL has been encoded as %20.

The .htaccess file in the relevant server directory contains and executes this relevant directive:

RewriteRule ^(FISH\s*J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1 [L,QSA]

(In the above I am checking for spaces, not %20, as the browser seems to be converting it to space before it makes it to this rule).

This was working until I updated Apache; now users get a 403 error, and my Apache error log reports:

> AH10411: Rewritten query string contains control characters or spaces

This appears to be a new error, because Googling it finds nothing!

Editing my pages to (for example) change the space to an underscore and handle it correctly is not really an option, as the design is intended to support users being able to enter a URL directly using the name of the object they care about. So far, the only workaround I've found is a bit ugly, namely capturing the two parts of the source name separately in the regexp, thus:

RewriteRule ^(FISH)\s*(J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1+$2 [L,QSA]
                  ^   ^                                               ^^^

(I tried $1%20$2 at the end, which also resulted in the same error.)

Is there a better solution for this? i.e. how am I "supposed" to handle the case of spaces in a URL, when it's in a string I want to capture and pass as an argument to the underlying page?

Answer 1

Score: 32

> (I tried $1%20$2 at the end, which also went badly).

This looks like a bug. Encoding the space as %20 in the query string should be valid. You can also encode the space as + in the query string (as in your workaround).

In your original rule, Apache should be encoding the space (as %20) when making the internal rewrite (since a literal space is not valid in the URL). However, it would seem Apache is then baulking at the encoded space?!

You can also try using the B flag in your original rule. The B flag tells mod_rewrite to URL-encode the backreference before applying this to the substitution string. However, this would seem to be dependent on Apache encoding the space as + in the query string (as opposed to %20 which it would ordinarily do). Certainly in earlier versions of Apache, this would only have resulted in Apache encoding the space as %20 (not +), however, since version 2.4.26 Apache has introduced a new flag BNP (backrefnoplus) which explicitly informs Apache not to use a +, so you would think that by default, it would use a +. (Unfortunately I can't just test this myself at the moment.)

For example:

RewriteRule ^(FISH\s*J[\d.]+-?\+?\d+)$ myPage.php?sourceName=$1 [B,QSA,L]

(Minor point... no need to backslash-escape the literal dot when used inside a regex character class. I also reduced the digit ranges to the shorthand \d.)
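For comparison, here is an untested sketch of the BNP variant mentioned above. It assumes Apache 2.4.26 or later and relies only on the documented behaviour of BNP, which asks mod_rewrite to write an escaped space as %20 rather than +. Whether this is preferable depends on how the receiving script decodes its parameters.

```apache
# Untested sketch, same pattern as the rule above.
# B   : re-encode the backreference ($1) before it is substituted.
# BNP : when escaping, emit %20 for a space instead of + (2.4.26+).
RewriteRule ^(FISH\s*J[\d.]+-?\+?\d+)$ myPage.php?sourceName=$1 [B,BNP,QSA,L]
```

PHP decodes both + and %20 in a query string to a space, so for the script in the question either form should arrive as the same value.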

Aside: can you really have both - and + before the last set of digits? It looks like it should perhaps be one or the other (or nothing at all), e.g. [-+]?.

> Is there a better solution for this? i.e. how am I "supposed" to handle the case of spaces in a URL, when it's in a string I want to capture and pass as an argument to the underlying page?

Not really (although your solution is not strictly correct - see below). In your particular example, that only contains spaces you shouldn't need to do anything, as mod_rewrite should automatically URL-encode any URL that is not valid. (There is an NE - noescape - flag to explicitly prevent mod_rewrite from doing this - which is sometimes necessary to prevent already encoded characters being doubly encoded.) You can always use the B flag in URL-rewrites of this form (as mentioned above). You would need to use the B flag if there were other special characters, such as & (a special character in the query string) which would not otherwise be URL-encoded (effectively resulting in the URL parameter value being truncated).
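To illustrate the truncation described above, here is a minimal sketch. The search/ prefix, the script name search.php and the parameter q are hypothetical and not part of the question. Only one of the two rules would be active at a time, so the first is shown commented out.

```apache
# Hypothetical request: /search/fish%26chips
# The pattern is matched against the decoded path "search/fish&chips".

# Without B, $1 is copied as-is and the query string becomes
# "q=fish&chips": the script sees q=fish plus a separate "chips" parameter.
# RewriteRule ^search/(.+)$ search.php?q=$1 [L]

# With B, $1 is re-encoded first, so the "&" is passed as %26 and the
# script sees the single value "fish&chips" in q.
RewriteRule ^search/(.+)$ search.php?q=$1 [B,L]
```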

> RewriteRule ^(FISH)\s*(J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1+$2 [L,QSA]

An issue with your solution is that you are allowing 0 (ie. "none") or more spaces in the request and enforcing a single space in the resulting URL parameter. This is not the same as your original directive, that would preserve the spaces (or lack of) from the original request.

Could there be 0 or more spaces in the initial request?

If yes, and these need to be preserved then it may just be easier to repeat this rule for as many "spaces" as you need. You could implement a search/replace, but that may be overkill.
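The "repeat the rule" idea could look roughly like the sketch below. It assumes that at most two spaces ever need to be supported, which is an arbitrary limit chosen for illustration, and it reuses the pattern from the question.

```apache
# Sketch only: one rule per supported number of spaces (0, 1 and 2 here).
# Each rule writes the same number of "+" into the query string that it
# matched as spaces in the request.
RewriteRule ^(FISH)(J[0-9.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1$2 [L,QSA]
RewriteRule ^(FISH)\s(J[0-9.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1+$2 [L,QSA]
RewriteRule ^(FISH)\s\s(J[0-9.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1++$2 [L,QSA]
```

Note that, as in the original workaround, a literal + inside the captured name is copied unencoded, so PHP would read it as a space. The B flag discussed above avoids that.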

> (In the above I am checking for spaces, not %20, as the browser seems to be converting it to space before it makes it to this rule).

The URL-path that the RewriteRule pattern matches against is first URL-decoded (%-decoded), which is why you need to match against a literal space and not %20. This has nothing to do with the "browser". Any literal spaces in the URL-path "must" be URL-encoded as %20 in the HTTP request that leaves the browser/user-agent otherwise it's simply not valid.


(UPDATE) Restrict which non-alphanumeric characters are encoded

There was a comment (since deleted) where the user was also passing a + (literal plus) in the URL-path and seemingly expecting this to be passed as-is to the query string (via an internal rewrite) which would then be seen as an encoded space. However, the use of the B flag (as above) would result in the literal + being URL encoded as %2b thus preserving the literal + - which would ordinarily be the correct behaviour. However, if the + should be copied as-is and thus seen as an encoded space (not a literal +) in the resulting query string then you can restrict the non-alphanumeric characters that the B flag will encode (requires Apache 2.4.26+). ie. Exclude the +.

For instance, you could limit the encoding to spaces and ? only. For example:

RewriteRule ^(.+)$ index.php?query=$1 "[B= ?,L]"

+ will no longer be encoded in the backreference, so its special meaning in the query string (as an encoded space) will still apply.

NB: You can't encode only spaces (since a space cannot be used as the last character), hence the additional ? character. Consequently, the flags argument needs to be surrounded in double quotes, since spaces are otherwise argument delimiters.

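Reference:

https://httpd.apache.org/docs/2.4/rewrite/flags.html#flag_b
https://stackoverflow.com/questions/75928324/mod-rewrite-b-flag-doesnt-work-with-spaces-ah10411-rewritten-query-string-co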

Answer 2

Score: 15

It's a recent security fix.

apache2 (2.4.52-1ubuntu4.4) jammy-security; urgency=medium

  * SECURITY UPDATE: HTTP request splitting with mod_rewrite and mod_proxy
    - debian/patches/CVE-2023-25690-1.patch: don't forward invalid query
      strings in modules/http2/mod_proxy_http2.c,
      modules/mappers/mod_rewrite.c, modules/proxy/mod_proxy_ajp.c,
      modules/proxy/mod_proxy_balancer.c, modules/proxy/mod_proxy_http.c,
      modules/proxy/mod_proxy_wstunnel.c.
    - debian/patches/CVE-2023-25690-2.patch: Fix missing APLOGNO in
      modules/http2/mod_proxy_http2.c.
    - CVE-2023-25690
  * SECURITY UPDATE: mod_proxy_uwsgi HTTP response splitting
    - debian/patches/CVE-2023-27522.patch: stricter backend HTTP response
      parsing/validation in modules/proxy/mod_proxy_uwsgi.c.
    - CVE-2023-27522

 -- Marc Deslauriers <marc.deslauriers@ubuntu.com>  Wed, 08 Mar 2023 12:32:01 -0500

Answer 3

Score: 0

Debugging Apache (ErrorLog with LogLevel rewrite:trace6) shows that calling

/FISH%20J12345.6-78919

with

RewriteRule ^(FISH\s*J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1 [L,QSA]

decodes the %20 correctly to a space before mod_rewrite gets it, and the URL is rewritten to

'myPage.php?sourceName=FISH J12345.6-78919'

There is a space in the query parameter, and mod_rewrite does not accept this any more.
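For anyone wanting to reproduce this trace, the logging can be switched on roughly as follows. This is a sketch that assumes access to the server or virtual host configuration, since LogLevel is not permitted in .htaccess. The global level warn is just an example value.

```apache
# Server config or <VirtualHost> context, not .htaccess.
# Keep the general log level at warn and raise only mod_rewrite to trace6.
LogLevel warn rewrite:trace6
```

The trace lines are written to the ErrorLog. The level is best lowered again afterwards, because trace6 is very verbose.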

Actually two things happen with mod_rewrite and a rule like

RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

First, the PATH part of the URL is decoded (beware that a + in the PATH part stays a + and is not decoded to a space) and handed to mod_rewrite, which puts it into $1. The QUERY part of the original URL is not decoded, but is merged into the rewritten URL. That new URL is then handed back to Apache, and PHP decodes the QUERY parameters. The PATH part therefore gets decoded twice, because in the rewritten URL it is a QUERY parameter.

Without [B], e.g. /A%2520B/?a=b%2520c (%25 decoded is %) is rewritten to q=A%20B/&a=b%2520c, ending up in PHP as "q" => "A B/", "a" => "b%20c". That is not quite what one expects at first sight (at least not what I expected until now, which was "q" => "A%20B/").

So probably using [B] for moving PATH parts to QUERY param is the better choice anyway, ensuring it only gets decoded once.

With [B], /A%2520B/?a=b%2520c is finally rewritten to q=A%2520B%2f&a=b%2520c, ending up in PHP as "q" => "A%20B/", "a" => "b%20c". Looks better to me.
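Put together, the [B] variant of the catch-all rule might look like the sketch below. The two RewriteCond lines are not part of the rule discussed above. They are a common guard, added here as an assumption, so that the rewritten index.php (an existing file) is not matched by the pattern again.

```apache
# Sketch: move the whole PATH part into the q parameter, encoded exactly once.
# Guard (my addition): skip requests that map to an existing file or directory.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Request /A%2520B/?a=b%2520c reaches the pattern as "A%20B/".
# B re-encodes $1, QSA appends the original (undecoded) query string.
RewriteRule ^(.*)$ index.php?q=$1 [B,L,QSA]
```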

With [B], the trace for the FISH link reads: escaping backreference 'FISH J12345.6-78919' to 'FISH+J12345%2e6%2d78919'. So the space is encoded as + (not %20), and PHP decodes it again.

I suppose, for single encoded PATH parts, not using [B] in most cases was ok, most likely because the % sign is not much used in PATH parts. Using [B] for me is now the better solution.

There is one caveat, already answered elsewhere here: as + is valid in the PATH part, /A+%2bB/ is passed to mod_rewrite as A++B/ (so the first + stays a +), finally being passed as q=A%2b%2bB%2f and ending up in PHP as "q" => "A++B/". This cannot be overcome, as + is handled differently in the PATH part than in the QUERY part.

Answer 4

Score: 0

This worked for me, placed after RewriteEngine on:

RewriteBase /

RewriteRule ^(.*)\ (.*)$ /$1+$2 [L,R=301]

huangapple
  • Posted on 2023-03-09 20:00:13
  • Please keep this link when reposting: https://go.coder-hub.com/75684314.html