Character encoding problem – I suppose (UTF-8 – JS and Windows1250 – PHP)

Question

So far I have resolved all my problems by searching this forum, but now I've hit a wall. Maybe the problem is that I don't know what question to ask.

So my problem is:
I have a submission form in which the user enters a tracking number, and in some cases this number starts with the characters "%000xxxx".
Using JS and AJAX, I POST to a PHP endpoint. So far everything is OK: in console.log(data) I get the URL:

endppoint/trackingNumber=%000xxxx&foo=bar

The problem starts in PHP (that's my guess).
In the POST details of the request, I see something like this:

trackingNumber: \u000xxx
foo: bar

and when I print it in the PHP controller, I get:

 " 0xxx"

PHP is an old version, 5.3.3.

Already tried:

iconv('UTF-8', 'ISO-8859-1',$data);

I'd like to be able to post the full tracking number via PHP (with %000 instead of " 0") and to understand this issue.

Answer 1

Score: 0

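Your root problem is that % is significant in URL encoding: %00 decodes to a null (zero) byte. That byte is the \u0000 shown in the request details, and it is why PHP prints what looks like a blank in front of the remaining "0xxx". So before you include data in a URL, you should urlencode() it.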
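A minimal sketch of the same round trip in JavaScript, the language the question's AJAX request is built in. It uses the standard URLSearchParams parser as a stand-in for the server's form decoder (an assumption: PHP's own parser is not being called here) and shows the unencoded value collapsing to a NUL character followed by "0xxxx", while the encoded value comes back intact.

```javascript
// Decode a query string the way an application/x-www-form-urlencoded parser does.
function decodedTrackingNumber(query) {
  return new URLSearchParams(query).get("trackingNumber");
}

// Unencoded: "%00" is read as an escape for U+0000 (NUL), leaving "0xxxx".
const broken = decodedTrackingNumber("trackingNumber=%000xxxx&foo=bar");

// Encoded: encodeURIComponent turns "%" into "%25", so the value survives.
const query = "trackingNumber=" + encodeURIComponent("%000xxxx") + "&foo=bar";
const fixed = decodedTrackingNumber(query);

console.log(JSON.stringify(broken)); // "\u00000xxxx"
console.log(query);                  // trackingNumber=%25000xxxx&foo=bar
console.log(JSON.stringify(fixed));  // "%000xxxx"
```

JSON.stringify is used only to make the invisible NUL character printable.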

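<?php
$trackingNumber = "%000xxx";
$foo = "bar";

// urlencode() turns "%" into "%25", so the value survives decoding
$url = 'endppoint/?trackingNumber=' . urlencode($trackingNumber) . '&foo=' . urlencode($foo);

// How it will be read on the receiving side. Two steps, because
// dereferencing a call result (parse_url($url)['query']) needs PHP 5.4+.
$query = parse_url($url, PHP_URL_QUERY);
parse_str($query, $parsed);

var_dump(
    $url,
    $parsed
);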

Output:

string(43) "endppoint/?trackingNumber=%25000xxx&foo=bar"
array(2) {
  ["trackingNumber"]=>
  string(7) "%000xxx"
  ["foo"]=>
  string(3) "bar"
}
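The same rule applies on the JavaScript side, where the request in the question is built. A minimal sketch, assuming the AJAX call sends a form-encoded body; buildFormBody is a hypothetical helper name and the field names are the question's placeholders. URLSearchParams percent-encodes every value, so "%" goes out as "%25" and PHP's $_POST receives the original string.

```javascript
// Hypothetical helper: build an application/x-www-form-urlencoded body
// from a plain object. URLSearchParams escapes "%", "&", "=" and so on,
// so values are never mistaken for escapes or separators.
function buildFormBody(fields) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(fields)) {
    params.append(name, String(value));
  }
  return params.toString();
}

const body = buildFormBody({ trackingNumber: "%000xxxx", foo: "bar" });
console.log(body); // trackingNumber=%25000xxxx&foo=bar
```

If the AJAX code uses fetch(), the URLSearchParams object can also be passed directly as the request body.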

Additionally, though the encoding does not seem to matter in this specific case, you need to be careful with your encoding choices. Windows cpXXXX encodings and ISO-8859-X encodings are not equivalent and should not be used interchangeably. PHP can convert to either type of encoding if necessary, e.g.:

iconv('UTF-8', 'cp1250', $data);
iconv('UTF-8', 'ISO-8859-2', $data); // cp1250's rough equivalent in 8859, illustrative only
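To illustrate why the two families can't be swapped, here is a small JavaScript sketch that decodes the same two bytes under both labels with the standard TextDecoder. It assumes a runtime whose TextDecoder supports legacy single-byte encodings (browsers do; Node.js does when built with full ICU, which official builds are by default). Byte 0x8A is Š in cp1250 but an invisible C1 control character in ISO-8859-2, where Š lives at 0xA9 instead.

```javascript
// The same bytes read under two different single-byte encodings.
const bytes = new Uint8Array([0x8a, 0xa9]);

const asCp1250 = new TextDecoder("windows-1250").decode(bytes);
const asLatin2 = new TextDecoder("iso-8859-2").decode(bytes);

// Print code points so invisible control characters show up too.
const codePoints = (s) =>
  Array.from(s, (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));

console.log(codePoints(asCp1250)); // [ 'U+0160', 'U+00A9' ]  (Š ©)
console.log(codePoints(asLatin2)); // [ 'U+008A', 'U+0160' ]  (control char, Š)
```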

Also, cp1250 itself is seldom used, so unless you're working on a legacy system in Eastern Europe, it's probably not the encoding you have. Maybe cp1252?

Lastly, as general advice: text encoding is metadata that should always be known, never guessed, and anything that claims to "detect" an encoding is also guessing.
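Applied to this question, "known" can start on the client: URLSearchParams always serialises as UTF-8, so the request body's encoding is fixed before it leaves the browser, and the header can state it. A minimal sketch; the field name note and the endpoint are hypothetical placeholders. If the legacy PHP side really needs cp1250, the conversion from UTF-8 is then a single explicit iconv() call rather than a guess.

```javascript
// URLSearchParams always serialises as UTF-8 (unlike HTML form submission,
// which follows the page's encoding), so the body's encoding is known.
const FORM_CONTENT_TYPE = "application/x-www-form-urlencoded; charset=UTF-8";

// "note" is a hypothetical field name; "Š" (U+0160) is 0xC5 0xA0 in UTF-8.
const encoded = new URLSearchParams({ note: "Š" }).toString();
console.log(encoded); // note=%C5%A0

// Usage sketch (endpoint is a placeholder):
// fetch("endpoint.php", { method: "POST",
//   headers: { "Content-Type": FORM_CONTENT_TYPE }, body: encoded });
```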

huangapple
  • Posted on 10 August 2023, 22:26:05
  • When reposting, please keep this link: https://go.coder-hub.com/76876677.html