Character encoding problem - I suppose (UTF-8 - JS and Windows1250 - PHP)
Question
So far I have solved all my problems by searching this forum, but now I have hit a wall. Maybe the problem is that I don't know what question to ask.
So my problem is:
I have a submission form in which the user enters a tracking list number, and in some cases this number starts with the characters "%000xxxx".
Using JS and AJAX, I make a POST to a PHP endpoint. So far everything is OK; in console.log(data) I get this URL:
endppoint/trackingNumber=%000xxxx&foo=bar
The problem starts in PHP (that's my guess).
In the POST details of the request I have something like this:
trackingNumber: \u000xxx
foo: bar
and when I print it in the PHP controller, I get:
" 0xxx"
PHP is an old version, 5.3.3.
What I have tried:
iconv('UTF-8', 'ISO-8859-1',$data);
I'd like to be able to post the full tracking number via PHP (with %000 instead of " 0") and to understand the issue.
Answer 1
Score: 0
Your root problem is that % is significant in URL encoding, with %00 decoding to a null/zero byte. So before you include data in a URL you should urlencode() it.
$trackingNumber = "%000xxx";
$foo = "bar";
$url = 'endppoint/?trackingNumber=' . urlencode($trackingNumber) . '&foo=' . urlencode($foo);
parse_str(parse_url($url, PHP_URL_QUERY), $parsed); // how it will be read; PHP_URL_QUERY avoids array dereferencing, which needs PHP 5.4+
var_dump(
$url,
$parsed
);
Output:
string(43) "endppoint/?trackingNumber=%25000xxx&foo=bar"
array(2) {
["trackingNumber"]=>
string(7) "%000xxx"
["foo"]=>
string(3) "bar"
}
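Since the request in the question is built in JavaScript, the same fix can be applied on the client before the data ever reaches PHP. The following is only a minimal sketch, assuming the request body is assembled by hand as a string; the variable names are hypothetical and not taken from the asker's code. It shows what a form decoder makes of the raw value, and how encodeURIComponent() or URLSearchParams avoids that.

```javascript
// Hypothetical values, mirroring the PHP example above.
const trackingNumber = "%000xxx";
const foo = "bar";

// Unencoded: a decoder reads "%00" as a NUL character followed by "0xxx".
const rawDecoded = decodeURIComponent(trackingNumber);

// Encoded by hand: "%" becomes "%25", so the value survives decoding.
const manualBody =
  "trackingNumber=" + encodeURIComponent(trackingNumber) +
  "&foo=" + encodeURIComponent(foo);

// URLSearchParams applies form encoding automatically.
const autoBody = new URLSearchParams({ trackingNumber, foo }).toString();

// Round trip: what the receiving side gets back after decoding.
const roundTrip = new URLSearchParams(autoBody).get("trackingNumber");

console.log(JSON.stringify(rawDecoded));
console.log(manualBody);
console.log(autoBody);
console.log(roundTrip);
```

If a library such as jQuery is in use, passing a plain object as the request data instead of a pre-built string should have the same effect, because the library then does the encoding itself.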
Additionally, though the encoding does not seem to be significant in this specific case, you need to be careful with your encoding choices. Windows cpXXXX encodings and ISO-8859-X encodings are not equivalent and should not be interchanged. PHP can convert to either type of encoding if necessary, e.g.:
iconv('UTF-8', 'cp1250', $data);
iconv('UTF-8', 'ISO-8859-2', $data); // cp1250's rough equivalent in 8859, illustrative only
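To illustrate why the two families should not be interchanged, here is a small JavaScript sketch that decodes the same two bytes under each label. It assumes a runtime whose TextDecoder supports legacy encodings (a browser, or a Node.js build with full ICU); the byte values are picked purely for illustration.

```javascript
// The same two bytes, decoded under each encoding label.
const sampleBytes = new Uint8Array([0x8a, 0xa9]);

const asCp1250 = new TextDecoder("windows-1250").decode(sampleBytes);
const asLatin2 = new TextDecoder("iso-8859-2").decode(sampleBytes);

// windows-1250: 0x8A is "Š" (U+0160) and 0xA9 is "©" (U+00A9).
// ISO-8859-2:   0x8A is a C1 control character (U+008A) and 0xA9 is "Š".
console.log(JSON.stringify(asCp1250));
console.log(JSON.stringify(asLatin2));
```

The letter "Š" sits at a different byte in each encoding, so text written as one and read as the other comes out garbled.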
Also, cp1250 itself is seldom used, so unless you're working on a legacy system in Eastern Europe it's probably not that. Maybe cp1252?
Lastly, as general advice: text encoding is metadata that should always be known, never guessed, and anything that claims to "detect" the encoding is also guessing.
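See:
- https://en.wikipedia.org/wiki/Windows_code_page#Windows-125x_series
- https://en.wikipedia.org/wiki/ISO/IEC_8859#The_parts_of_ISO/IEC_8859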