Why does `.decode("utf-16")` on an ASCII-encoded byte string sometimes crash?
Question
I wanted to show how the number of characters needed to write a Python script can be reduced through an encoding conversion, and I took the obfuscated Mandelbrot set example from the Python Programming FAQ as a test case.
code = b"""print((lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+'\n'+y,map(lambda y,
Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM,
Sx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro,
i=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y
>=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr(
64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy
))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24))"""
shorter_code = code.decode("u16") # crash here
print(shorter_code)
code_back = shorter_code.encode("u16")[2:]
print(code_back)
print(code_back == code)
However, the script crashed unexpectedly with the following traceback:
Traceback (most recent call last):
File "C:\Users\lancet\AppData\Roaming\JetBrains\PyCharm2022.3\scratches\scratch_24.py", line 9, in <module>
shorter_code = code.decode("u16")
^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x29 in position 472: truncated data
I have already used this kind of trick successfully in CodinGame code-golf challenges, so I tried another example from the same FAQ, "First 10 Fibonacci numbers", and it worked:
code = b"""print(list(map(lambda x,f=lambda x,f:(f(x-1,f)+f(x-2,f)) if x>1 else 1:
f(x,f), range(10))))"""
shorter_code = code.decode("u16")
print(shorter_code)
# 牰湩⡴楬瑳洨灡氨浡摢ⱸ㵦慬扭慤砠昬⠺⡦ⰱ⥦昫砨㈭昬⤩椠㹸‱汥敳ㄠ⡦ⱸ⥦慲杮⡥〱⤩⤩
code_back = shorter_code.encode("u16")[2:]
print(code_back)
# b'print(list(map(lambda x,f=lambda x,f:(f(x-1,f)+f(x-2,f)) if x>1 else 1:\nf(x,f), range(10))))'
print(code_back == code)
# True
Why is the first string considered truncated?
Answer 1 (score: 3)
An even-length string is required: add trailing whitespace to the string
ASCII characters are represented as 8-bit bytes with the most significant bit set to 0. UTF-16, on the other hand, encodes text in 16-bit (2-byte) code units, so you need an even number of bytes to decode the data as UTF-16. If the length is odd, the last code unit is missing 8 bits of data, and the decoder reports the final byte as truncated.
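A minimal sketch of this rule, using arbitrary sample bytes rather than the scripts from the question: an even number of ASCII bytes decodes into half as many UTF-16 code units, while an odd number leaves one unpaired byte that the strict decoder rejects. The `start` and `reason` attributes used here are standard attributes of `UnicodeDecodeError`.

```python
# Two ASCII bytes form exactly one 16-bit UTF-16 code unit,
# so an even-length input decodes cleanly into len // 2 characters.
even = b"ab"
even_text = even.decode("utf-16")
print(len(even_text))  # 1

# With three bytes, the last one has no partner byte,
# so the strict decoder raises UnicodeDecodeError.
odd = b"abc"
odd_error = None
try:
    odd.decode("utf-16")
except UnicodeDecodeError as err:
    odd_error = err  # keep a reference; `err` is cleared after the except block
    print(err.start, err.reason)  # 2 truncated data
```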
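The Mandelbrot code is 473 bytes long, an odd number, so the decoder is left with one unpaired byte at index 472, which is exactly the position reported in the traceback. The Fibonacci code is 92 bytes long, an even number, so it decodes without error.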
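To connect the lengths to the reported position, here is a sketch that uses synthetic ASCII inputs of the same sizes (473 and 92 repetitions of `x`, not the actual scripts). The helper name `utf16_failure_position` is hypothetical, introduced only for this illustration.

```python
def utf16_failure_position(data: bytes):
    """Hypothetical helper: return the byte index the UTF-16 decoder rejects,
    or None if the whole input decodes successfully."""
    try:
        data.decode("utf-16")
    except UnicodeDecodeError as err:
        return err.start
    return None

# Stand-ins with the same lengths as the two scripts (not their contents).
mandelbrot_like = b"x" * 473
fibonacci_like = b"x" * 92

print(utf16_failure_position(mandelbrot_like))  # 472, the last byte, as in the traceback
print(utf16_failure_position(fibonacci_like))   # None: 92 is even, so decoding succeeds
```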
To fix the script, make the byte string even in length; the simplest way is to add a single trailing space:
code = b"""print((lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+'\n'+y,map(lambda y,
Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM,
Sx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro,
i=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y
>=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr(
64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy
))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24)) """
print(len(code))
# 474
shorter_code = code.decode("u16")
print(shorter_code)
# 牰湩⡴氨浡摢畒刬Ɐ畉䤬Ɐ䵉匬ⱸ祓爺摥捵⡥慬扭慤砠礬砺✫✊礫洬灡氨浡摢ⱹ䤊㵵畉䤬㵯潉刬㵵畒刬㵯潒匬㵹祓䰬氽浡摢捹䤬㵵畉䤬㵯潉刬㵵畒刬㵯潒椬䤽ⱍ匊㵸硓匬㵹祓爺摥捵⡥慬扭慤砠礬砺礫洬灡氨浡摢ⱸ捸刽Ⱶ捹礽Ᵽ畒刽Ⱶ潒刽Ɐ椊椽匬㵸硓䘬氽浡摢捸礬ⱣⱸⱹⱫ㵦慬扭慤砠Ᵽ捹砬礬欬昬⠺㱫〽漩砨砪礫礪㸊㐽〮
牯ㄠ昫砨Ᵽ捹砬砪礭礪砫Ᵽ⸲⨰⩸⭹捹欬ㄭ昬㨩⡦捸礬ⱣⱸⱹⱫ⥦挺牨ਨ㐶䘫刨⩸刨ⵯ畒⼩硓礬Ᵽⰰⰰ⥩Ⱙ慲杮⡥硓⤩㨩⡌畉礫⠪潉䤭⥵匯⥹爬湡敧匨⤩⤩⸲ⰱ〠㜮ㄭ㈮⸱ⰲ㌠ⰰ㠠ⰰ㈠⤴
code_back = shorter_code.encode("u16")[2:]
print(code_back)
# b"print((lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+'\n'+y,map(lambda y,\nIu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM,\nSx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro,\ni=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y\n>=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr(\n64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy\n))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24)) "
print(code_back == code)
# True
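The padding fix can be wrapped in a pair of helpers. This is a sketch with hypothetical names (`pack_ascii_as_utf16`, `unpack_utf16_to_ascii`), not code from the answer. It swaps `"u16"` plus the `[2:]` slice for the explicit `"utf-16-le"` codec, which writes no byte order mark, so no slicing is needed and the result does not depend on the machine's native byte order. Because every ASCII byte is at most 0x7F, each 16-bit code unit is at most 0x7FFF and can never fall into the surrogate range (0xD800 to 0xDFFF), so padding to an even length is the only condition needed for decoding to succeed.

```python
def pack_ascii_as_utf16(source: bytes) -> str:
    """Hypothetical helper: pair up ASCII bytes into UTF-16-LE code units,
    padding with one space when the length is odd."""
    if not source.isascii():
        raise ValueError("expected ASCII-only input")
    if len(source) % 2:
        source += b" "  # a trailing space is harmless at the end of typical Python source
    return source.decode("utf-16-le")


def unpack_utf16_to_ascii(packed: str) -> bytes:
    """Hypothetical helper: reverse pack_ascii_as_utf16.
    'utf-16-le' emits no BOM, so no [2:] slice is needed."""
    return packed.encode("utf-16-le")


source = b"print(42)"                     # 9 bytes, odd length
packed = pack_ascii_as_utf16(source)
print(len(packed))                        # 5 characters instead of 9
print(unpack_utf16_to_ascii(packed))      # b'print(42) '
```

The round trip returns the padded bytes, so the result differs from the input only by the added trailing space.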