Assembler Thumb Mode

Question

I'm new to coding in assembler and I have a question concerning the .thumb and .syntax unified directives. I know there is an instruction set "Thumb", which contains 16-bit commands, the Thumb2 set contains 16- and 32-bit commands, and so does the ARM instruction set. But I cannot understand what these two directives switch on or off.
Thanks.

I tried to compile my code with different variations of .thumb and .syntax unified on and off, but I can't recognize any difference.

Answer 1

Score: 1

First, assembly language is specific to the assembler and, in this case, to its settings. ARM's assembly language for its own tools is not compatible with gnu's, clang's, or others', even for the exact same arm core and instruction set. You cannot talk generically about an instruction set's assembly language; that is simply how assembly languages are. The .thumb and .syntax unified directives imply you are talking about gnu assembler, gcc assembler, or clang (llvm): three different assembly languages, with some overlap.

When ARM stood for Acorn RISC Machine, they made actual chips, not IP. You had the arm1, arm2, and arm3, and some version of one of those was 2a or something; you can look it up. When it became an IP company and stood for Advanced RISC Machines, you had the arm7 product name and the armv4t architecture name, and thumb was born. In the white-and-blue-covered printed books, the thumb instruction descriptions included (with typos) the equivalent ARM instruction. At that time every thumb instruction had a direct arm equivalent (obviously not vice versa).

Thumb instructions are all 16 bit. The thumb2 extensions are formerly undefined thumb instructions that are decoded, after which an additional 16 bits are decoded, so it is more proper to think of thumb or thumb+thumb2 as a variable-length instruction set than as 16 or 32 bit. It is your choice how you view it and how you avoid confusion with the "full sized" arm instructions (non thumb). Note that originally bl and blx were two separate 16-bit instructions that did not have to follow each other; later, with the cortex-ms, the definition changed, so they are a thumb2 instruction, if you will, that is not a formerly undefined (all thumb variant) instruction.
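
The variable-length view can be sketched in a few lines of Python. This is only an illustration and not something from the tools: the helper name thumb_insn_length is made up here, and the rule it applies (a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a two-halfword instruction) is the one the ARMv7-M architecture manual gives. On the original armv4t the two bl halves were viewed as separate 16-bit instructions, so treat this as the later, cortex-m way of looking at it.

```python
def thumb_insn_length(first_halfword: int) -> int:
    """Size in bytes of the Thumb instruction that starts with this halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    # These three prefixes mean "another halfword follows".
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


# Halfwords taken from the disassembly listings in this answer.
for hw in (0x1888, 0x3001, 0xE7FE, 0xEB01, 0xF7FF):
    print(f"{hw:#06x}: {thumb_insn_length(hw)} bytes")
```

So 1888 (adds), 3001 (adds) and e7fe (b.n) stand alone, while eb01 (add.w) and f7ff (bl) need a second halfword.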

So the armv4t thumb instructions, the originals, are the "all thumb variant" versions, and some versions of the architectural reference manuals use that term. Later manuals call out the architecture names per instruction.

There are likely some documentation slips, but arm seems to have several different thumb instruction sets, at least seven or more, and I suspect that just reflects, for example, the differences between armv4t and armv5t, such as pop being usable to change modes in armv5t where in armv4t only bx can (blx itself arrived with armv5t).

The unified syntax has been confusing and bad since the beginning, but if you are just starting out on arm assembly language (not 64 bit) then you should probably suffer through it, as you will find most folks use it, and gcc, for example, outputs unified syntax, not pre-unified syntax.

The thumb documentation showed, for example:

ADD <Rd>, #<immed_8>

And that was the proper thumb syntax (well... assemblers choose their syntax; they do not have to follow some other language, nor the document that is tied to some other assembly language).

The equivalent arm instruction is:

ADDS <Rd>, <Rd>, #<immed_8>

And the machine code for both was listed in the thumb portion of the documentation.

If you were to write

adds r0,#1

or

add r0,r0,#1

in thumb mode you would get a syntax error (as one would hope).

Thumb2 extensions were a ton of years away, and arm was still an also-ran, although with this and the next few cores they came to dominate the processor world (your x86 box has more non-x86 processors than x86 processors in it: many arms and some number of 8051s and/or z80s; the x86 is the also-ran). So UAL was born well before thumb2.

From our perspective there are basically three thumb instruction sets, at least up to the early armv8-m, though that may change from there if it has not already.

The all thumb variant instructions: get the rev E version of the ARM ARM, the thick white-covered book in print (the last of the print books) and the first pdf version of the ARM ARM.

The armv6-m which came with the cortex-m0. This added a couple of dozen thumb2 extensions, formerly undefined instructions that are now two halfword instructions (32 bit if you must).

The armv7-m which started with the cortex-m3. This added an additional 100-150 new thumb2 extensions over and above the armv6-m.

For some reason the non-UAL thumb instruction set in gas (gnu assembler) still exists and works great; I have code that is decades old.

The (flawed) concept was that you could write assembly language code using a unified syntax between arm of the day and thumb of the day. Since there were tons of arm instructions that had no equivalent in thumb this made no sense, the only thing that kinda made sense is if you limited yourself to thumb instructions and then depending on the mode it would make the thumb instruction or the arm equivalent. You could not write effective unified language code since you had to know which instruction set you were writing for and write code for that, which is what we were doing before the unified syntax, and after the unified syntax, so what was the point? Two separate instruction sets, two separate assembly languages, just write code for the correct one. Which is still what you have to do today.

A side effect was that you could now write

add r0,r0,#1 

in non-UAL syntax for gas. Since it is functionally the same, you get the same instruction.

add r0,r1,r2
add r0,r0,#1
.thumb
add r0,r1,r2
add r0,#1
add r0,r0,#1

giving

   0:	e0810002 	add	r0, r1, r2
   4:	e2800001 	add	r0, r0, #1
   8:	1888      	adds	r0, r1, r2
   a:	3001      	adds	r0, #1
   c:	3001      	adds	r0, #1

Note that add r0,#1 is accepted as documented by arm (gas folks do not always follow the ip/chip vendor's documented assembly language, but in this case they did), and interestingly the disassembler shows it as adds r0,#1.

Those are examples of the non-UAL that predated UAL.
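
To see where those hex values come from, here is a rough Python sketch that rebuilds the four distinct encodings in the listing above from their bit fields. The function names are invented for this illustration, the field layouts are the ones in the ARM ARM for these specific encodings, and there is no range checking (the 16-bit thumb forms only take r0-r7).

```python
def arm_add_reg(rd: int, rn: int, rm: int) -> int:
    # cond=AL (0xE), ADD with a register operand, S=0, no shift applied to Rm.
    return 0xE0800000 | (rn << 16) | (rd << 12) | rm


def arm_add_imm(rd: int, rn: int, imm8: int) -> int:
    # cond=AL, ADD with an immediate, S=0, rotate field 0 so imm8 is used as-is.
    return 0xE2800000 | (rn << 16) | (rd << 12) | imm8


def thumb_adds_reg(rd: int, rn: int, rm: int) -> int:
    # 16-bit thumb: 0001100 Rm Rn Rd (outside an IT block this form sets the flags).
    return 0x1800 | (rm << 6) | (rn << 3) | rd


def thumb_adds_imm8(rdn: int, imm8: int) -> int:
    # 16-bit thumb: 00110 Rdn imm8.
    return 0x3000 | (rdn << 8) | imm8


print(f"{arm_add_reg(0, 1, 2):08x}")   # add r0,r1,r2 in arm mode
print(f"{arm_add_imm(0, 0, 1):08x}")   # add r0,r0,#1 in arm mode
print(f"{thumb_adds_reg(0, 1, 2):04x}")  # add r0,r1,r2 in thumb mode
print(f"{thumb_adds_imm8(0, 1):04x}")    # add r0,#1 and add r0,r0,#1 in thumb mode
```

The last function is why both add r0,#1 and add r0,r0,#1 come out as 3001: the 16-bit encoding has a single register field, so the two spellings describe the same instruction.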

So then we add ual.

add r0,r1,r2
add r0,r0,#1
.thumb
add r0,r1,r2
add r0,#1
add r0,r0,#1
.syntax unified
add r0,r1,r2
adds r0,r1,r2

Disassembly of section .text:

00000000 <.text>:
   0:	e0810002 	add	r0, r1, r2
   4:	e2800001 	add	r0, r0, #1
   8:	1888      	adds	r0, r1, r2
   a:	3001      	adds	r0, #1
   c:	3001      	adds	r0, #1
   e:	eb01 0002 	add.w	r0, r1, r2
  12:	1888      	adds	r0, r1, r2

Now add r0,r1,r2 is a valid thumb2 instruction that is part of the many armv7-m thumb2 extensions. And that is the encoding, even though it looks kinda like the arm encoding. That is not the arm documented syntax, though; the arm documented syntax for that thumb2 instruction is add.w.
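
As a sketch of how that two-halfword encoding is put together (the function name is made up for this example, and only the unshifted register form of add.w is handled), in Python:

```python
def thumb2_add_w_reg(rd: int, rn: int, rm: int, set_flags: bool = False):
    """Return the two halfwords of ADD{S}.W Rd, Rn, Rm with no shift."""
    # First halfword: 11101 01 1000 S Rn
    hw1 = 0xEB00 | (int(set_flags) << 4) | rn
    # Second halfword: 0 imm3 Rd imm2 type Rm, with the shift fields left at zero.
    hw2 = (rd << 8) | rm
    return hw1, hw2


hw1, hw2 = thumb2_add_w_reg(0, 1, 2)
print(f"{hw1:04x} {hw2:04x}")  # matches the add.w r0, r1, r2 line in the listing
```

Unlike the 16-bit 1888 encoding, this one has an explicit S bit (bit 4 of the first halfword), so the flag-setting and non-flag-setting versions are separate encodings of the same instruction.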

Naturally, if you are writing for a non-cortex-m core from before the armv6 period in which this instruction was added, you are now in trouble: you have an instruction that will not work on your processor. I am using an apt-gotten tool and could probably figure out its default processor, since I did not specify one. Specifying one is a good idea:

.cpu cortex-m0
add r0,r1,r2
add r0,r0,#1
.thumb
add r0,r1,r2
add r0,#1
add r0,r0,#1
.syntax unified
add r0,r1,r2
adds r0,r1,r2

and we get

arm-none-eabi-as so.s -o so.o
so.s: Assembler messages:
so.s:3: Error: attempt to use an ARM instruction on a Thumb-only processor -- `add r0,r1,r2'
so.s:4: Error: attempt to use an ARM instruction on a Thumb-only processor -- `add r0,r0,#1'
so.s:10: Error: cannot honor width suffix -- `add r0,r1,r2'

There are no arm instructions for that core so

.cpu cortex-m0
.thumb
add r0,r1,r2
.syntax unified
add r0,r1,r2
adds r0,r1,r2

gives

arm-none-eabi-as so.s -o so.o
so.s: Assembler messages:
so.s:6: Error: cannot honor width suffix -- `add r0,r1,r2'

Now the same tool saw it as a thumb instruction. This is not the usual case, but it is an example of different, incompatible assembly languages, even within the same tool. Most of the differences between assembly languages for the same target are in the directives and other subtle things: labels, comments, etc. Folks that port to gnu assembler seem to intentionally want to make gnu assembler incompatible with the vendor's own tools or documented assembly language; the msr and mrs instructions were quite painful for a while there.

So let's try

.cpu cortex-m3
.thumb
add r0,r1,r2
.syntax unified
add r0,r1,r2
adds r0,r1,r2

and it is happy with that

Disassembly of section .text:

00000000 <.text>:
   0:	1888      	adds	r0, r1, r2
   2:	eb01 0002 	add.w	r0, r1, r2
   6:	1888      	adds	r0, r1, r2

But let's be more correct.

.cpu cortex-m3
.thumb
add r0,r1,r2
.syntax unified
add.w r0,r1,r2
adds.w r0,r1,r2
adds r0,r1,r2

Disassembly of section .text:

00000000 <.text>:
   0:	1888      	adds	r0, r1, r2
   2:	eb01 0002 	add.w	r0, r1, r2
   6:	eb11 0002 	adds.w	r0, r1, r2
   a:	1888      	adds	r0, r1, r2

And that is all good.

As noted in comments above .thumb tells the parser the following instructions are thumb mode instructions. .arm tells the parser the following instructions are arm mode instructions.

.cpu cortex-m3
add r0,r1,r2
.syntax unified
add.w r0,r1,r2
adds.w r0,r1,r2
adds r0,r1,r2

arm-none-eabi-as so.s -o so.o
so.s: Assembler messages:
so.s:3: Error: attempt to use an ARM instruction on a Thumb-only processor -- `add r0,r1,r2'
so.s:5: Error: attempt to use an ARM instruction on a Thumb-only processor -- `add.w r0,r1,r2'
so.s:6: Error: attempt to use an ARM instruction on a Thumb-only processor -- `adds.w r0,r1,r2'
so.s:7: Error: attempt to use an ARM instruction on a Thumb-only processor -- `adds r0,r1,r2'

The gnu assembler parser starts off in .arm mode, so you do not have to specify it; it is implied.

You can go back and forth as answered in a comment above.

add r0,r1,r2
.thumb
add r0,r1,r2
.arm
add r0,r1,r2

00000000 <.text>:
   0:	e0810002 	add	r0, r1, r2
   4:	1888      	adds	r0, r1, r2
   6:	0000      	.short	0x0000
   8:	e0810002 	add	r0, r1, r2

(Padding was needed to align the arm instruction. Naturally this is completely broken code that cannot execute; it is just demonstrating the directives.)
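
The padding arithmetic is simple enough to sketch in Python. This assumes the assembler just rounds the location counter up to the next multiple of four before emitting an arm instruction, which is what the listing above shows; align_up is a name made up for the example.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


offset = 4 + 2                 # one 4-byte arm instruction, then one 2-byte thumb instruction
aligned = align_up(offset, 4)  # where the next arm instruction can go
print(f"next arm instruction at {aligned:#x}, padding {aligned - offset} bytes")
```

That is the .short 0x0000 at offset 6 in the disassembly: two bytes of filler so the arm instruction lands at offset 8.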

.syntax unified
add r0,r1,r2
.thumb
add r0,r1,r2
.arm
add r0,r1,r2

.syntax unified indicates the code that follows (in either mode) is now using the UAL assembly language vs the non UAL assembly languages.

.thumb
add r0,r1,r2
.syntax unified
add r0,r1,r2
.syntax divided
add r0,r1,r2

gives

Disassembly of section .text:

00000000 <.text>:
   0:	1888      	adds	r0, r1, r2
   2:	eb01 0002 	add.w	r0, r1, r2
   6:	1888      	adds	r0, r1, r2

gnu assembler starts off in .syntax divided, as we have seen thus far. So you start in .arm and .syntax divided by default; if you want to change either of those you have to use directives, and then, until you use another directive to change the mode or syntax, it stays that way through the file.

Clearly (demonstrated above) if you want to do both .thumb and .syntax unified you can do those in either order as a pair for the rest of the file to use that language, gnu assembler thumb unified syntax.

The add instructions turned out to work first time, but there are other thumb instructions for which it is quite painful to avoid the thumb2 version; the tool will stick in the larger version instead.

In this case it works cleanly.

.cpu cortex-m3
.thumb
.syntax unified
add r0,r1,r2
adds r0,r1,r2
adds.w r0,r1,r2
adds.n r0,r1,r2

The same source with the target switched to a cortex-m0:

.cpu cortex-m0
.thumb
.syntax unified
add r0,r1,r2
adds r0,r1,r2
adds.w r0,r1,r2
adds.n r0,r1,r2

This gives a little confusion in the messages:

so.s: Assembler messages:
so.s:5: Error: cannot honor width suffix -- `add r0,r1,r2'
so.s:7: Error: selected processor does not support `adds.w r0,r1,r2' in Thumb-2 mode

.cpu cortex-m0
.thumb
.syntax unified
add.w r0,r1,r2
adds r0,r1,r2
adds.w r0,r1,r2
adds.n r0,r1,r2

A better message now:

so.s: Assembler messages:
so.s:5: Error: selected processor does not support `add.w r0,r1,r2' in Thumb-2 mode
so.s:7: Error: selected processor does not support `adds.w r0,r1,r2' in Thumb-2 mode

If you are using arm assembly language in particular (risc-v is another one), you really, really need to disassemble and examine often, even when compiling, to make sure the tool is generating code that will run. That also implies you know what core you have and what code will and will not run on it.

If you are just starting with arm assembly language with gnu assembler, first off, use gnu assembler (arm-whatever-as), not gcc. Learn real assembly language, not inline assembly in C, which is yet another language. Then learn to translate, if you can absolutely justify using inline in the first place (rare). Stick with the unified syntax: just put .syntax unified right up front to get the tool in that mode from the start, arm or thumb.

Understand that gnu assembler is not arm's assembler; even if at times an arm employee may have worked on it, it is a separate assembly language. It "tends" to follow the arm documentation as far as syntax goes, and this far down the road it is much better at that than in the early days. Specifically I mean the instruction syntax, not the other parts of the language. Assume the arm documentation is unified syntax.

Always get the ARM Technical Reference Manual (ARM TRM) for the core you are using (and the version!). Always get the ARM Architectural Reference Manual (ARM ARM) for the core you are using (a full sized armv6 one does not exist; you have to split between the armv5t and the armv7a ones, and ignore a whole ton of the armv7a document). ARM's programmer's reference manuals are not good. They have implications and incorrect statements that lead the non-gurus (and even arm gurus) into failure. Only extremely rarely are there nuggets of info in there that are of any use and that are not properly documented in the product and architecture documents.

You may also wish to get the amba/axi documents for your core, or -ish for your core; they sometimes help with exclusive access and with the different buses that you find in some cores. Generally the bus docs are advanced and for folks that have access to a core (who work on a chip that has an arm core in it).

There are a couple of other gnu assembler directives you need to know, whether you are mixing instruction sets on a "full sized" arm or working on a thumb-only cortex-m.

In gnu assembler labels end with a colon : and there are some rules for what you can use for labels. A label is an address that the tools compute for you, so you do not have to. With gnu assembler the labels default as non-function labels. If you stay in the same mode you are a bit safer, but if you are making mixed mode code, you need to use another set of directives to tell the tools that some labels are functions and some are non-function addresses (data or same mode branch destinations).

.syntax unified
.arm
here:
    bl one
    bl two
    b .
one:
    bx lr
three:
    bx lr
.thumb
.align
two:    
    bl three
    bx lr

gives (linked)

Disassembly of section .text:

00008000 <here>:
    8000:	eb000001 	bl	800c <one>
    8004:	eb000002 	bl	8014 <two>
    8008:	eafffffe 	b	8008 <here+0x8>

0000800c <one>:
    800c:	e12fff1e 	bx	lr

00008010 <three>:
    8010:	e12fff1e 	bx	lr

00008014 <two>:
    8014:	f7ff fffc 	bl	8010 <three>
    8018:	4770      	bx	lr

Which is all kinds of broken. You cannot bl from arm to thumb. And the tools gave no warnings nor errors.
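
To check what those arm bl encodings actually point at, here is a small Python decoder. It is a sketch with a made-up function name; the formula (sign-extended 24-bit field, shifted left by two, added to the instruction address plus 8) is the arm-state b/bl encoding from the ARM ARM.

```python
def arm_branch_target(addr: int, insn: int) -> int:
    """Destination of an arm-state B or BL located at addr."""
    imm24 = insn & 0x00FFFFFF
    if imm24 & 0x00800000:        # sign-extend the 24-bit field
        imm24 -= 1 << 24
    # In arm state the pc reads as the instruction address plus 8.
    return addr + 8 + (imm24 << 2)


for addr, insn in ((0x8000, 0xEB000001), (0x8004, 0xEB000002), (0x8008, 0xEAFFFFFE)):
    print(f"{addr:#x}: {insn:08x} -> {arm_branch_target(addr, insn):#x}")
```

The second one resolves to 0x8014, the address of two. An immediate bl is just a pc-relative branch that stays in the current instruction set, so the core would arrive at two still in arm state and try to execute thumb halfwords as arm instructions.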

.syntax unified
.arm
here:
    bl one
    bl two
    b .
one:
    bx lr
three:
    bx lr
.thumb
.align
.thumb_func
two:    
    bl three
    bx lr

Now I do not expect this in general from tools but the gnu tools (I think after some major version) do this for you:

Disassembly of section .text:

00008000 <here>:
    8000:	eb000001 	bl	800c <one>
    8004:	eb000005 	bl	8020 <__two_from_arm>
    8008:	eafffffe 	b	8008 <here+0x8>

0000800c <one>:
    800c:	e12fff1e 	bx	lr

00008010 <three>:
    8010:	e12fff1e 	bx	lr

00008014 <two>:
    8014:	f7ff fffc 	bl	8010 <three>
    8018:	4770      	bx	lr
    801a:	46c0      	nop			; (mov r8, r8)
    801c:	0000      	movs	r0, r0
	...

00008020 <__two_from_arm>:
    8020:	e59fc000 	ldr	ip, [pc]	; 8028 <__two_from_arm+0x8>
    8024:	e12fff1c 	bx	ip
    8028:	00008015 	.word	0x00008015
    802c:	00000000 	.word	0x00000000

So that fixed it in one direction, from arm to thumb, but not the other.
.thumb_func says the next label is a function (yes, there is a lot of extra syntax you can use around the higher-level language concepts of functions or procedures, etc.; at a minimum it boils down to this). So it is positional: you do not have to put it on the line immediately before the label, and you can have other stuff in between that is not a label.
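
The word the __two_from_arm stub loads is 0x00008015, not 0x00008014, and that low bit is the whole point. A Python sketch of how bx treats its register value (bx_destination is a hypothetical name for this illustration):

```python
def bx_destination(value: int):
    """Return (instruction set, address) that BX selects for a register value."""
    state = "thumb" if value & 1 else "arm"
    # Bit 0 only chooses the instruction set; it is not part of the address.
    return state, value & ~1


print(bx_destination(0x00008015))  # the literal used by __two_from_arm
print(bx_destination(0x00008010))  # the plain address of three
```

Marking two as a thumb function is what lets the tools put 0x8015 in that literal, so the bx ip lands on 0x8014 in thumb state.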

There is no .arm_func, instead

.syntax unified
.arm
.type here,%function
.type one,%function
.type three,%function
here:
    bl one
    bl two
    b .
one:
    bx lr
three:
    bx lr
.thumb
.align
.thumb_func
two:    
    bl three
    bx lr

.type ... %function is used. And since the label name is in the directive you do not have to put it in front of the label.

.type works for thumb as well, and does not even have to be within the .thumb area

.syntax unified
.arm
.type here,%function
.type one,%function
.type three,%function
.type two,%function
here:
    bl one
    bl two
    b .
one:
    bx lr
three:
    bx lr
.thumb
.align
two:    
    bl three
    bx lr

and although this code is not really usable, at least it does not crash from switching instruction sets without properly switching modes.

Disassembly of section .text:

00008000 <here>:
    8000:	eb000001 	bl	800c <one>
    8004:	eb000005 	bl	8020 <__two_from_arm>
    8008:	eafffffe 	b	8008 <here+0x8>

0000800c <one>:
    800c:	e12fff1e 	bx	lr

00008010 <three>:
    8010:	e12fff1e 	bx	lr

00008014 <two>:
    8014:	f000 f80a 	bl	802c <__three_from_thumb>
    8018:	4770      	bx	lr
    801a:	46c0      	nop			; (mov r8, r8)
    801c:	0000      	movs	r0, r0
	...

00008020 <__two_from_arm>:
    8020:	e59fc000 	ldr	ip, [pc]	; 8028 <__two_from_arm+0x8>
    8024:	e12fff1c 	bx	ip
    8028:	00008015 	.word	0x00008015

0000802c <__three_from_thumb>:
    802c:	4778      	bx	pc
    802e:	e7fd      	b.n	802c <__three_from_thumb>
    8030:	eafffff6 	b	8010 <three>
    8034:	00000000 	andeq	r0, r0, r0

Oh, so the gnu linker adds these trampolines (they use another name) to switch modes for you. You have to link to see them. I would assume that older versions of the tools, and/or other toolchains, which will have their own syntax for these declarations, might give a warning if you bl to a label in an area that uses the wrong instruction set.
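
You can confirm the redirection from the encodings alone. Below is a Python sketch of decoding the two-halfword thumb bl (the function name is made up; the S, J1, J2, imm10 and imm11 fields and the way they combine are from the ARM ARM description of this encoding):

```python
def thumb_bl_target(addr: int, hw1: int, hw2: int) -> int:
    """Destination of the two-halfword thumb BL located at addr."""
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    i1 = 1 - (j1 ^ s)             # I1 = NOT(J1 XOR S)
    i2 = 1 - (j2 ^ s)             # I2 = NOT(J2 XOR S)
    offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    if s:                         # sign-extend the 25-bit offset
        offset -= 1 << 25
    # In thumb state the pc reads as the instruction address plus 4.
    return addr + 4 + offset


print(hex(thumb_bl_target(0x8014, 0xF7FF, 0xFFFC)))  # before: straight to three
print(hex(thumb_bl_target(0x8014, 0xF000, 0xF80A)))  # after: to __three_from_thumb
```

The bl at 0x8014 went from targeting 0x8010 (three, which is arm code) to 0x802c, the stub that does the bx pc and then branches to three in arm state.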

At least with current versions you will see gcc will generate both .type and .thumb_func for thumb function labels.

If you are working on a cortex-m for the most part you do not need to declare the labels as functions as there are no mode switches, but the vector table needs thumb function addresses (address of the function ORRed with one. If you think plus one you get in trouble).

.cpu cortex-m0
.syntax unified
.thumb
.word 0x20000800
.word reset

.align
reset:
    b .


Disassembly of section .text:

00000000 <reset-0x8>:
   0:	20000800 	.word	0x20000800
   4:	00000008 	.word	0x00000008

00000008 <reset>:
   8:	e7fe      	b.n	8 <reset>

Now that is wrong; it will not boot. The vector table requires the lsbit to be set, per the documentation.
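
The reason to think "ORRed with one" rather than "plus one" is plain arithmetic, sketched here in Python. This is not a model of what any assembler or linker does with symbols, and thumb_vector_entry is a made-up name.

```python
def thumb_vector_entry(symbol_value: int) -> int:
    """Vector table word for a thumb handler: the address with the lsbit set."""
    return symbol_value | 1


reset = 0x00000008
print(hex(thumb_vector_entry(reset)))      # 0x9
print(hex(thumb_vector_entry(reset | 1)))  # 0x9, setting the bit twice is harmless
print(hex((reset | 1) + 1))                # 0xa, adding one to an already-marked address
```

ORR is idempotent, so it does not matter whether the value already carries the thumb bit; addition is not, and 0xa is an even address again.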

This hack does not work for some reason even though the docs imply it should.

.cpu cortex-m0
.syntax unified
.thumb
.word 0x20000800
.word reset|1

.align
reset:
	b .

so.s: Assembler messages:
so.s:6: Error: invalid operands (.text and *ABS* sections) for `|'

so sometimes you see this dreadful hack

.cpu cortex-m0
.syntax unified
.thumb
.word 0x20000800
.word reset+1

.align
reset:
    b .

Disassembly of section .text:

00000000 <reset-0x8>:
   0:	20000800 	.word	0x20000800
   4:	00000009 	.word	0x00000009

00000008 <reset>:
   8:	e7fe      	b.n	8 <reset>

Just do it right

.cpu cortex-m0
.syntax unified
.thumb
.word 0x20000800
.word reset

.align
.thumb_func
reset:
    b .

Disassembly of section .text:

00000000 <reset-0x8>:
   0:	20000800 	.word	0x20000800
   4:	00000009 	.word	0x00000009

00000008 <reset>:
   8:	e7fe      	b.n	8 <reset>

(yes if you do both the tool actually saves you from yourself).

Note that the specific gnu assembler or gcc binaries you are using are programs themselves that were built with some specs. You can build them to default to armv4t or armv7a or whatever. If you do not indicate the core, the tool uses that default (you can build a gnu assembler for which the examples above do not give the same results as the one I used).


In short, as already answered in a comment above.

gnu assembler starts in arm mode with divided syntax, which means it builds the following instructions using the arm instruction set, parsed with the non-unified syntax (until other directives say otherwise).

.thumb indicates the code that follows this directive is to be built using the thumb instruction set (until other directives...).

.syntax unified means the code that follows is to be parsed using this tool's version of the unified syntax.

.arm indicates that the code that follows this directive is to be built using the arm instruction set.

.syntax divided means the code that follows is to be parsed using this tool's version of the specific syntax for each mode.

.syntax unified/divided applies to both .arm and .thumb directives that follow. You may wish to just put .syntax unified at the top of every file.
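
The way these directives act as sticky, independent switches can be modeled with a toy Python tracker. It is purely illustrative: track_modes is a made-up helper, it knows nothing about real instruction parsing, and it only looks at the .arm, .thumb and .syntax directives discussed here.

```python
def track_modes(lines):
    """Report which (mode, syntax) each instruction line would be assembled under."""
    mode, syntax = "arm", "divided"   # the gas starting state described above
    result = []
    for raw in lines:
        line = raw.strip()
        if not line or line.endswith(":"):
            continue                  # blank lines and labels change nothing
        if line == ".thumb":
            mode = "thumb"
        elif line == ".arm":
            mode = "arm"
        elif line.startswith(".syntax"):
            syntax = line.split()[1]  # "unified" or "divided"
        elif line.startswith("."):
            continue                  # other directives are ignored by this toy
        else:
            result.append((line, mode, syntax))
    return result


source = ["add r0,r1,r2", ".thumb", "add r0,r1,r2", ".syntax unified", "add r0,r1,r2"]
for insn, mode, syntax in track_modes(source):
    print(f"{insn:14s} {mode:6s} {syntax}")
```

Each setting stays in force until another directive changes it, and changing one does not reset the other, which is why .thumb and .syntax unified can be given in either order.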

Thumb, more than arm, "instruction sets" are a tricky business, as somewhat indicated above. But these directives, combined with the target core/processor specified, define the supported arm and/or thumb instructions for that target. arm mode has added some new instructions over time, but nothing like thumb, on full sized cores or cortex-m, which saw a large number of additions. You need to specify the right core, or a lesser core whose instruction set is completely supported by the core you are using (armv4t or armv6-m/cortex-m0, for example).

You stated not being able to see the difference.

add r0,r1,r2
.thumb
add r0,r1,r2
.syntax unified
add r0,r1,r2

Disassembly of section .text:

00000000 <.text>:
   0:	e0810002 	add	r0, r1, r2
   4:	1888      	adds	r0, r1, r2
   6:	eb01 0002 	add.w	r0, r1, r2

An arm, a thumb, and a thumb2 version of the same syntax, but one (the 16-bit thumb adds, which sets the flags) is not functionally the same as the other two. You can definitely see the difference, though.

Answer 2

Score: 1

> I tried to compile my code with different variations of .thumb and .syntax unified on and off but I can't recognize any difference.

There should be no difference; that is expected. ARM assembler is somewhat unusual in that the same mnemonics can map to different binary values.

Consider an assembler 'library' of utility functions. It can be written in 'unified' syntax, and then your .thumb caller can use that code. The assembler options determine that the 'unified' library should produce Thumb binary output. Hopefully you can see the value of that.

So .thumb says you will produce only Thumb code. .syntax unified means writing assembler that can target EITHER binary encoding: legacy 32-bit ARM or Thumb. With the later 'thumb2' sets, there is almost 100% opcode correspondence. The initial 'thumb1' only allowed access to registers R0-R7 and had many limitations. It would be difficult to write 'unified' code for these CPUs.

For modern Cortex-A CPUs, you can write assembler that works in either mode. That could be useful if you need to support an ARMv4 CPU that does not understand Thumb. For newer Cortex-A cores, thumb2 is the better binary encoding to use: better code density and performance. This was not the case for ARMv5 (thumb1) CPUs. ARMv6 was somewhere in the middle, with Thumb usually being better.

huangapple
  • Posted on April 17, 2023, 00:08:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76028899.html