Assembly Language Step-by-Step: Programming with DOS and Linux, 2nd Ed.

Machine Instructions and Their Operands

As we said earlier, MOV copies data from a source to a destination. MOV is an extremely versatile instruction, and understanding its versatility demands a little study of the notions of source and destination.

Source and Destination Operands

Most machine instructions, MOV included, have one or more operands. (Some instructions have no operands.) In the machine instruction MOV AX,1, there are two operands. The first is AX, and the second is the digit 1.

By convention in assembly language, the first operand belonging to a machine instruction is the destination operand. The second operand is the source operand.

With the MOV instruction, the sense of the two operands is pretty literal: The source operand is copied to the destination operand. In MOV AX,1, the source operand 1 is copied into the destination operand AX. The sense of source and destination is not nearly so literal in other instructions, but a rule of thumb is this: Whenever a machine instruction causes a new value to be generated, that new value is placed in the destination operand.

There are three different flavors of data that may be used as operands. These are memory data, register data, and immediate data. I've laid some example MOV instructions out on the dissection pad in Table 7.1 to give you a flavor for how the different types of data are specified as operands to the MOV instruction.

Table 7.1: MOV and Its Operands
MACHINE INSTRUCTION	DESTINATION OPERAND	SOURCE OPERAND	OPERAND NOTES
MOV	AX	1	Source is immediate data.
MOV	BX	CX	Both are 16-bit register data.
MOV	DL	BH	Both are 8-bit register data.
MOV	[BP]	DI	Destination is memory data at SS:BP.
MOV	DX	[SI]	Source is memory data at DS:SI.
MOV	BX	[ES:BX]	Source is memory data at ES:BX.

Immediate data is by far the easiest to understand. We look at it first.

Immediate Data

The MOV AX,1 machine instruction that I had you enter into DEBUG was a good example of what we call immediate data accessed through an addressing mode called immediate addressing. Immediate addressing gets its name from the fact that the item being addressed is immediate data built right into the machine instruction. The CPU does not have to go anywhere to find immediate data. It's not in a register, nor is it stored in a data segment somewhere out in memory. Immediate data is always right inside the instruction being fetched and executed.

Immediate data must be of an appropriate size for the operand. In other words, you can't move a 16-bit immediate value into an 8-bit register half such as AH or DL. Neither DEBUG nor the stand-alone assemblers will allow you to assemble an instruction like this:


MOV CL,67EF

CL is an 8-bit register, and 67EFH is a 16-bit quantity. Won't go!

Because it's built right into a machine instruction, you might think immediate data would be quick to access. This is true only to a point: Fetching anything from memory takes more time than fetching anything from a register, and instructions are, after all, stored in memory. So, while addressing immediate data is somewhat quicker than addressing ordinary data stored in memory, neither is anywhere near as quick as simply pulling a value from a CPU register.

Also keep in mind that only the source operand may be immediate data. The destination operand is the place where data goes, not where it comes from. Since immediate data consists of literal constants (numbers such as 1, 0, or 7F2BH), trying to copy something into immediate data rather than from immediate data simply has no meaning and is always an error.

Register Data

Data stored inside a CPU register is known as register data, and accessing register data directly is an addressing mode called register addressing. Register addressing is done by simply naming the register we want to work with. Here are some entirely legal examples of register data and register addressing:


MOV AX,BX
MOV BP,SP
MOV BL,CH
MOV ES,DX
ADD DI,AX
AND DX,SI

The last two examples point up the fact that we're not speaking only of the MOV instruction here. Register addressing happens any time data in a register is acted on directly, irrespective of what machine instruction is doing the acting.

The assembler keeps track of certain things that don't make sense, and one such situation is having a 16-bit register and an 8-bit register half within the same instruction. Such operations are not legal. After all, what would it mean to move a 2-byte source into a 1-byte destination? And while moving a 1-byte source into a 2-byte destination might seem more reasonable, the MOV instruction does not support it.

Playing with register addressing is easy using DEBUG. Bring up DEBUG and assemble the following series of instructions:


MOV AX,67FE
MOV BX,AX
MOV CL,BH
MOV CH,BL

Now, reset the value of IP to 0100 using the R command. Then execute the four machine instructions by issuing the T command four times in a row. The session under DEBUG would look like this:


- A
333F:0100 MOV AX,67FE
333F:0103 MOV BX,AX
333F:0105 MOV CL,BH
333F:0107 MOV CH,BL
333F:0109
- R IP
IP 0100
:0100
- R
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=333F ES=333F SS=333F CS=333F IP=0100  NV UP EI PL NZ NA PO NC
333F:0100 B8FE67    MOV   AX,67FE
- T
AX=67FE BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=333F ES=333F SS=333F CS=333F IP=0103  NV UP EI PL NZ NA PO NC
333F:0103 89C3     MOV   BX,AX
- T
AX=67FE BX=67FE CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=333F ES=333F SS=333F CS=333F IP=0105  NV UP EI PL NZ NA PO NC
333F:0105 88F9     MOV   CL,BH
- T
AX=67FE BX=67FE CX=0067 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=333F ES=333F SS=333F CS=333F IP=0107  NV UP EI PL NZ NA PO NC
333F:0107 88DD     MOV   CH,BL
- T
AX=67FE BX=67FE CX=FE67 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=333F ES=333F SS=333F CS=333F IP=0109  NV UP EI PL NZ NA PO NC
333F:0109 1401     ADC   AL,01

Keep in mind that the T command executes the instruction displayed in the third line of the most recent R command display. The ADC instruction in the last register display is yet another garbage instruction, and although executing this particular instruction would not cause any harm (it's just an ADC: Add with Carry), I recommend against executing random instructions just to see what happens. Executing certain jump or interrupt instructions could wipe out sectors on your hard disk or, worse, cause internal damage to DOS that would not show up until later on.

Let's recap what these four instructions accomplished. The first instruction is an example of immediate addressing: The hexadecimal value 067FEH was moved into the AX register. The second instruction used register addressing to move register data from AX into BX. Keep in mind that the way the operands are written is slightly contrary to the common-sense view of things. The destination operand comes first. Moving something from AX to BX is done by executing MOV BX,AX. Assembly language is just like that sometimes. If that were the most peculiar thing about it, I for one would be mighty grateful ...

The third instruction and fourth instruction both move data between register halves rather than full, 16-bit registers. These two instructions accomplish something interesting. Look at the last register display, and compare the value of BX and CX. By moving the value from BX into CX a byte at a time, it was possible to reverse the order of the two bytes making up BX. The high half of BX (what we sometimes call the most significant byte, or MSB, of BX) was moved into the low half of CX. Then the low half of BX (what we sometimes call the least significant byte, or LSB, of BX) was moved into the high half of CX. This is just a sample of the sorts of tricks you can play with the general-purpose registers.

Just to disabuse you of the notion that the MOV instruction should be used to exchange the two halves of a 16-bit register, let me suggest that you do the following: Before you exit DEBUG from your previous session, assemble this instruction and execute it using the T command:


XCHG CL,CH

The XCHG instruction exchanges the values contained in its two operands. What was interchanged before is interchanged again, and the value in CX will match the values already in AX and BX. A good idea while writing your first assembly language programs is to double-check the instruction set periodically to see that what you have cobbled together with four or five instructions is not possible using a single instruction. The x86 instruction set is very good at fooling you in that regard! (One caution: Later on, you might find that cobbling something together from simple instructions might run more quickly than the same thing accomplished by a single specialized instruction, especially on the newest Pentium-class CPUs. Pentium optimization is a truly peculiar business, but we're way ahead of ourselves now in speaking of what's fast and what's not. Learn how it works first, and then we can explore how fast it is!)
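If it helps to see the register-half shuffle outside of DEBUG, here is a minimal sketch in Python that models a 16-bit register with separately addressable halves and replays the session above, including the XCHG. The class name Reg16 and the function run_session are my own inventions for illustration; nothing here is part of DEBUG or NASM.

```python
class Reg16:
    """Hypothetical model of a 16-bit register with high and low byte halves."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF

    @property
    def high(self):
        return (self.value >> 8) & 0xFF

    @high.setter
    def high(self, byte):
        # Replace the upper byte, keep the lower byte.
        self.value = ((byte & 0xFF) << 8) | (self.value & 0x00FF)

    @property
    def low(self):
        return self.value & 0xFF

    @low.setter
    def low(self, byte):
        # Replace the lower byte, keep the upper byte.
        self.value = (self.value & 0xFF00) | (byte & 0xFF)


def run_session():
    """Replay the four MOVs and the XCHG; return CX before and after XCHG."""
    ax, bx, cx = Reg16(), Reg16(), Reg16()
    ax.value = 0x67FE                   # MOV AX,67FE
    bx.value = ax.value                 # MOV BX,AX
    cx.low = bx.high                    # MOV CL,BH
    cx.high = bx.low                    # MOV CH,BL
    after_movs = cx.value
    cx.low, cx.high = cx.high, cx.low   # XCHG CL,CH
    return after_movs, cx.value


if __name__ == "__main__":
    before, after = run_session()
    print(f"CX after the MOVs: {before:04X}")
    print(f"CX after XCHG:     {after:04X}")
```

The two values returned correspond to the CX field in the last register display and to CX after the exchange.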

Memory Data

Immediate data is built right into its own machine instruction. Register data is stored in one of the CPU's limited collection of internal registers. Memory data, by contrast, is stored out in system memory and must be located by its address. To pin down a single byte anywhere within real mode's megabyte of memory, you need both the segment and offset components. We generally write them together, specified with a colon to separate them, as either literal constants or register names: 0B00:0167, DS:SI or CS:IP.
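As a reminder of how a segment and an offset combine into one real mode address, here is a small Python sketch. It assumes the standard 8086 rule (segment shifted left four bits, plus the offset, kept to 20 bits); the function name linear_address is mine.

```python
def linear_address(segment, offset):
    """Combine a real mode segment:offset pair into a 20-bit address.

    Assumes the 8086 rule: segment * 16 + offset. The mask models the
    8086's 20 address lines, so results wrap at one megabyte.
    """
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


if __name__ == "__main__":
    # The literal address used in the text: 0B00:0167
    print(f"{linear_address(0x0B00, 0x0167):05X}")
```

For 0B00:0167, the segment contributes 0B000H and the offset adds 0167H.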

BX's Hidden Agenda

One of the easiest mistakes to make early on is to assume that you can use any of the general-purpose registers to specify an offset for memory data. Not so! If you try to specify an offset in AX, CX, or DX, the assembler will flag an error.

In real mode, only BP, BX, SI, and DI may hold an offset for memory data.

(This isn't true for more advanced CPUs working in protected mode, as we'll see toward the end of this book.) So, in fact, general-purpose registers AX, CX, and DX aren't quite so general after all. Why was general-purpose register BX singled out for special treatment? Think of it as the difference between dreams and reality for Intel. In the best of all worlds, every register could be used for all purposes. Unfortunately, when CPU designers get together and argue about what their nascent CPU is supposed to do, they are forced to face the fact that there are only so many transistors on the chip to do the job.

Each chip function is given a budget of transistors (sometimes numbering in the tens or even hundreds of thousands). If the desired logic cannot be implemented using that number of transistors, the expectations of the designers have to be brought down a notch and some CPU features shaved from the specification.

The early x86 CPUs including the 8086 and 8088 are full of such compromises. There were not enough transistors available at design time to allow all general-purpose registers to do everything, so in addition to the truly general-purpose ability to hold data, each 8086/8088 register has what I call a "hidden agenda." Each register has some ability that none of the others share. I describe each register's hidden agenda at some appropriate time in this book, and I call it out as such.

In the 20-odd years since the 8086 was created, Intel has hugely expanded the power of its x86 family of CPUs. And sure enough, when you get into 32-bit protected mode, most of the limitations imposed by early transistor budgets go away, and general-purpose registers become almost completely general. However, when acting in real mode (as we're speaking of here), the Pentium, 486, and 386 CPUs take on just about all the characteristics of the 8086 and 8088, including this sort of limitation, which is built into the logic that decodes the instruction set for real mode.

Should you, then, be learning this sort of bad-old-days limitation? I think so. What it teaches you is that limitations exist and need to be remembered. Even the mighty Pentium II has limitations and restrictions. You need to develop a grasp of them, or you'll be floundering around wondering why things don't work.

Using Memory Data

With one or two important exceptions (the string instructions, which I cover to a degree, but not exhaustively, later on), only one of an instruction's two operands may specify a memory location. In other words, you can move an immediate value to memory, or a memory value to a register, or some other similar combination, but you can't move a memory value directly to another memory value. This is just an inherent limitation of the CPU, and we have to live with it, inconvenient as it gets at times.

Specifying a memory address as one of an instruction's operands is a little complicated. The offset address must be resident in one of the general-purpose registers that can legally hold an offset address. (Remember, that's only BP, BX, SI, and DI, and not any of the others such as AX, CX, or DX.) To specify that we want the data at the memory location contained in the register rather than the data in the register itself, we use square brackets around the name of the register. In other words, to move the word at address DS:BX into register AX, we would use the following instruction:


MOV AX,[BX]

Similarly, to move a value residing in register DX into the word at address DS:DI, you would use this instruction:


MOV [DI],DX

Segment Register Assumptions

The only problem with these examples is this: "DS" isn't anywhere in either instruction. Where does it say to use DS as the segment register?

It doesn't. To keep addressing notation simple, the x86 CPUs in real mode make certain assumptions about certain instructions in combination with certain registers. There is no comprehensible system to these assumptions, and like dates in history or Spanish irregular verbs, you'll just have to memorize them, or at least know where to look them up. (The where is in Appendix B in this book.)

One of these assumptions is that in working with memory data, the MOV instruction uses the segment address stored in segment register DS unless you explicitly tell it otherwise. In the case of the two preceding examples, we did not tell the MOV instruction to use some segment register other than DS, so it fell back on its assumptions and used DS. However, had you specified the offset as residing in register BP instead of BX or DI, the MOV instruction would have assumed the use of segment register SS instead. This assumption involves a memory mechanism known as the stack, which we won't really address until the next chapter.
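The default-segment assumption can be written down as a simple lookup. The sketch below covers only the four registers that may hold an offset in real mode (BX, SI, DI, and BP) and only the plain [register] form; the table and function names are hypothetical, chosen for this illustration.

```python
# Default segment register assumed for each legal offset register.
# BP is the odd one out: it is tied to the stack, so it defaults to SS.
DEFAULT_SEGMENT = {"BX": "DS", "SI": "DS", "DI": "DS", "BP": "SS"}


def default_segment(offset_register):
    """Return the segment register assumed for MOV reg,[offset_register]."""
    name = offset_register.upper()
    if name not in DEFAULT_SEGMENT:
        raise ValueError(f"{name} may not hold an offset in real mode")
    return DEFAULT_SEGMENT[name]


if __name__ == "__main__":
    for reg in ("BX", "DI", "BP"):
        print(f"[{reg}] is read as {default_segment(reg)}:{reg}")
```

Asking for a register such as AX raises an error, which mirrors what the assembler does when you try to use it as an offset.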

Overriding Segment Assumptions for Memory Data

But what if you want to use ES as a segment register for memory data addressed in the MOV instruction? It's not difficult. The instruction set includes what are called segment override prefixes. These are not precisely instructions, but are more like the filters that may be snapped in front of a camera lens. The filter is not itself a lens, but it alters the way the lens operates.

There is one segment override prefix for each of the four segment registers: CS, DS, SS, and ES. In assembly language they are written as the name of the segment register followed by a colon, as shown in Table 7.2.

Table 7.2: Segment Override Prefixes
SEGMENT OVERRIDE PREFIX	FUNCTION
CS:	Forces use of code segment register CS
DS:	Forces use of the data segment register DS
SS:	Forces use of the stack segment register SS
ES:	Forces use of the extra segment register ES

In use, the segment override prefix is placed immediately in front of the memory data reference whose segment register assumption is to be overridden. For example, to force a MOV instruction to copy a value from the AX register into a location at some offset (contained in SI) into the code segment, you would use this instruction:


MOV [CS:SI],AX

Without the CS: override prefix, this instruction would move the value of AX into the data segment, at an address specified as DS:SI.

Prefixes in use are very reminiscent of how an address is written; in fact, understanding how prefixes work will help you keep in mind that in every reference to memory data within an instruction, there is a ghostly segment register assumption floating in the air. You may not see the ghostly DS: assumption in your MOV instruction, but if you forget that it's there, the whole concept of memory data will begin to seem arbitrary and magical.

Every reference to memory data includes either an assumed segment register or else a segment override prefix to specify a segment register other than the assumed segment register.

At the machine-code level, a segment override prefix is a single binary byte. The prefix byte is placed in front of rather than within a machine instruction. In other words, if the binary bytes comprising a MOV AX,[BX] instruction are 8BH 07H, adding the ES segment override prefix to the instruction (MOV AX,[ES:BX]) places a single 26H in front of the opcode bytes, giving us 26H 8BH 07H as the full binary equivalent.
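To make the "prefix byte in front" idea concrete, here is a minimal Python sketch that glues an override prefix onto the bytes of an instruction. The 26H value for ES is the one given above; the values for CS, SS, and DS are taken from Intel's instruction set reference rather than from this text. The names SEGMENT_OVERRIDE_PREFIX and with_override are mine.

```python
# One-byte segment override prefixes.
# ES (26H) is quoted in the text; the others come from Intel's reference.
SEGMENT_OVERRIDE_PREFIX = {"ES": 0x26, "CS": 0x2E, "SS": 0x36, "DS": 0x3E}


def with_override(segment, instruction_bytes):
    """Return the instruction bytes with the segment's prefix byte in front."""
    prefix = SEGMENT_OVERRIDE_PREFIX[segment.upper()]
    return bytes([prefix]) + bytes(instruction_bytes)


if __name__ == "__main__":
    mov_ax_bx = [0x8B, 0x07]                    # MOV AX,[BX]
    encoded = with_override("ES", mov_ax_bx)    # MOV AX,[ES:BX]
    print(" ".join(f"{b:02X}" for b in encoded))
```

Note that the opcode bytes themselves are untouched; the prefix simply rides along ahead of them.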

If you're sharp, the question will already have occurred to you: What about the flat models? Recall that in both real mode flat model and protected mode flat model, the segment registers all point to the same place and are not changed during the run of the program. In the flat models you do not use segment overrides. What I have explained previously about segment overrides applies only to the real mode segmented model!

Real Mode Memory Data Summary

Real mode memory data consists of a single byte or word in memory, addressed by way of a segment value and an offset value. The register containing the offset address is enclosed in square brackets to indicate that the contents of memory, rather than the contents of the register, are being addressed. The segment register used to address memory data is usually assumed according to a complex set of rules. Optionally, a segment override prefix may be placed in the instruction to specify some segment register other than the default segment register.

Figure 7.1 shows diagrammatically what happens during a MOV AX,[ES:BX] instruction. The segment address component of the full 20-bit memory address is contained inside the CPU in segment register ES. Ordinarily, the segment address would be in register DS, but the MOV instruction contains the ES: segment override prefix. The offset address component is specified to reside in the BX register.

Figure 7.1: How memory data is addressed.

The CPU sends out the values in ES and BX to the memory system side by side. Together, the two values pin down one memory location where MyWord begins. MyWord is actually two bytes, but that's fine: all the x86 CPUs working in real mode (except for the 8088) can bring both bytes into the CPU at once, while the 8088 brings both bytes in separately, one after the other. The CPU handles details like that and you needn't worry about it. Because AX is a 16-bit register, of course, two 8-bit bytes can fit into it quite nicely.

The segment address may reside in any of the four segment registers: CS, DS, SS, or ES. However, the offset address may reside only in registers BX, BP, SI, or DI. AX, CX, and DX may not be used to contain an offset address during real mode memory addressing.

Limitations of the MOV Instruction

The MOV instruction can move nearly any register to any other register. For reasons having to do with the limited budget of transistors on the 8086 and 8088 chips, MOV can't quite do any move you can think of, in real mode at least. Here's a list of MOV's real mode limitations:

MOV cannot move memory data to memory data. In other words, an instruction like MOV [SI],[BX] is illegal. Either of MOV's two operands may be memory data, but both cannot be at once.

MOV cannot move one segment register into another. Instructions like MOV CS,SS are illegal. This could have been handy, but it simply can't be done.

MOV cannot move immediate data into a segment register. You can't code up MOV CS,0B800H. Again, it would be handy but you just can't do it.

MOV cannot move one of the 8-bit register halves into a 16-bit register, nor vice versa. There are easy ways around any possible difficulties here, and preventing moves between operands of different sizes can keep you out of numerous kinds of trouble.

These limitations, of course, are over and above those situations that simply don't make sense: moving a register or memory into immediate data, moving immediate data into immediate data, specifying a general-purpose register as a segment register to contain a segment, or specifying a segment register to contain an offset address. Table 7.3 shows numerous illegal MOV instructions that illustrate these various limitations and nonsense situations.

Table 7.3: Rogue MOV Instructions
ILLEGAL MOV INSTRUCTION	WHY IT'S ILLEGAL
MOV 17,1	Only one operand may be immediate data.
MOV 17,BX	Only the source operand may be immediate data.
MOV CX,DH	The operands must be the same size.
MOV [DI],[SI]	Only one operand may be memory data.
MOV DI,[DX:BX]	DX is not a segment register.
MOV ES,0B800	Segment registers may not be loaded from immediate data.
MOV DS,CS	Only one operand may be a segment register.
MOV [AX],BP	AX may not address memory data (nor may CX or DX).
MOV SI,[CS]	Segment registers may not address memory data.
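The rules behind Table 7.3 are mechanical enough to sketch as a small checker. What follows is a rough Python model, not how DEBUG or NASM actually validate operands: it understands only bare register names, DEBUG-style hex constants, and memory references of the form [REG] or [SEG:REG], and every name in it (classify, check_memory, why_illegal) is hypothetical. The reason it reports for a given line may differ from the table's wording when a line breaks more than one rule.

```python
REG16 = {"AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI"}
REG8 = {"AL", "AH", "BL", "BH", "CL", "CH", "DL", "DH"}
SREG = {"CS", "DS", "SS", "ES"}
OFFSET_REGS = {"BX", "BP", "SI", "DI"}   # legal real mode offset registers


def classify(operand):
    """Return (kind, detail) for one operand written in the text's syntax."""
    op = operand.strip().upper()
    if op.startswith("[") and op.endswith("]"):
        return "mem", op[1:-1]
    if op in REG16:
        return "reg16", op
    if op in REG8:
        return "reg8", op
    if op in SREG:
        return "sreg", op
    # Anything else is treated as a hex constant, with or without an H suffix.
    return "imm", int(op.rstrip("H"), 16)


def check_memory(reference):
    """Check the inside of a bracketed memory reference such as ES:BX."""
    segment, _, offset = reference.rpartition(":")
    if segment and segment not in SREG:
        return f"{segment} is not a segment register."
    if offset in SREG:
        return "Segment registers may not address memory data."
    if offset not in OFFSET_REGS:
        return f"{offset} may not address memory data."
    return None


def why_illegal(dest, src):
    """Return a reason string if MOV dest,src breaks a rule, else None."""
    (dk, dv), (sk, sv) = classify(dest), classify(src)
    if dk == "imm":
        return "Only the source operand may be immediate data."
    for kind, value in ((dk, dv), (sk, sv)):
        if kind == "mem":
            problem = check_memory(value)
            if problem:
                return problem
    if dk == "mem" and sk == "mem":
        return "Only one operand may be memory data."
    if dk == "sreg" and sk == "sreg":
        return "Only one operand may be a segment register."
    if dk == "sreg" and sk == "imm":
        return "Segment registers may not be loaded from immediate data."
    sizes = {"reg8": 8, "reg16": 16, "sreg": 16}
    if dk in sizes and sk in sizes and sizes[dk] != sizes[sk]:
        return "The operands must be the same size."
    if dk == "reg8" and sk == "imm" and sv > 0xFF:
        return "The operands must be the same size."
    return None


if __name__ == "__main__":
    for dest, src in [("CX", "DH"), ("[DI]", "[SI]"), ("BX", "[ES:BX]")]:
        print(f"MOV {dest},{src}: {why_illegal(dest, src) or 'looks legal'}")
```

Feeding it the rows of Table 7.3 should produce a complaint for each one, while the rows of Table 7.1 should pass untouched.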

Some Notes on Assembler Syntax

Although we haven't talked about it a whole lot just yet, this book focuses on a particular assembler called NASM. And if this book is your first exposure to assembly language, nothing I've said so far should cause you any cognitive dissonance with your earlier experience, since you have no earlier experience. However, if you've played with assembly language using other assemblers, you will soon begin to see small differences between what you once learned in writing assembly language mnemonics and what I'm teaching in this book. These differences are matters of syntax, and they may become important, especially if you ever try to convert source code to NASM from another assembler such as MASM, TASM, or A86.

In the best of all worlds, every assembler would respond in precisely the same way to all the same mnemonics and directives set up all the same ways. In reality, syntax differs. Here's a common example: In Microsoft's MASM, memory data that includes a segment override must be coded like this:


MOV AX,ES:[BX]

Note here that the segment override "ES:" is outside the brackets enclosing BX. NASM places the overrides inside the brackets:


MOV AX,[ES:BX]

These two lines perform precisely the same job. The people who wrote NASM feel (and I concur) that it makes far more sense to place the override inside the brackets than outside. The difference is purely one of syntax. The two instructions mean precisely the same thing, right down to generating the very same binary machine code: 26 8B 07.

Worse, when you enter the same thing in DEBUG, it must be done this way:


ES: MOV AX,[BX]

Differences in syntax will drive you crazy on occasion, especially when flipping between NASM and DEBUG. It's best to get a firm grip on what the instructions are doing, and understand what's required to make a particular instruction assemble correctly. I point out some common differences between NASM and MASM throughout this book, since MASM is by far the most popular assembler in the x86 world, and more people have been exposed to it than any other.