Using Type Specifiers
Back on the sample reference appendix page (see page 212), notice the following example uses of the NEG instruction:
NEG BYTE [BX] ; Negates byte quantity at DS:BX
NEG WORD [DI] ; Negates word quantity at DS:DI
Why BYTE [BX]? Or WORD [DI]? Used in this way, BYTE and WORD are what we call type specifiers, and you literally can't use NEG (or numerous other machine instructions) on memory data without one or the other. They are not instructions in the same sense that NEG is an instruction. They exist in the broad class of things we call directives. Directives give instructions to the assembler. In this case, they tell the assembler how large the operand is when there is no other way for the assembler to know.
The problem is this: The NEG instruction negates its operand. The operand can be either a byte or a word; in real mode, NEG works equally well on both. But ... how does NEG know whether to negate a byte or a word? The memory data operand [BX] only specifies an address in memory, using DS as the assumed segment register. The address DS:BX points to a byte, but it also points to a word, which is nothing more than two bytes in a row somewhere in memory. So, does NEG negate the byte located at address DS:BX? Or does it negate the two bytes (a word) that start at address DS:BX?
Unless you tell it somehow, NEG has no way to know.
Telling an instruction the size of its operand is what BYTE and WORD do. Several other instructions that work on single operands only (such as INC, DEC, and NOT) have the same problem and use type specifiers to resolve this ambiguity.
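To make the ambiguity concrete, here is a minimal sketch of a 16-bit real-mode program laid out as a DOS .COM file. The buffer name Scratch and its starting value are assumptions made up for this illustration; they don't come from the appendix page. The commented-out line shows the form that NASM should reject with an "operation size not specified" error.

```nasm
; Sketch only: assumes NASM and flat binary output,
; e.g. nasm -f bin negdemo.asm -o negdemo.com
        BITS 16
        ORG  100h

        MOV  BX,Scratch         ; BX = offset of the hypothetical Scratch buffer
        MOV  DI,Scratch         ; DI = the same offset

        NEG  BYTE [BX]          ; Negates only the byte at DS:BX
        NEG  WORD [DI]          ; Negates the two-byte word at DS:DI
        INC  BYTE [BX]          ; INC, DEC, and NOT need the same hint
        DEC  WORD [DI]
        NOT  BYTE [BX]

;       NEG  [BX]               ; Rejected: nothing says byte or word

        RET                     ; Return to DOS from a .COM program

Scratch DW 1234h                ; Hypothetical two-byte buffer
```

Notice that BX and DI hold the very same offset here. The only thing that differs between the two NEG instructions is the type specifier, and that alone decides whether one byte or two bytes in memory are changed.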
Types in Assembly Language
Unlike in nearly all high-level languages such as Pascal and C++, the notion of type in assembly language is almost wholly a question of size. A word is a type, as is a byte, a double word, a quad word, and so on. The assembler is unconcerned with what an assembly language variable means. (Keeping track of such things is totally up to you.) The assembler only worries about how big it is. The assembler does not want to have to try to fit 10 pounds of kitty litter in a 5-pound bag, which is impossible, nor 5 pounds of kitty litter in a 10-pound bag, which can be confusing and under some circumstances possibly dangerous.
Register data always has a fixed and obvious type, since a register's size cannot be changed. BL is one byte and BX is two bytes.
The type of immediate data depends on the magnitude of the immediate value. If the immediate value is too large to fit in a single byte, that immediate value becomes word data and you can't load it into an 8-bit register half. An immediate value that can fit in a single byte may be loaded into either a byte-sized register half or a full word-sized register; its type is thus taken from the context of the instruction in which it exists and matches that of the register data operand into which it is to be loaded. But if you try to load immediate data into a destination that's too small for it, the assembler will complain. Here's a trivial example:
MOV BL,0FFFFH
When it encounters this, NASM will complain by saying, "Warning: Byte value exceeds bounds." BL can hold values from 0 to 0FFH (0 to 255). The value 0FFFFH is out of bounds because it is much larger than 0FFH.
Memory data is something else again. We've spoken of memory data so far in terms of registers holding offsets without considering the use of named memory data. I discuss named memory data in the next chapter, but in brief terms, you can define named variables in your assembly language programs using such directives as DB and DW. It looks like this:
Counter DB 0
MixTag DW 32
Here, Counter is a variable allocated as a single byte in memory by the DB (Define Byte) directive. Similarly, MixTag is a variable allocated as a word in memory by the DW (Define Word) directive.
By using DB, you give variable Counter a type and hence a size. You must match this type when you use the variable name Counter in an instruction to indicate memory data. The way to do this is to use the BYTE directive, as I mentioned a little earlier. This, for example, will be accepted by the assembler:
MOV BL,BYTE [Counter]
This instruction will take the current value located in memory at the address represented by the variable name Counter and will load that value into register half BL. You might wonder: Why do I need to put the BYTE directive there? The assembler should know that Counter is 1 byte in size because it was defined using the directive DB.
In some assemblers, including Microsoft's MASM, it would. However, NASM's authors feel that it's important to be as explicit with assemblers as possible and leave little or nothing for the assembler to infer from context. So, although NASM uses the DB directive to allocate one byte of memory for the variable Counter, it does not remember that Counter takes up only one byte when you insert Counter as an operand in a machine instruction. You must build that specification into your source code, by using the BYTE directive. (When the other operand is a register, as BL is here, NASM can take the size from the register and the specifier is optional; it becomes mandatory whenever nothing else in the instruction fixes the size.) This will force you to think a little bit more about what you're doing at every point that you do it; that is, right where you use variable names as instruction operands. Doing so may help you avoid certain really stupid mistakes, like the ones I used to make all the time while I was working with MASM, most of which came out of trying to let the assembler do my thinking for me.
To me, this is a wonderful thing, and one of the main reasons I chose NASM as the focus of this book.
Now here's another case, one that NASM will assemble without a burp:
MOV BL,BYTE [MixTag]
This looks innocent enough until you remember that MixTag is actually 2 bytes (one word) in size, having been defined with the DW directive. You might think this is an error, because MixTag isn't the same size as BL. True enough, but the key is that there's no ambiguity here. The assembler knows what you want, even if what you want is peculiar. The type specifier BYTE forces the assembler to look upon MixTag as being 1 byte in size. MixTag is not byte-sized, however, so what actually happens is that the least significant (lowermost) byte of MixTag will be loaded into BL, with the most significant byte left high and dry.
Is this useful? It can be. Is it dangerous? You bet. It is up to you to decide whether overriding the type of memory data makes sense, and it is completely your responsibility to ensure that doing so doesn't sprinkle your code with bugs. But nothing is left for the assembler to decide. That's what type specifiers are for: to make it clear to the assembler in every case what it is supposed to do. Whether that in fact makes sense is up to you. Use your head, and know what you're doing. That's more important in assembly language than anywhere else in computer programming.
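As a rough illustration of what the override does, here is a short sketch built around the Counter and MixTag variables from this section. One assumption to flag: MixTag is initialized to 1234H here instead of 32, purely so that its two bytes hold different values and you can tell them apart. The program layout (a DOS .COM file) is likewise just a convenient frame for the example.

```nasm
; Sketch only: assumes NASM and flat binary output,
; e.g. nasm -f bin typedemo.asm -o typedemo.com
        BITS 16
        ORG  100h

        MOV  BL,BYTE [MixTag]   ; BL = 34H, the low byte stored at MixTag
        MOV  BH,BYTE [MixTag+1] ; BH = 12H, the high byte, one address up
        MOV  AX,WORD [MixTag]   ; AX = 1234H, the whole word

        MOV  BYTE [Counter],0   ; No register operand: BYTE is required
        INC  BYTE [Counter]     ; Same for a single-operand instruction

        RET                     ; Return to DOS from a .COM program

Counter DB 0
MixTag  DW 1234h                ; Assumed value, chosen for illustration
```

Because the x86 stores the least significant byte of a word at the lower address, the BYTE override at MixTag picks up 34H, and the byte "left high and dry" is the 12H sitting one address higher. If you really do want that upper byte, you have to ask for it by address, as the second instruction does.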