One Program, Three Segments
The main problem with real mode flat model is that everything you do must fit into 64K of memory. This isn't much of a pinch for learning assembly language and just playing around writing small utilities. But once you try to create something ambitious (say, a word processor or a database-driven e-mail client), you find that code and data begin to crowd one another in a big hurry. So, for all its trouble, real mode segmented model was the only way to make full use of real mode's megabyte of memory.
Today, of course, you'd either create a Windows application (which you would probably not attempt in assembly) or you'd work in protected mode flat model under an implementation of Unix for the Intel x86 CPUs. Nonetheless, if you understand segments, you have it in yourself to understand every other aspect of assembly work.
Let's do the Land Shark HyperBike trick again, this time with a version of EAT.ASM specifically written to use the real mode segmented model. Here's the bike; then we'll take it apart just like last time:
; Source name : EATSEG.ASM
; Executable name : EATSEG.EXE
; Code model : Real mode segmented model
; Version : 1.0
; Created date : 9/10/1999
; Last update : 9/10/1999
; Author : Jeff Duntemann
; Description : A simple example of a DOS .EXE file programmed for
; real mode segmented model, using NASM-IDE 1.1,
; NASM 0.98, and ALINK. This program demonstrates
; how segments are defined and initialized using NASM.
[BITS 16] ; Set 16 bit code generation
SEGMENT code ; Segment containing code
..start: ; The two dots tell the linker to Start Here.
; Note that this is a special symbol and MUST
; be in lower case! "..start:" works; "..START:" doesn't.
; SEGMENT SETUP
;
; In real mode segmented model, a program uses three segments, and it must
; set up the addresses in the three corresponding segment registers. In
; MASM, the ASSUME directive tells the assembler which segment each register
; will hold (it doesn't load anything); we ASSUME nothing in NASM!
; Each of the three segments has a name (here, code, data, and stack) and
; these names are identifiers indicating segment addresses. It is the
; appropriate segment address that is moved into each segment register.
; Note that you can't move an address directly into a segment register;
; you must first move the address into a general purpose register. Also
; note that we don't do anything with CS; the ..start: label tells the
; linker where execution begins, and DOS sets CS (and IP) from that.
mov ax,data ; Move segment address of data segment into AX
mov ds,ax ; Copy address from AX into DS
mov ax,stack ; Move segment address of stack segment into AX
mov ss,ax ; Copy address from AX into SS
mov sp,stacktop ; Point SP to the top of the stack
mov dx,eatmsg ; Mem data ref without [] loads the ADDRESS!
mov ah,9 ; Function 9 displays text to standard output.
int 21H ; INT 21H makes the call into DOS.
mov ax, 04C00H ; This DOS function exits the program
int 21H ; and returns control to DOS.
SEGMENT data ; Segment containing initialized data
eatmsg db "Eat at Joe's!", 13, 10, "$" ;Here's our message
SEGMENT stack stack ;This means a segment of *type* "stack"
; that is also *named* "stack"! Some
; linkers demand that a stack segment
; have the explicit type "stack"
resb 64 ; Reserve 64 bytes for the program stack
stacktop: ; It's significant that this label marks the
; *end* of the reserved 64 bytes (the address
; just past the last one), and not the start!
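Ahead of the fuller discussion of stacks, the stacktop comment is easier to see with numbers. Within the stack segment, resb 64 occupies offsets 0 through 63, and stacktop is offset 64, the end of the reserved block rather than its start. That works because an x86 PUSH first subtracts 2 from SP and only then stores the word. The Python model below is a simplified sketch of just that behavior (16-bit little-endian words); the class and method names are made up for illustration, and the exceptions stand in for mistakes a real CPU would not catch, since it simply wraps SP.

```python
# Sketch: a tiny model of EATSEG's 64-byte stack segment.
# Assumes the first reserved byte is offset 0, so "stacktop" is offset 64.

STACK_SIZE = 64          # matches "resb 64" in EATSEG.ASM
STACKTOP = STACK_SIZE    # offset of the stacktop label within the segment

class Stack16:
    """Models SP and the reserved bytes; PUSH and POP move 16-bit words."""

    def __init__(self, size: int = STACK_SIZE) -> None:
        self.memory = bytearray(size)  # the bytes reserved by resb
        self.sp = size                 # mov sp,stacktop

    def push(self, word: int) -> None:
        if self.sp < 2:
            raise OverflowError("stack overflow: would write below offset 0")
        self.sp -= 2                                    # decrement SP first...
        self.memory[self.sp] = word & 0xFF              # ...then store the low byte
        self.memory[self.sp + 1] = (word >> 8) & 0xFF   # and the high byte above it

    def pop(self) -> int:
        if self.sp > len(self.memory) - 2:
            raise IndexError("stack underflow: nothing left to pop")
        word = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp += 2                                    # read first, then move SP up
        return word
```

After one push of 1234h, SP is 62 and the word occupies offsets 62 and 63, the last two reserved bytes. Had SP started at the first byte instead, the very first push would land below the segment.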
Three Segments
Assembly language programs written for real mode segmented model must contain at least three segments: one for code, one for data, and one for the stack. Larger programs may contain more than one code segment and more than one data segment, but real mode programs may contain only one stack segment at a time.
EATSEG.ASM has those three necessary segments. They are named stack, data, and code, which indicates pretty clearly what each one is for. The code segment, pretty obviously, contains the machine instructions that do the program's work. The data segment contains initialized variables. The stack segment contains the program's stack. I haven't explained stacks just yet, and because you don't really need to understand stacks in order to understand how EATSEG.ASM works, I'm going to hold off just a little while longer. In short, a stack is simply an ordered place to stash things for the short term, and that will have to do until we cover the concept in depth in the next section.
Each of the three segments is declared using the SEGMENT directive, which is a command that tells NASM that a segment begins here. The SEGMENT directive must be followed by the segment's name. You can name the segments whatever you like, but custom suggests that when you have only three segments, they be called stack, data, and code. Why obscure the meaning of what you're writing?
The segment containing the stack has some special considerations attached to it, especially regarding the linking of several files together into one executable program. One of these considerations is that the stack segment have the type "stack" attached to it. This tells the linker (as I explain later) that this particular segment is special: it's a stack segment and not just a data segment. Hence the line:
SEGMENT stack stack
Nobody's stuttering here. The SEGMENT directive is creating a segment named "stack" that is of the type "stack." The first identifier is the name; the second is the type. You could change the name of the segment to MyStack or GreasedPig if you like, but it's important to let the type of the stack segment be precisely stack. More on this after we explain something else.
Don't ASSUME…
If you remember, in the real mode flat model, the operating system sets all four segment registers to the same value (one that it selects) when the program is loaded into memory and run. In the real mode segmented model, the different segments are indeed different and distinct regions of memory and are not all the same place. When the program begins running, DOS doesn't point DS or ES at any segment you defined; they point at the Program Segment Prefix instead. Your program must set up the segment registers it relies on by itself. (DOS does, of course, set CS to your code segment, and IP to your starting point, before giving control to your program.)
This is what the first part of EATSEG.ASM does: It takes the addresses represented by the segment names for the data and stack segments and loads them into DS and SS, the segment registers governing those segments:
mov ax,data ; Move segment address of data segment into AX
mov ds,ax ; Copy address from AX into DS
mov ax,stack ; Move segment address of stack segment into AX
mov ss,ax ; Copy address from AX into SS
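What lands in DS and SS is a 16-bit segment value, and it helps to recall how the CPU uses it: in real mode, it shifts the segment value left four bits (multiplying it by 16) and adds a 16-bit offset. That is why any one segment spans at most 64K, and why a program that loads different values into its segment registers, as EATSEG does, can reach anywhere in the megabyte. The Python sketch below models only that arithmetic; the function name physical_address is made up for illustration, and the wraparound at 1MB reflects the original 8086's 20-bit address bus.

```python
# Sketch: real mode segment:offset -> 20-bit physical address.
# Models the 8086's 20-bit address bus, so results wrap at 1 MB.

ONE_MEGABYTE = 1 << 20

def physical_address(segment: int, offset: int) -> int:
    """Return the physical address the CPU generates for segment:offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are both 16-bit values")
    return ((segment << 4) + offset) % ONE_MEGABYTE

if __name__ == "__main__":
    # With one segment value, offsets 0000h-FFFFh cover a single 64K window...
    print(hex(physical_address(0x1000, 0x0000)))  # 0x10000
    print(hex(physical_address(0x1000, 0xFFFF)))  # 0x1ffff
    # ...but a different segment value moves that window elsewhere in the megabyte.
    print(hex(physical_address(0x9000, 0xFFFF)))  # 0x9ffff
    # Different segment:offset pairs can name the same byte.
    print(physical_address(0x1234, 0x0005) == physical_address(0x1200, 0x0345))  # True
```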
Keep in mind that you can't load a segment register with immediate data. MOV will copy a value into a segment register from a general-purpose register or from memory, but not from a constant built into the instruction itself, and a segment name like data is exactly such a constant. This is why the segment addresses have to pass through AX to get into DS and SS. (Because we're not using ES to govern a segment defined at assembly time right there in our program, we don't need to load ES with anything right off the bat.)
This is a good place to point out a crucial difference between NASM (the assembler that we're using in this book) and Microsoft's extremely popular MASM, which is probably the most-used assembler in history: MASM attempts to associate segment names with segment registers. NASM does not.
With one small exception done as a courtesy to the linker, NASM does not know which segment is the code segment, nor which segment is the data segment, nor which segment is the stack segment. You define a segment by name:
SEGMENT data ; Segment containing initialized data
The name "data" tells you that this is the data segment, but the assembler doesn't look for the string "data" and note somewhere that the segment named data is the data segment. This is why you could change the preceding line to this:
SEGMENT GreasedPig ; Segment containing initialized data
Nothing would change. GreasedPig is an odd name for a segment, but a completely legal one.
In MASM, Microsoft defines the ASSUME directive, which associates segment names with segment registers. This allows MASM to generate segment prefixes automatically when it creates the opcodes called out by a particular mnemonic in your source code. This is a tricky and subtle business, so to make it clearer, imagine a memory variable defined in a segment that is addressed via ES:
SEGMENT JunkSegment
JunkChunk DW 0FFA7H
At the beginning of the program, you have to make sure ES is loaded with the segment address of JunkSegment:
MOV AX, JunkSegment ; Load segment address of JunkSegment into ES via AX
MOV ES, AX
Ordinarily, using NASM, you have to specify when a piece of memory data is located relative to the ES register, because the default is DS:
MOV AX,[ES:JunkChunk] ; Move word variable JunkChunk from JunkSegment (ES) into AX
That's the NASM way. Using Microsoft's MASM, you can associate a segment name with ES using the ASSUME directive:
ASSUME ES:JunkSegment
Having associated ES and JunkSegment this way, you could now write the MOV instruction without explicitly including the ES: segment prefix:
MOV AX,[JunkChunk] ; Move word variable JunkChunk from JunkSegment (ES) into AX
Thanks to ASSUME, MASM knows that the variable JunkChunk is located in the segment addressed by ES, so it inserts the ES: prefix behind the scenes as it generates the opcode for this mnemonic. Many of us (NASM's authors included) don't think this is a particularly good idea. It makes the source code less specific and hence less readable: a person not familiar with the program might assume (heh-heh) that JunkChunk is in the data segment associated with DS, because there's no ES: prefix and DS is the default for memory variable references like that.
So, NASM has nothing like ASSUME. When you move away from the default addressing of memory variables relative to DS, you must include the segment register prefix inside the square brackets of all memory variable references!
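Under the hood, an ES: override is a one-byte prefix placed in front of the instruction's opcode; without it, a plain [variable] reference goes through DS. The Python sketch below models both halves for one specific form, MOV AX,[variable], which x86 encodes as opcode A1h followed by a 16-bit offset. The prefix bytes (26h for ES, 2Eh for CS, 36h for SS, 3Eh for DS) are the standard 8086 values; the function names, and the register values in the usage note, are hypothetical.

```python
# Sketch: what a segment override prefix does to MOV AX,[offset].
# Models only the MOV AX,moffs16 form (opcode A1h) in 16-bit code.
from typing import Dict, Optional

# Standard 8086 segment override prefix bytes.
OVERRIDE_PREFIX = {"ES": 0x26, "CS": 0x2E, "SS": 0x36, "DS": 0x3E}

def encode_mov_ax_mem(offset: int, override: Optional[str] = None) -> bytes:
    """Machine code for MOV AX,[offset] or, with an override, MOV AX,[seg:offset]."""
    code = bytearray()
    if override is not None:
        code.append(OVERRIDE_PREFIX[override])  # e.g. 26h for [ES:...]
    code.append(0xA1)                           # MOV AX, moffs16
    code += offset.to_bytes(2, "little")        # offset, low byte first
    return bytes(code)

def physical_source(registers: Dict[str, int], offset: int,
                    override: Optional[str] = None) -> int:
    """Physical address MOV AX,[...] reads; DS is used when there's no override."""
    segment = registers[override if override is not None else "DS"]
    return ((segment << 4) + offset) & 0xFFFFF  # 20-bit real mode address
```

With hypothetical values DS = 2000h, ES = 3000h, and JunkChunk at offset 0, encode_mov_ax_mem(0, "ES") yields the bytes 26 A1 00 00 and physical_source reads from 30000h; drop the override and the same offset is read from 20000h instead. That extra prefix byte is what MASM's ASSUME emits on your behalf, and what NASM makes you request explicitly with ES: in the source.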
Naming the Stack Segment
The exception I noted earlier is that NASM allows you to say which segment is the stack segment:
SEGMENT MyStack stack
Here, MyStack is the name of the segment (which can be any legal identifier) and stack is the type. This is not for NASM's benefit; it will not take any action of its own based on knowing that the segment named MyStack is in fact the stack segment. But some linkers need to know that there is a stack segment defined in the program. Stack segments are special as segments go, at least in part because (kind of like Tigger) there can be only one, but there must be one! Some linkers check to see whether there is a segment in a program designated as the stack segment, and to keep such linkers quiet NASM allows you to give the stack type to a segment defined with SEGMENT. This is a good idea, and I recommend that you do it.
Choosing a Starting Point
There are no jumps, loops, or subroutines in EATSEG.ASM. If you've a smattering of assembly language smarts, you may wonder whether the ..start: label at the beginning of the code segment is unnecessary except for readability purposes. After all, ..start is not referenced anywhere within the program.
On the other hand, code execution has to begin somewhere, and you need to tell the assembler (and especially the linker) where code execution must begin. This is the purpose of the ..start: label.
The issue is this: DOS needs to know at what address to begin execution when it loads and runs the program. (DOS sets code segment register CS when it loads your program into memory prior to executing it.) You might think DOS could assume that execution would begin at the start of the code segment, but there may be more than one code segment, and under most circumstances the programmer does not specify the order of multiple code segments within a single program. (The linker has the power to rearrange multiple code segments for reasons that I can't explain in this book.) Better to have no doubt about it, and for that reason you the programmer should pick a starting point and tell the assembler what it is.
You may notice that leaving out ..start: won't keep NASM from assembling a program. The linker will complain about the lack of a starting point, but it will default to starting execution at the beginning of the code segment, which in our case is the only code segment, so there's no ambiguity there. Nonetheless, it's bad practice to leave out the starting point label.
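Where does the starting point end up? The linker writes it into the .EXE file's header, and DOS reads it from there when it loads the program. The Python sketch below pulls the recorded start address (CS:IP) and stack (SS:SP) out of a DOS "MZ" header using the standard field offsets: SS at 0Eh, SP at 10h, IP at 14h, and CS at 16h. The CS and SS values stored there are relative to the program's load segment; DOS adds the actual load segment when it runs the program. Both function names are made up for illustration, and the sketch checks nothing beyond the "MZ" signature and header length.

```python
# Sketch: read the start address and stack fields a linker records in an
# MZ (.EXE) header. CS and SS are relative to the program's load segment;
# DOS adds the real load segment when it runs the program.
import struct

def exe_start_info(header: bytes) -> dict:
    """Return initial CS:IP and SS:SP from the first 1Ch bytes of an .EXE file."""
    if len(header) < 0x1C or header[:2] != b"MZ":
        raise ValueError("not a DOS MZ executable header")
    # Little-endian words at 0Eh..17h: SS, SP, checksum, IP, CS.
    ss, sp, _checksum, ip, cs = struct.unpack_from("<5H", header, 0x0E)
    return {"cs": cs, "ip": ip, "ss": ss, "sp": sp}

def print_start_info(path: str = "EATSEG.EXE") -> None:
    """Show where the header says execution begins and where the stack is."""
    with open(path, "rb") as exe:
        info = exe_start_info(exe.read(0x1C))
    print(f"entry CS:IP = {info['cs']:04X}:{info['ip']:04X} (CS relative to load segment)")
    print(f"stack SS:SP = {info['ss']:04X}:{info['sp']:04X} (SS relative to load segment)")
```

Calling print_start_info() in the directory holding EATSEG.EXE shows the starting point the linker chose. The exact values depend on how the linker laid out the segments, so none are predicted here.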
Assembling and Linking EATSEG.ASM
Although NASM can generate a .COM file (for a real mode flat model program) directly, it can't generate a .EXE file for a real mode segmented model program in the same way. Once you move away from the single segment of real mode flat model, NASM needs the help of a linker to generate the final .EXE file.
I've obtained permission to distribute an excellent free linker with this book's CD-ROM. The linker is ALINK, written by Anthony Williams. It's on the CD-ROM, and if you copied the executable file ALINK.EXE to your hard drive along with everything else, you can invoke it simply by naming it.
NASM-IDE was intended for writing programs in real mode flat model, so it relies exclusively on NASM and does not have any machinery for invoking a linker. That means that NASM-IDE won't be able to do the assemble and link tasks for us. It's time to face the fiendish command line.
If you're working from DOS, you can simply assemble and link from the DOS command line. If you're working with NASM-IDE in a DOS box under Windows, it's probably easier to "shell out" to the DOS command line from inside NASM-IDE. This is done by selecting the menu item File|DOS Shell. You will see NASM-IDE vanish and be replaced by a blank screen with the DOS prompt. When you're done with the DOS shell, type EXIT followed by Enter to return to NASM-IDE.
Assembling EATSEG.ASM is done with the following command line:
C:\>NASM16 EATSEG.ASM -f obj -o EATSEG.OBJ
This command line will assemble EATSEG.ASM to the file EATSEG.OBJ, in the standard .OBJ linkable file format. Linking is even easier:
C:\>ALINK EATSEG.OBJ
Here, ALINK will convert EATSEG.OBJ into EATSEG.EXE. I explain more about linkers and what they do in the next chapter. In this case, ALINK is acting more as a file format converter than anything else, since there's only one file to be linked. Later on, we'll see how ALINK can connect multiple .OBJ files into a single executable file.
After ALINK runs, you'll have the file EATSEG.EXE on your hard disk. That's the file that you can name at the DOS command line to run EATSEG.