This chapter attempts to cover some of the common issues encountered
when writing 16-bit code to run under DOS or
Windows. It covers how to link programs to
produce .EXE or .COM
files, how to write device drivers, and how
to interface assembly language code with 16-bit C compilers and with
Borland Pascal.
.EXE Files
Any large program written under DOS needs to be built as a .EXE
file: only .EXE
files have the necessary internal structure required to span more than one
64K segment. Windows programs, also, have to be built as .EXE
files, since Windows does not support the .COM
format.
In general, you generate .EXE files by using
the obj output format to produce one or more .OBJ
files, and then linking them together using
a linker. However, NASM also supports the direct generation of simple DOS
.EXE files using the bin
output format (by using data-definition pseudo-instructions to construct
the .EXE file header), and a macro package is
supplied to do this. Thanks to Yann Guidon for contributing the code for
this.
NASM may also support .EXE natively as another
output format in future releases.
obj Format To Generate .EXE Files
This section describes the usual method of generating .EXE
files by linking .OBJ
files together.
Most 16-bit programming language packages come with a suitable linker;
if you have none of these, there is a free linker called VAL, available in
archive format from
.
An LZH archiver can be found at
.
There is another `free' linker (though this one doesn't come with sources)
called FREELINK, available from
.
A third, , written by DJ Delorie, is
available at
.
A fourth linker, , written by Anthony A.J.
Williams, is available at
.
When linking several .OBJ files into a .EXE
file, you should ensure that exactly one of
them has a start point defined (using the ..start
special symbol defined by the obj format: see
section 7.4.6). If no module
defines a start point, the linker will not know what value to give the
entry-point field in the output file header; if more than one defines a
start point, the linker will not know which value to use.
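To illustrate the rule, a second module linked into the same program simply leaves ..start out. The following is only a sketch: the file name util.asm and the routine name print_string are invented for this example. It assumes both modules use a segment called code with the obj format's default attributes, so that the linker combines them and a near call from the main module reaches the routine.

```nasm
; util.asm - hypothetical helper module.
; It deliberately does NOT define ..start: only the main module does.

segment code                    ; same segment name as the main module

global print_string             ; export the routine to other modules

; print_string: print the '$'-terminated string at DS:DX
; using DOS function 9, then return to the (near) caller.
print_string:
        mov     ah,9
        int     0x21
        retn
```

The main module would declare the routine with extern print_string and reach it with an ordinary call print_string. Both files are then assembled with the obj format and passed to the linker together.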
An example of a NASM source file which can be assembled to a .OBJ
file and linked on its own to a .EXE file
is given here. It demonstrates the basic
principles of defining a stack, initialising the segment registers, and
declaring a start point. This file is also provided in the
subdirectory of the NASM archives, under the
name .
segment code
..start:
mov ax,data
mov ds,ax
mov ax,stack
mov ss,ax
mov sp,stacktop
This initial piece of code sets up DS to point
to the data segment, and initializes SS and SP
to point to the top of the provided stack.
Notice that interrupts are implicitly disabled for one instruction after a
move into SS, precisely for this situation, so
that there's no chance of an interrupt occurring between the loads of
SS and SP and not
having a stack to execute on.
Note also that the special symbol ..start is
defined at the beginning of this code, which means that this will be the entry
point into the resulting executable file.
mov dx,hello
mov ah,9
int 0x21
The above is the main program: load DS:DX with
a pointer to the greeting message (hello is
implicitly relative to the segment data, which
was loaded into DS in the setup code, so the full
pointer is valid), and call the DOS print-string function.
mov ax,0x4c00
int 0x21
This terminates the program using another DOS system call.
segment data
hello: db 'hello, world', 13, 10, '$'
The data segment contains the string we want to display.
segment stack stack
resb 64
stacktop:
The above code declares a stack segment containing 64 bytes of
uninitialized stack space, and points stacktop at
the top of it. The directive segment stack stack
defines a segment called stack, and also
of type STACK. The latter is not
necessary to the correct running of the program, but linkers are likely to
issue warnings or errors if your program has no segment of type
STACK.
The above file, when assembled into a .OBJ
file, will link on its own to a valid .EXE file,
which when run will print `hello, world' and then exit.
bin Format To Generate .EXE Files
The .EXE file format is simple enough that
it's possible to build a .EXE file by writing a
pure-binary program and sticking a 32-byte header on the front. This header
is simple enough that it can be generated by
NASM itself using data-definition pseudo-instructions,
so that you can use the bin output
format to directly generate .EXE files.
Included in the NASM archives, in the
subdirectory, is a file of macros. It
defines three macros: ,
and .
To produce a .EXE file using this method, you
should start by using %include to load the
macro package into your source file.
You should then issue the header-generating macro call
(which takes no arguments) to generate the file header data. Then write
code as normal for the bin format - you can use
all three standard sections .text,
.data and .bss. At the
end of the file you should call the closing macro
(again, no arguments), which defines some symbols to mark section sizes,
and these symbols are referred to in the header code generated by
the first macro call.
In this model, the code you end up writing starts at
100h, just like a .COM
file - in fact, if you strip off the 32-byte header from the resulting
.EXE file, you will have a valid .COM
program. All the segment bases are the same,
so you are limited to a 64K program, again just like a .COM
file. Note that an ORG
directive is issued by the
header-generating macro, so you should not explicitly
issue one of your own.
You can't directly refer to your segment base value, unfortunately,
since this would require a relocation in the header, and things would get a
lot more complicated. So you should get your segment base by copying it out
of CS instead.
On entry to your .EXE file, SS:SP
are already set up to point to the top of a
2Kb stack. You can adjust the default stack size of 2Kb by calling the
macro. For example, to change the stack
size of your program to 64 bytes, you would call
.
A sample program which generates a .EXE file
in this way is given in the subdirectory of
the NASM archive, as .
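The steps described in this section can be sketched as follows. This is an illustration, not the distributed sample. It assumes the macro file is named exebin.mac and that its macros are called EXE_begin, EXE_stack and EXE_end, as in the NASM distribution, and that the file can be found on the include path.

```nasm
%include "exebin.mac"           ; assumed file name of the macro package

        EXE_begin               ; assumed macro name: emits the 32-byte header
        EXE_stack 64            ; assumed macro name: 64-byte stack, not 2Kb

section .text                   ; select .text explicitly before writing code

        mov     ax,cs           ; segment base cannot be named directly,
        mov     ds,ax           ; so copy it out of CS

        mov     dx,hello        ; DS:DX -> '$'-terminated string
        mov     ah,9            ; DOS print-string function
        int     0x21

        mov     ax,0x4c00       ; DOS terminate, exit code 0
        int     0x21

section .data
hello:  db      'hello, world', 13, 10, '$'

        EXE_end                 ; assumed macro name: defines the size symbols
```

Assembled with the bin output format and an explicit output name ending in .exe, this should give a directly runnable file. No ORG directive appears in the source because the header macro issues its own.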
.COM Files
While large DOS programs must be written as .EXE
files, small ones are often better written
as .COM files. .COM
files are pure binary, and therefore most easily produced using the
bin output format.
bin Format To Generate .COM Files
.COM files expect to be loaded at offset
100h into their segment (though the segment may
change). Execution then begins at 100h, i.e.
right at the start of the program. So to write a .COM
program, you would create a source file
looking like
org 100h
section .text
start:
; put your code here
section .data
; put data items here
section .bss
; put uninitialized data here
The bin format puts the .text
section first in the file, so you can
declare data or BSS items before beginning to write code if you want to and
the code will still end up at the front of the file where it belongs.
The BSS (uninitialized data) section does not take up space in the
.COM file itself: instead, addresses of BSS items
are resolved to point at space beyond the end of the file, on the grounds
that this will be free memory when the program is run. Therefore you should
not rely on your BSS being initialized to all zeros when you run.
To assemble the above program, you should use a command line like
nasm myprog.asm -fbin -o myprog.com
The bin format would produce a file called
myprog if no explicit output file name were
specified, so you have to override it and give the desired file name.
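Putting the pieces of this section together, a complete .COM program might look like the sketch below. The labels msg, buffer and BUFSIZE are invented for this example. It clears its BSS buffer explicitly, since the loader does not zero that memory, and it relies on all segment registers being equal on entry to a .COM program.

```nasm
        org     100h            ; .COM programs are loaded at offset 100h

BUFSIZE equ     64              ; hypothetical buffer size for this example

section .text
start:
        ; BSS is not guaranteed to be zero, so clear the buffer ourselves.
        mov     di,buffer       ; ES:DI -> buffer (ES = DS = CS on entry)
        mov     cx,BUFSIZE
        xor     al,al
        cld
        rep     stosb

        mov     dx,msg          ; DS:DX -> '$'-terminated string
        mov     ah,9            ; DOS print-string function
        int     21h

        mov     ax,4C00h        ; DOS terminate, exit code 0
        int     21h

section .data
msg:    db      'hello, world', 13, 10, '$'

section .bss
buffer: resb    BUFSIZE         ; occupies no space in the .COM file
```

It can be assembled with the same command line as above.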
obj Format To Generate .COM Files
If you are writing a .COM program as more than
one module, you may wish to assemble several .OBJ
files and link them together into a .COM program.
You can do this, provided you have a linker capable of outputting
.COM files directly (TLINK does this), or
alternatively a converter program such as
to transform the .EXE file output from the linker
into a .COM file.
If you do this, you need to take care of several things:
The code segment should start with a line like RESB 100h. This is to ensure
that the code begins at offset 100h relative to
the beginning of the code segment, so that the linker or converter program
does not have to adjust address references within the file when generating
the .COM file. Other assemblers use an
ORG directive for this purpose, but
ORG in NASM is a format-specific directive to the
bin output format, and does not mean the same
thing as it does in MASM-compatible assemblers.
When a .COM file is loaded, all the segment registers contain the same
value, so every symbol offset your code or data refers to must be relative
to the same segment base.
.SYS Files
MS-DOS device drivers - .SYS files - are pure
binary files, similar to .COM files, except that
they start at origin zero rather than 100h.
Therefore, if you are writing a device driver using the
bin format, you do not need the ORG
directive, since the default origin for
bin is zero. Similarly, if you are using
obj, you do not need the RESB 100h
at the start of your code segment.
.SYS files start with a header structure,
containing pointers to the various routines inside the driver which do the
work. This structure should be defined at the start of the code segment,
even though it is not actually code.
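As a rough illustration of such a header, the sketch below lays out a character-device driver for the bin format. None of it comes from this chapter: the field layout follows the commonly documented MS-DOS device driver header and request header, and the device name and all labels are invented. Treat it as a starting point to check against a proper reference, not as a complete driver.

```nasm
; No ORG directive: the default origin of zero is what a .SYS file needs.
section .text

header:
        dd      -1              ; link to next driver; filled in by DOS
        dw      0x8000          ; attribute word: bit 15 = character device
        dw      strategy_entry  ; offset of the strategy routine
        dw      interrupt_entry ; offset of the interrupt routine
        db      'NASMDEV '      ; hypothetical 8-byte name, space padded

reqhdr: dd      0               ; saved far pointer to the request header

; Strategy routine: DOS passes the request header in ES:BX; just save it.
strategy_entry:
        mov     [cs:reqhdr],bx
        mov     [cs:reqhdr+2],es
        retf

; Interrupt routine: handles only the INIT command (code 0) and reports
; "done" for everything, which a real driver would not do.
interrupt_entry:
        push    bx
        push    es
        les     bx,[cs:reqhdr]          ; ES:BX -> request header
        cmp     byte [es:bx+2],0        ; command code 0 = INIT?
        jne     .done
        mov     word [es:bx+14],resident_end ; break address, offset part
        mov     [es:bx+16],cs                ; break address, segment part
.done:
        mov     word [es:bx+3],0x0100   ; status word: done, no error
        pop     es
        pop     bx
        retf

resident_end:                   ; memory above this label is released
```

The header sits at offset zero ahead of any code, which is the arrangement the paragraph above describes.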
For more information on the format of .SYS
files, and the data which has to go in the header structure, a list of
books is given in the Frequently Asked Questions list for the newsgroup
.
This section covers the basics of writing assembly routines that call,
or are called from, C programs. To do this, you would typically write an
assembly module as a .OBJ file, and link it with
your C modules to produce a mixed-language program.
C compilers have the convention that the names of all global symbols
(functions or data) they define are formed by prefixing an underscore to
the name as it appears in the C program. So, for example, the function a C
programmer thinks of as printf appears to an
assembly language programmer as _printf. This
means that in your assembly programs, you can define symbols without a
leading underscore, and not have to worry about name clashes with C
symbols.
If you find the underscores inconvenient, you can define macros to
replace the GLOBAL and EXTERN
directives as follows:
%macro cglobal 1
  global _%1
  %define %1 _%1
%endmacro
%macro cextern 1
  extern _%1
  %define %1 _%1
%endmacro
(These forms of the macros only take one argument at a time; a
construct could solve this.)
If you then declare an external like this:
cextern printf
then the macro will expand it as
extern _printf
%define printf _printf
Thereafter, you can reference printf as if it
was a symbol, and the preprocessor will put the leading underscore on where
necessary.
The cglobal macro works similarly. You must
use cglobal before defining the symbol in
question, but you would have had to do that anyway if you used
GLOBAL.
Also see section 2.1.27.
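The one-argument limitation noted above can be lifted with a loop. The following sketch is one possible way of doing it, using NASM's %rep together with %0 (the argument count) and %rotate. The routine name myfunc is invented for the example, and the near return assumes a memory model with near functions.

```nasm
; Variadic versions of the macros: accept one or more names.
%macro cglobal 1-*
  %rep %0                       ; once per argument
        global  _%1
    %define %1 _%1
    %rotate 1                   ; next argument becomes %1
  %endrep
%endmacro

%macro cextern 1-*
  %rep %0
        extern  _%1
    %define %1 _%1
    %rotate 1
  %endrep
%endmacro

; Usage sketch:
        cextern printf, puts    ; declares _printf and _puts
        cglobal myfunc          ; must come before the definition below

myfunc:                         ; the preprocessor turns this into _myfunc
        retn
```

As with the one-argument forms, each name becomes an alias for its underscored version for the rest of the source file.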
NASM contains no mechanism to support the various C memory models directly; you have to keep track yourself of which one you are writing for. This means you have to keep track of the following things:
In memory models where functions are near, function pointers contain only
an offset (the CS register
never changes its value, and always gives the segment part of the full
function address), and functions are called using ordinary near
CALL instructions and return using
RETN (which, in NASM, is synonymous with
RET anyway). This means both that you should
write your own routines to return with RETN, and
that you should call external C routines with near
CALL instructions.
In memory models where functions are far, functions are called using
CALL FAR (or
CALL seg:offset) and return using
RETF. Again, you should therefore write your own
routines to return with RETF and use
CALL FAR to call external routines.
In memory models where data pointers are near, they contain only an offset
(the DS register doesn't change its value, and always
gives the segment part of the full data item address).
In memory models where data pointers are far, you should be careful not to
modify DS in your routines without restoring it
afterwards, but ES is free for you to use to
access the contents of 32-bit data pointers you are passed.