.ce 2
THE IBM PC SIMULATOR
Andrew S. Tanenbaum
.sp 2
.NH 1
INTRODUCTION
.PP
This program is a simulator (i.e., an interpreter) for the IBM PC.
It has been tested on a VAX\(en11/750 running 4.1BSD, but since it is
entirely in C, it should run on almost any 32\(enbit machine with a C
compiler and a reasonable UNIX system.
The simulator reads in one or more object files in PC\(enIX format
(which is also MINIX format, and is described below), puts them in
memory, and then begins executing them.
The object files may occupy any part of memory from 0 to 640K.
.PP
The execution occurs one instruction at a time.
The simulator fetches the first instruction, carries it out, sets the
condition codes, and then repeats the whole process with succeeding
instructions.
All of the 8088 instruction set is simulated except for the decimal
instructions, because only a COBOL programmer would want those, and
pandering to COBOL programmers would be beneath my dignity.
Besides, the decimal instructions need the auxiliary carry bit, and
that is very expensive to maintain.
As you might expect of an instruction\(enby\(eninstruction interpreter,
it is not blindingly fast, but it is remarkably useful at finding
obscure bugs.
If you want to improve the performance, profile the thing to see where
the time goes, and rewrite the key parts in assembly code.
The main decode loop is an obvious candidate, as is the routine that
handles the condition codes.
.PP
The simulator also simulates some of the IBM PC's I/O devices,
including the floppy disk, clock, and display.
The timing of these devices is also simulated reasonably well, e.g.,
when you issue a seek, the interrupt comes after an interval that is
roughly what the real disk does.
The simulator keeps track of time in instructions, rather than in
microseconds, and for simplicity it assumes that all instructions take
5 microseconds.
On average, this is not too far wrong.
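.PP
The timing rule above reduces to simple arithmetic.
The following C fragment is only a sketch, not code taken from the
simulator: the names USEC_PER_INSTRUCTION, instructions_to_usec and
instructions_per_tick are hypothetical, and the only fact it relies on
is the flat cost of 5 microseconds per instruction stated above.
.nf
```c
/* Assumption from the text: every instruction costs 5 microseconds. */
#define USEC_PER_INSTRUCTION 5UL

/* Hypothetical helper: simulated microseconds after n instructions. */
unsigned long instructions_to_usec(unsigned long n)
{
    return n * USEC_PER_INSTRUCTION;
}

/* Hypothetical helper: instructions between two interrupts of a
 * device that fires hz times per second. hz must be nonzero. */
unsigned long instructions_per_tick(unsigned long hz)
{
    return (1000000UL / hz) / USEC_PER_INSTRUCTION;
}
```
.fi
Under these assumptions, 200000 instructions correspond to one
simulated second, and a 60 Hz clock would interrupt roughly every 3333
instructions.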
.PP
One property of the simulator that makes it very valuable is that runs
are 100% reproducible.
If you have a weird timing bug (race condition), and you run the sick
binary on the simulator, the bug may vanish.
However, if it does appear, you can run it a thousand times and the bug
will appear in the same place on every run.
In this way, you can make simulation runs with various printing options
on and off.
In the most extreme case, you can make the simulation run and simply
print out the address, opcode, and registers of every instruction
executed.
Of course, you will need lots of disk space, but many other printing
and debugging options are also available.
.NH 1
USING THE SIMULATOR
.PP
To compile the simulator, just type 'make'.
The makefile will put the finished simulator on a file called '88'.
.PP
To use the simulator, you need the executable program (which may
consist of multiple files loaded into simulated core one after another
before the run), and the simulated floppy disks.
Normally, the floppy disk image of the operating system, as produced by
build, is put on a file called 'image'.
The root file system is expected on a file called 'rootfs'.
This can be changed by altering the initialization of the variable
\&'root_name' in the file aux.c.
The mountable floppy disk file systems are expected on the files
\&'disk.0' and 'disk.1'.
.PP
When the simulator starts, it acts like 'rootfs' is in drive 0.
The simulator knows that the root file system diskette is read in from
low block numbers to high block numbers.
As soon as it sees a low block number again, it automatically switches
file descriptors so that subsequent reads and writes on drive 0 use
\&'disk.0' and subsequent reads and writes on drive 1 use 'disk.1'.
For many simulation situations, the whole test can be done by making an
appropriate /etc/rc file on the root file system.
Do not use getlf in /etc/rc, since terminal input is not simulated.
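.PP
The switch from 'rootfs' to 'disk.0' described above might be detected
along the following lines.
This is a guess at the logic, not the simulator's actual code: the
type drive0_state, the enum values and the function drive0_source are
hypothetical, and "a low block number again" is interpreted here as a
block number smaller than the highest one seen so far.
.nf
```c
/* Hypothetical labels for the file that backs drive 0. */
enum drive0_file { SRC_ROOTFS = 0, SRC_DISK0 = 1 };

/* Hypothetical state kept for drive 0. */
struct drive0_state {
    int  loading_root;  /* 1 while the root file system is being read */
    long last_block;    /* highest block seen so far, -1 initially */
};

/* Decide which file an access to `block` on drive 0 should use. */
enum drive0_file drive0_source(struct drive0_state *s, long block)
{
    if (s->loading_root) {
        if (block < s->last_block)
            s->loading_root = 0;   /* went backwards: root load is over */
        else
            s->last_block = block;
    }
    return s->loading_root ? SRC_ROOTFS : SRC_DISK0;
}
```
.fi
Starting from the state {1, \(en1}, blocks 1, 2, 3 would all be served
from 'rootfs'; a following access to block 1 would flip the state, and
every later access would go to 'disk.0'.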
.PP
In summary, you normally should set up 4 files before starting a
simulation run:
.sp
.nf
1. image \(en the bootable floppy disk image as produced by build.
2. rootfs \(en the root file system, as produced by the MINIX mkfs.
3. disk.0 \(en the /usr diskette, as produced by the MINIX mkfs.
4. disk.1 \(en the /user diskette, as produced by the MINIX mkfs.
.fi
.NH 1
SIMULATOR FLAGS AND OPTIONS
.PP
The next step in running the simulator is to make up a shell script
that calls it.
Of course, it can be called directly from the terminal, but usually
there are a number of parameters to control debugging options, so a
shell script is more convenient.
The parameters that are allowed are described below.
The parameters indicated by xxxx and yyyy are hexadecimal numbers using
the characters [0\(en9a\(enf] or the characters [0\(en9A\(enF].
Leading zeros are permitted.
The parameters indicated by dddd are decimal integers.
The parameters indicated by p and q are single decimal digits.
The numerical values are set tight against the flag, as in \(ent0A94
rather than as \(ent 0A94.
The file names used in the \(enf flag are separated from the flag by a
space.
.nf
.ta 1i
\(enaxxxx\(enyyyy	only dump when xxxx <= pc <= yyyy
\(enbxxxx\(enyyyy	breakpoint trap at xxxx. Skip first yyyy executions (optional)
\(encxxxx	count number of times instruction at xxxx is executed
\(endpq	dumping enabled. p tells when to dump, q tells what:
	p: 1=dump CALL, RET, traps, and IRET only
	p: 2=dump CALL, RET, traps, IRET and jumps
	p: 3=dump every instruction
	q: 1=print pc only
	q: 2=print pc, instruction and traced word, if any
	q: 3=print pc, instruction, ax, bx, cx, dx
	q: 4=same as 3 + si, di, bp, sp and condition codes
	q: 5=same as 3 + top 4 words of the stack
\(enexxxx\(enyyyy	stop execution when the contents of xxxx equal yyyy
	Multiple \(ene flags are allowed. The tests are or'ed.
\(enfxxxx A	input file A begins at CLICK (16\(enbyte unit) xxxx
\(enhdddd	make a histogram of program counter every dddd instructions
\(eni	count instructions executed
\(enkxxxx	set all 4 segment registers to xxxx initially
\(enmdddd	print the value of the pc every dddd (decimal) instructions
\(enndddd	start dumping after dddd (decimal) instructions.
\(enpxxxx\(enyyyy	program text located between xxxx and yyyy; trap if pc ever
	gets outside the allowed range. Multiple \(enp flags are ok.
\(enrxxxx	run program starting at xxxx
\(ensxxxx\(enyyyy	check to see if stack in the range xxxx to yyyy; if sp ever
	goes outside range, trap. Multiple \(ens flags are allowed.
\(entxxxx	print word at xxxx in dumps
\(enuxxxx\(enyyyy	stop execution when the contents of xxxx are unequal to yyyy
	Multiple \(enu flags are allowed. The tests are or'ed.
\(enwdddd	have the clock interrupt occur at dddd Hz (default 60).
\(enyxxxx	print a message every time the instruction at xxxx is executed
\(enzdddd	stop execution after dddd (decimal) instructions
.fi
.sp
Here are some examples:
.sp
Example 1: 88 \(enk0060 \(enr0600 \(enf0060 image
.sp
All four segment registers (CS, DS, SS and ES) are initially set to
0x60 clicks (corresponding to machine address 0x600 or 1536).
Execution begins at address 0x0600 (1536).
The image file is loaded at click 0x60, again address 1536.
In short, a single file, called image, is loaded at 1536, the segment
registers are set to correspond to 1536, and execution begins at 1536.
This is the standard command to run a MINIX binary with no debugging or
tracing.
The addition of any flags, even just the \(eni flag, slows down
simulation substantially, because when any debugging or monitoring
flags are present a procedure is called to check each flag to see if it
is on or off.
If no flags are present, the procedure is not called.
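.PP
The click arithmetic in Example 1 can be checked with a few lines of C.
This is an illustrative sketch only; the function names
click_to_address and physical_address are hypothetical and do not come
from the simulator source.
It relies on two facts: a click is a 16\(enbyte unit, and the 8088
forms a 20\(enbit physical address from a segment register shifted left
by 4 bits plus a 16\(enbit offset.
.nf
```c
/* A click is 16 bytes, so click n starts at byte address n * 16. */
unsigned long click_to_address(unsigned int click)
{
    return (unsigned long)click << 4;
}

/* 8088 physical address: segment * 16 + offset, truncated to 20 bits. */
unsigned long physical_address(unsigned int segment, unsigned int offset)
{
    return (((unsigned long)segment << 4) + (offset & 0xFFFFu)) & 0xFFFFFUL;
}
```
.fi
With these definitions, click 0x60 maps to address 0x600 (1536), and a
code segment of 0x60 with an offset of 0 also yields 0x600, which is
consistent with the three flags of Example 1 all referring to the same
place in memory.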
.sp
Example 2: 88 \(enk0060 \(enr0600 \(enf0060 image \(eni \(enm10000 \(enn800000 \(end34 \(enz810000
.sp
The simulator is run as in the previous example, but now it also counts
the number of instructions executed and prints the total at the end of
the run.
It also prints the program counter every 10000 instructions.
After 800000 instructions, it begins dumping.
After each instruction, the program counter, mnemonic, first 6 bytes of
the instruction, registers, and condition codes are printed on a single
line.
The mnemonic 'mixd' means that it is not possible to determine the
instruction from the first byte of the opcode, and it was decided that
a detailed analysis of the instruction to figure out what it did was
not worth the additional cost in performance.
Simulation stops after 810000 instructions have been executed.
At this point a message is printed, and the registers and top part of
the stack are printed.
A core dump is created on the file 'core.88'.
This core dump can be read using the program 'r', described below.
.sp
Example 3: 88 \(enk0060 \(enr0600 \(enf0060 image \(eni \(enb2800 \(ene4802\(en0000 \(ene7040\(en5000
.sp
Run the simulator until one of three events occurs:
.nf
1. The program counter gets to 0x2800.
2. The 16\(enbit word at address 0x4802 goes to 0.
3. The 16\(enbit word at address 0x7040 goes to 0x5000.
.fi
The first event to occur stops the simulation with a register print
out, stack print out, and core file.
A message is also printed telling which event was detected.
.sp
Example 4: 88 \(enk0060 \(enr0600 \(enf0060 image \(eni \(enu4040\(en0000 \(eny3050 \(eny5900 \(enz2000000
.sp
Run the simulator until the 16\(enbit word at 0x4040 becomes nonzero,
but in any event not more than 2000000 instructions.
Every time the program counter takes on either of the values 0x3050 or
0x5900, print a message telling how many instructions have been
executed so far.
The \(eny flag does not stop the simulation, as the \(enb flag does.
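.PP
The \(ene and \(enu tests used in Examples 3 and 4 could be parsed and
evaluated roughly as follows.
This is a sketch under stated assumptions, not the simulator's own
code: struct watch, parse_watch and should_stop are hypothetical names,
memory is modelled as a flat byte array indexed directly by the address
given in the flag, and 16\(enbit words are read low\(enorder byte first
as on the 8088.
.nf
```c
#include <stdio.h>

/* Hypothetical record for one -e or -u flag. */
struct watch {
    unsigned addr;          /* address of the 16-bit word to watch */
    unsigned value;         /* value to compare against */
    int      stop_if_equal; /* 1 for -e, 0 for -u */
};

/* Parse "-eXXXX-YYYY" or "-uXXXX-YYYY" (hex, either case, leading
 * zeros allowed). Returns 1 on success, 0 otherwise. */
int parse_watch(const char *arg, struct watch *w)
{
    char kind;

    if (sscanf(arg, "-%c%x-%x", &kind, &w->addr, &w->value) != 3)
        return 0;
    if (kind != 'e' && kind != 'u')
        return 0;
    w->stop_if_equal = (kind == 'e');
    return 1;
}

/* The tests are or'ed: report 1 as soon as any one of them fires. */
int should_stop(const struct watch *w, int n, const unsigned char *mem)
{
    int i;

    for (i = 0; i < n; i++) {
        unsigned word = mem[w[i].addr] |
                        ((unsigned)mem[w[i].addr + 1] << 8);
        if ((word == w[i].value) == w[i].stop_if_equal)
            return 1;
    }
    return 0;
}
```
.fi
A simulator built this way would call should_stop once per instruction,
which also illustrates why the presence of any monitoring flag slows
the run down.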
.sp
Example 5: 88 \(enk0060 \(enr0600 \(enf0060 image \(eni \(ent7080 \(enp0600\(en2000 \(enm1000
.sp
Run the simulator as long as the program counter stays in the range
0x0600 to 0x2000.
Every 1000 instructions print the opcode and the contents of the word
at address 0x7080.
Note that the clock interrupt is fully simulated, so that unless the
vector is properly initialized, at the first clock interrupt the
program counter will be set to 0, and the simulator will stop.
.PP
In addition to the above debugging aids, the simulator also checks for
a few things that it considers unreasonable, even though they are
legal.
For example, if the CS register is ever set to 0, simulation stops with
an error message.
This is nearly always a bug, typically a wild interrupt.
Another example is the stack pointer.
The 8088 does not mind if it is odd, but the simulator does.
Again, this is nearly always a bug, and should be caught as early as
possible.
The code can easily be modified to remove these checks if you
(foolishly) want to disable them.
.PP
This directory contains a file called 'r.c' which is used to read the
core dumps produced by the simulator.
You can start 'r' by just typing its name.
It has no arguments, but it reads lines from the terminal.
Each line should either contain a hex number, in which case a few words
around the requested address are printed, or be empty, in which case
the next 8 words are printed.
.NH 1
OBJECT FILE FORMAT
.PP
An executable file consists of three parts: a header, the program text
and the initialized data.
The uninitialized data (the so\(encalled bss segment) is not present in
the executable file.
.PP
Two memory models are supported by the operating system.
The small model has up to 64K memory total, for text, data, and stack.
The separate I and D space model has 64K for the text and an additional
64K for data plus stack.
The simulator can handle both models (as well as other models not
supported by MINIX).
.PP
There is no space between the header and text or between the text and
data, except that for a separate I and D program, the text size must be
a multiple of 16 bytes, the last 0 to 15 of which may be padding.
The normal header is 32 bytes and is the same as that of
\s-2PC-IX\s+2.
It consists of eight longs as follows:
.HS
.nf
0: 0x04100301L (small model), or 0x04200301L (separate I and D)
1: 0x00000020L (32\(enbyte header), or 0x00000030L (48\(enbyte header)
2: size of text segment in bytes
3: size of initialized data in bytes
4: size of bss in bytes
5: 0x00000000L
6: total memory allocated to program
7: 0x00000000L
.HS
.fi
An alternative 48\(enbyte header is also acceptable, and consists of
the standard header followed by 16 bytes that are ignored.
(In PC\(enIX these words are used for symbol table information.)
The longs are stored with the low\(enorder byte first, so the first
byte of the file is 0x01 and the next one is 0x03.
.NH 1
A DETAILED EXAMPLE
.PP
In this directory you will find files called image, rootfs, and disk.0.
These are examples that you can use by typing 'run' to start the
simulator.
The files log.ram and log.disk.0 are the output files produced by mkfs
when rootfs and disk.0 were made.
When you run the simulator with these files, MINIX is booted and reads
in the RAM disk.
Then it executes /etc/rc, as it always does.
The /etc/rc file is as follows:
.nf
/etc/mount /dev/fd0 /usr
/usr/bin/echo "This message is brought to you by /etc/rc"
ls -l /bin
ls -l /usr
.fi
After that is finished, the \(enz flag kills the simulation run.
You should get an output file that is identical to run.output.
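.PP
If you want to check an executable by hand before feeding it to the
simulator, the header described under OBJECT FILE FORMAT can be decoded
with a few lines of C.
The fragment below is a sketch, not part of the distribution: the names
get_long, struct exec_header, decode_header and the two MAGIC constants
are hypothetical, while the field order, the magic numbers and the
low\(enorder\(enbyte\(enfirst storage are taken from the header table.
.nf
```c
/* Magic numbers from the header table (first long of the file). */
#define MAGIC_SMALL  0x04100301UL   /* small model */
#define MAGIC_SEP_ID 0x04200301UL   /* separate I and D */

/* Hypothetical in-memory copy of the fields that carry information. */
struct exec_header {
    unsigned long magic, hdr_len, text, data, bss, total;
};

/* Assemble a long stored low-order byte first, whatever the host's
 * own byte order is. */
unsigned long get_long(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Decode the first 32 bytes of a file. Returns 1 if they look like a
 * valid header, 0 otherwise. */
int decode_header(const unsigned char buf[32], struct exec_header *h)
{
    h->magic   = get_long(buf);       /* long 0 */
    h->hdr_len = get_long(buf + 4);   /* long 1 */
    h->text    = get_long(buf + 8);   /* long 2 */
    h->data    = get_long(buf + 12);  /* long 3 */
    h->bss     = get_long(buf + 16);  /* long 4 */
    h->total   = get_long(buf + 24);  /* long 6 */

    if (h->magic != MAGIC_SMALL && h->magic != MAGIC_SEP_ID)
        return 0;
    if (h->hdr_len != 0x20UL && h->hdr_len != 0x30UL)
        return 0;
    /* Separate I and D: text size must be a multiple of 16 bytes. */
    if (h->magic == MAGIC_SEP_ID && h->text % 16 != 0)
        return 0;
    return 1;
}
```
.fi
Reading the bytes one at a time, rather than casting the buffer to a
long, keeps the sketch independent of the byte order and word size of
the machine on which it is compiled.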