p-code machine
In computer programming, a p-code machine (portable code machine)[citation needed] is a virtual machine designed to execute p-code, the assembly language or machine code of a hypothetical central processing unit (CPU). The term is applied both generically to all such machines (such as the Java Virtual Machine and MATLAB precompiled code) and to specific implementations, the most famous being the p-Machine of the Pascal-P system, particularly in its UCSD Pascal incarnation.
Although the concept was first implemented circa 1966 (as O-code for BCPL), the term p-code first appeared in the early 1970s. Two early compilers generating p-code were the Pascal-P compiler in 1973, by Nori, Ammann, Jensen, Nägeli, and Jacobi,[1] and the Pascal-S compiler in 1975, by Niklaus Wirth.
Programs that have been translated to p-code are interpreted by a software program that emulates the behavior of the hypothetical CPU. If there is sufficient commercial interest, a hardware implementation of the CPU specification may be built (e.g., the Pascal MicroEngine or a version of the Java processor).
Benefits and weaknesses of implementing p-code
Why p-code?
Compared to direct translation into native machine code, a two-stage approach involving translation into p-code and execution by an interpreter or just-in-time compiler offers several advantages.
- Portability: It is much easier to write a small p-code interpreter for a new machine than it is to modify a compiler to generate native code for the same machine.
- Simple implementation: Generating machine code is one of the more complicated parts of writing a compiler. By comparison, generating p-code is much easier, which makes it useful for getting a compiler up and running quickly.
- Compact size: Because p-code targets an idealized virtual machine, a p-code program is often much smaller than the same program translated to machine code.
- Debugging: When p-code is interpreted, the interpreter can apply additional runtime checks that are difficult to implement with native code.
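The two-stage approach can be illustrated with a toy front end and back end. The following is a hypothetical sketch, not any historical implementation: the nested-tuple expression format is invented for the example, and the mnemonics are modeled on the UCSD instructions ldci and adi, with mpi (integer multiply) assumed by analogy. Only `interpret` would have to be rewritten to move such a system to a new host; `compile_expr` stays unchanged.

```python
from typing import Union

# Hypothetical input format: an int literal or a tuple (op, left, right),
# e.g. ("+", 2, ("*", 3, 4)) for 2 + 3 * 4.
Expr = Union[int, tuple]

# Source operator -> p-code style mnemonic (mpi is assumed by analogy with adi).
OPCODES = {"+": "adi", "*": "mpi"}


def compile_expr(expr: Expr) -> list[tuple]:
    """Stage 1: translate an expression tree into stack-machine p-code."""
    if isinstance(expr, int):
        return [("ldci", expr)]  # push an integer constant
    op, left, right = expr
    # Post-order: both operands are on the stack before the operator runs.
    return compile_expr(left) + compile_expr(right) + [(OPCODES[op],)]


def interpret(code: list[tuple]) -> int:
    """Stage 2: a tiny interpreter, the only part that is host-specific."""
    stack: list[int] = []
    for insn in code:
        name = insn[0]
        if name == "ldci":
            stack.append(insn[1])
        elif name == "adi":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif name == "mpi":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {name!r}")
    return stack.pop()


program = compile_expr(("+", 2, ("*", 3, 4)))
print(program)             # [('ldci', 2), ('ldci', 3), ('ldci', 4), ('mpi',), ('adi',)]
print(interpret(program))  # 14
```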
In the early 1980s, at least two operating systems achieved machine independence through extensive use of p-code. The Business Operating System (BOS) was a cross-platform operating system designed to run p-code programs exclusively. The UCSD p-System, developed at the University of California, San Diego, was a self-compiling and self-hosted[clarification needed] operating system based on p-code optimized for generation by the Pascal programming language.
In the 1990s, translation into p-code became a popular strategy for implementations of languages such as Python and Java.[citation needed]
Why not p-code?
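One of the significant disadvantages of p-code is execution speed: interpreting each virtual instruction in software is slower than executing the equivalent native code. This can sometimes be remedied through the use of a just-in-time (JIT) compiler.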
UCSD p-Machine
Architecture
Like many other p-code machines, the UCSD p-Machine is a stack machine, which means that most instructions take their operands from the stack, and place results back on the stack. Thus, the "add" instruction replaces the two topmost elements of the stack with their sum. A few instructions take an immediate argument. Like Pascal, the p-code is strongly typed, supporting boolean (b), character (c), integer (i), real (r), set (s), and pointer (a) types natively.
Some simple instructions:
| Insn. | Stack before | Stack after | Description |
|---|---|---|---|
| adi | i1 i2 | i1+i2 | add two integers |
| adr | r1 r2 | r1+r2 | add two reals |
| dvi | i1 i2 | i1/i2 | integer division |
| inn | i1 s1 | b1 | set membership; b1 = whether i1 is a member of s1 |
| ldci i1 | | i1 | load integer constant |
| mov | a1 a2 | | move |
| not | b1 | ~b1 | boolean negation |
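The stack effects in the instruction table can be made concrete with a small simulation. This is a sketch under stated assumptions: `step` is a hypothetical helper, values are untagged Python objects (so the p-Machine's type distinctions are not enforced), sets are modeled as `frozenset`, `dvi` is assumed to follow Pascal's truncating `div`, and `mov` is omitted because it needs a memory model.

```python
def pascal_div(a: int, b: int) -> int:
    """Pascal-style 'div': truncate toward zero (Python's // floors instead)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def step(stack: list, insn: str, arg=None) -> None:
    """Execute one instruction from the table, using a Python list as the stack.

    Values are untagged Python objects (int, float, bool, frozenset); the real
    p-Machine's types (i, r, b, s, ...) are not checked in this sketch.
    """
    if insn == "ldci":                 # immediate operand; nothing is popped
        stack.append(arg)
    elif insn in ("adi", "adr"):       # i1 i2 -> i1+i2, r1 r2 -> r1+r2
        right = stack.pop()
        left = stack.pop()
        stack.append(left + right)
    elif insn == "dvi":                # i1 i2 -> i1 div i2
        right = stack.pop()
        left = stack.pop()
        stack.append(pascal_div(left, right))
    elif insn == "inn":                # i1 s1 -> b1; the set is on top
        s1 = stack.pop()
        i1 = stack.pop()
        stack.append(i1 in s1)
    elif insn == "not":                # b1 -> ~b1
        stack.append(not stack.pop())
    else:                              # e.g. mov, which needs a memory model
        raise NotImplementedError(insn)


stack: list = []
step(stack, "ldci", -7)
step(stack, "ldci", 2)
step(stack, "dvi")
print(stack)                          # [-3]
stack.append(frozenset({-3, 0, 3}))   # push a set directly for the demo
step(stack, "inn")
print(stack)                          # [True]
step(stack, "not")
print(stack)                          # [False]
```

Each binary instruction pops its right operand first, because that operand was pushed last; this is why `inn` expects the set on top of the integer.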
Environment
Unlike other stack-based environments (such as Forth and the Java Virtual Machine), the p-System has only one stack, shared by procedure stack frames (which hold the return address, etc.) and the operands of local instructions. Three of the machine's registers point into the stack, which grows upwards:
- SP points to the top of the stack (the stack pointer).
- MP marks the beginning of the active stack frame (the frame pointer).
- EP points to the highest stack location used in the current procedure.
Also present is a constant area, and, below that, the heap growing down towards the stack. The NP register points to the top (lowest used address) of the heap. When EP gets greater than NP, the machine's memory is exhausted.
The fifth register, PC, points at the current instruction in the code area.
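The register layout and the exhaustion rule can be modeled in a few lines. In this hypothetical sketch, `PMachineRegisters` and `heap_alloc` (a stand-in for a heap allocation such as Pascal's `new`) are illustrative names; the only behavior taken from the text is that the heap grows downward, NP marks its lowest used address, and memory is exhausted once EP would exceed NP.

```python
from dataclasses import dataclass


class MemoryExhausted(Exception):
    """Raised when the stack limit (EP) would pass the heap pointer (NP)."""


@dataclass
class PMachineRegisters:
    """Hypothetical model of the five UCSD p-Machine registers.

    Stack addresses grow upward; the heap lies below the constant area and
    grows downward toward the stack, so NP decreases as the heap fills.
    """
    sp: int      # top of the stack
    mp: int      # start of the active stack frame
    ep: int      # highest stack location the current procedure may use
    np: int      # top (lowest used address) of the heap
    pc: int = 0  # current instruction in the code area

    def heap_alloc(self, cells: int) -> int:
        """Reserve `cells` heap cells by moving NP down; return the block address."""
        new_np = self.np - cells
        if self.ep > new_np:  # EP greater than NP: memory is exhausted
            raise MemoryExhausted(f"EP={self.ep} would exceed NP={new_np}")
        self.np = new_np      # commit only after the check succeeds
        return new_np


regs = PMachineRegisters(sp=30, mp=20, ep=40, np=100)
print(regs.heap_alloc(50))    # 50
try:
    regs.heap_alloc(20)
except MemoryExhausted as err:
    print(err)                # EP=40 would exceed NP=30
```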
Calling conventions
Stack frames look like this:
    EP ->
          local stack
    SP -> ...
          locals
          ...
          parameters
          ...
          return address (previous PC)
          previous EP
          dynamic link (previous MP)
          static link (MP of surrounding procedure)
    MP -> function return value
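Reading the frame diagram from the bottom up gives each cell's offset from MP. The hypothetical helper below assumes that every parameter and local occupies exactly one cell; in a real frame, values such as reals or sets may take more.

```python
def frame_layout(n_params: int, n_locals: int) -> dict[str, int]:
    """Offset from MP of each cell in a stack frame, read bottom-up.

    Assumes one cell per parameter and per local (a simplification).
    """
    layout = {
        "function return value": 0,  # MP points here
        "static link": 1,            # MP of the surrounding procedure
        "dynamic link": 2,           # previous MP
        "previous EP": 3,
        "return address": 4,         # previous PC
    }
    for k in range(n_params):
        layout[f"parameter {k + 1}"] = 5 + k
    for k in range(n_locals):
        layout[f"local {k + 1}"] = 5 + n_params + k
    return layout


for name, offset in frame_layout(n_params=2, n_locals=1).items():
    print(f"MP+{offset}: {name}")
```

The first five entries are the cells that the mark-stack instruction reserves; parameters and locals follow immediately above them.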
The procedure calling sequence works as follows: The call is introduced with
mst n
where n specifies the difference in nesting levels (remember that Pascal supports nested procedures). This instruction marks the stack, i.e. it reserves the first five cells of the above stack frame and initialises the previous EP, the dynamic link, and the static link. The caller then computes and pushes any parameters for the procedure, and then issues
cup n, p
to call a user procedure (n being the number of parameters, p the procedure's address). This will save the PC in the return address cell, and set the procedure's address as the new PC.
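The `mst n` / `cup n, p` sequence can be traced on a small memory model. This is a hypothetical sketch, not the UCSD implementation: `CallModel` is an illustrative name, each value occupies one cell, SP is assumed to address the topmost used cell, and `mst 0` is assumed to mean the callee is declared directly inside the caller (so its static link is the caller's MP). The register values in the usage are arbitrary.

```python
class CallModel:
    """Hypothetical model of the UCSD call sequence (mst n / cup n, p).

    Assumptions: one value per memory cell, SP addresses the topmost used
    cell, and n = 0 in mst means the callee is nested directly in the caller.
    """

    MARK_SIZE = 5  # return value, static link, dynamic link, previous EP, return address

    def __init__(self, size: int = 64) -> None:
        self.mem = [0] * size
        self.sp = -1  # empty stack
        self.mp = 0
        self.ep = 0
        self.pc = 0

    def push(self, value: int) -> None:
        self.sp += 1
        self.mem[self.sp] = value

    def base(self, n: int) -> int:
        """Follow n static links outward from the current frame."""
        frame = self.mp
        for _ in range(n):
            frame = self.mem[frame + 1]  # static link is stored at MP + 1
        return frame

    def mst(self, n: int) -> None:
        """Mark the stack: reserve the five mark cells and fill in the links."""
        frame = self.sp + 1
        self.mem[frame] = 0                 # function return value (not yet set)
        self.mem[frame + 1] = self.base(n)  # static link
        self.mem[frame + 2] = self.mp       # dynamic link (previous MP)
        self.mem[frame + 3] = self.ep       # previous EP
        # frame + 4, the return address, is filled in by cup
        self.sp += self.MARK_SIZE

    def cup(self, n: int, p: int) -> None:
        """Call the user procedure at address p, which has n parameter cells."""
        self.mp = self.sp - (n + self.MARK_SIZE) + 1  # step back over params and mark
        self.mem[self.mp + 4] = self.pc               # return address (previous PC)
        self.pc = p


m = CallModel()
m.sp, m.mp, m.ep, m.pc = 9, 3, 12, 17  # pretend a caller frame is active
m.mst(0)
m.push(42)   # first parameter
m.push(7)    # second parameter
m.cup(2, 100)
print(m.mp, m.pc, m.mem[m.mp:m.mp + 7])  # 10 100 [0, 3, 3, 12, 17, 42, 7]
```

Because `mst` has already reserved the mark cells before the parameters are pushed, `cup` can recover the new frame base purely from SP and the parameter count.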
User procedures begin with the two instructions
ent 1, i
ent 2, j
The first sets SP to MP + i, the second sets EP to SP + j. So i essentially specifies the space reserved for locals (plus the number of parameters plus 5), and j gives the number of entries needed locally for the stack. Memory exhaustion is checked at this point.
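The two `ent` forms and the memory check can be sketched as follows. `Registers`, `ent`, and `MemoryExhausted` are hypothetical names; the sketch applies exactly the two assignments described in the text and then the EP-versus-NP test. Whether i counts up to SP's cell or one past it depends on whether SP addresses the last used cell, which the text does not pin down.

```python
from dataclasses import dataclass


class MemoryExhausted(Exception):
    """Raised when EP, the procedure's stack limit, exceeds the heap pointer NP."""


@dataclass
class Registers:
    sp: int
    mp: int
    ep: int
    np: int


def ent(regs: Registers, kind: int, operand: int) -> None:
    """ent 1, i sets SP := MP + i; ent 2, j sets EP := SP + j and checks memory."""
    if kind == 1:
        regs.sp = regs.mp + operand
    elif kind == 2:
        regs.ep = regs.sp + operand
        if regs.ep > regs.np:  # the memory-exhaustion check at procedure entry
            raise MemoryExhausted(f"EP={regs.ep} exceeds NP={regs.np}")
    else:
        raise ValueError("ent expects kind 1 or 2")


regs = Registers(sp=16, mp=10, ep=20, np=200)
ent(regs, 1, 10)   # reserve space for the frame's mark, parameters, and locals
ent(regs, 2, 4)    # allow 4 more entries for the procedure's expression stack
print(regs.sp, regs.ep)  # 20 24
```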
Returning to the caller is accomplished via
retC
with C giving the return type (i, r, c, b, a as above, and p for no return value). The return value must already have been stored in the function-return-value cell (the one MP points to). For all types except p, returning leaves this value on the stack.
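A return can be modeled as popping the whole frame and restoring the caller's registers from the mark cells. This is a sketch under assumptions: `ret` is a hypothetical helper, registers are kept in a plain dict, and SP is assumed to address the topmost used cell. Under that convention, a value-returning `retC` leaves SP at the old MP (so the result is the new top of stack), while `retp` sets SP just below it.

```python
def ret(mem: list[int], regs: dict[str, int], kind: str) -> None:
    """Model of retC: pop the frame and restore the caller's registers.

    Frame cells relative to MP: 0 return value, 1 static link,
    2 dynamic link, 3 previous EP, 4 return address.
    """
    if kind not in ("i", "r", "c", "b", "a", "p"):
        raise ValueError(f"unknown return type {kind!r}")
    mp = regs["mp"]  # read the frame before MP changes
    # Functions keep the result cell at MP as the new top of stack;
    # procedures (retp) drop the whole frame, including that cell.
    regs["sp"] = mp if kind != "p" else mp - 1
    regs["pc"] = mem[mp + 4]  # return address (previous PC)
    regs["ep"] = mem[mp + 3]  # previous EP
    regs["mp"] = mem[mp + 2]  # dynamic link (previous MP)


mem = [0] * 32
mem[10:17] = [99, 3, 3, 12, 17, 42, 7]  # frame at MP = 10; result 99 already stored
regs = {"sp": 20, "mp": 10, "ep": 24, "pc": 130}
ret(mem, regs, "i")
print(regs, mem[regs["sp"]])  # {'sp': 10, 'mp': 3, 'ep': 12, 'pc': 17} 99
```

Resetting SP relative to MP also discards the parameters, so the caller does not need to pop them separately.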
Instead of calling a user procedure (cup), standard procedure q can be called with
csp q
These standard procedures are Pascal procedures like readln() ("csp rln"), sin() ("csp sin"), etc. Peculiarly, eof() is a p-code instruction instead.
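Because standard procedures are reached through a single `csp q` instruction, an interpreter can dispatch them through a table of host routines. The sketch below is hypothetical: only the names `sin` and `rln` come from the text, the routine signatures and the `{"lines": [...]}` input model are invented for illustration, and readln is simplified to "skip the current input line".

```python
import math
from typing import Callable

Stack = list  # operand stack of Python numbers (untyped in this sketch)
IO = dict     # hypothetical I/O state, e.g. {"lines": [...]} for pending input


def csp_sin(stack: Stack, io: IO) -> None:
    """csp sin: replace the real on top of the stack with its sine."""
    stack.append(math.sin(stack.pop()))


def csp_rln(stack: Stack, io: IO) -> None:
    """csp rln (readln): skip the rest of the current input line."""
    if io["lines"]:
        io["lines"].pop(0)


# Host-side table: these run as native code inside the interpreter.
STANDARD_PROCEDURES: dict[str, Callable[[Stack, IO], None]] = {
    "sin": csp_sin,
    "rln": csp_rln,
}


def csp(q: str, stack: Stack, io: IO) -> None:
    """Dispatch `csp q` to the matching host routine."""
    try:
        routine = STANDARD_PROCEDURES[q]
    except KeyError:
        raise ValueError(f"unknown standard procedure {q!r}") from None
    routine(stack, io)


stack = [0.0]
io = {"lines": ["first line", "second line"]}
csp("sin", stack, io)
csp("rln", stack, io)
print(stack, io["lines"])  # [0.0] ['second line']
```

One consequence of this design is that porting the interpreter to a new host also means porting every routine in the table.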
Example machine
Niklaus Wirth specified a simple p-code machine in the 1976 book Algorithms + Data Structures = Programs. The machine had 3 registers: a program counter p, a base register b, and a top-of-stack register t. There were 8 instructions, with one (opr) having multiple forms.
This is the code for the machine, written in Pascal:
    { Relies on declarations from the enclosing PL/0 program (not shown here),
      roughly: type fct = (lit, opr, lod, sto, cal, int, jmp, jpc);
      instruction = packed record f: fct; l: 0..levmax; a: 0..amax end;
      var code: array [0..cxmax] of instruction }
    procedure interpret;
      const stacksize = 500;
      var p, b, t: integer; {program-, base-, topstack-registers}
        i: instruction; {instruction register}
        s: array [1..stacksize] of integer; {datastore}

      function base(l: integer): integer;
        var b1: integer;
      begin
        b1 := b; {find base l levels down}
        while l > 0 do begin
          b1 := s[b1];
          l := l - 1
        end;
        base := b1
      end {base};

    begin
      writeln(' start pl/0');
      t := 0; b := 1; p := 0;
      s[1] := 0; s[2] := 0; s[3] := 0;
      repeat
        i := code[p]; p := p + 1;
        with i do
          case f of
            lit: begin t := t + 1; s[t] := a end;
            opr:
              case a of {operator}
                0: begin {return}
                     t := b - 1; p := s[t + 3]; b := s[t + 2];
                   end;
                1: s[t] := -s[t];
                2: begin t := t - 1; s[t] := s[t] + s[t + 1] end;
                3: begin t := t - 1; s[t] := s[t] - s[t + 1] end;
                4: begin t := t - 1; s[t] := s[t] * s[t + 1] end;
                5: begin t := t - 1; s[t] := s[t] div s[t + 1] end;
                6: s[t] := ord(odd(s[t]));
                8: begin t := t - 1; s[t] := ord(s[t] = s[t + 1]) end;
                9: begin t := t - 1; s[t] := ord(s[t] <> s[t + 1]) end;
                10: begin t := t - 1; s[t] := ord(s[t] < s[t + 1]) end;
                11: begin t := t - 1; s[t] := ord(s[t] >= s[t + 1]) end;
                12: begin t := t - 1; s[t] := ord(s[t] > s[t + 1]) end;
                13: begin t := t - 1; s[t] := ord(s[t] <= s[t + 1]) end;
              end;
            lod: begin t := t + 1; s[t] := s[base(l) + a] end;
            sto: begin s[base(l)+a] := s[t]; writeln(s[t]); t := t - 1 end;
            cal: begin {generate new block mark}
                   s[t + 1] := base(l); s[t + 2] := b; s[t + 3] := p;
                   b := t + 1; p := a
                 end;
            int: t := t + a;
            jmp: p := a;
            jpc: begin if s[t] = 0 then p := a; t := t - 1 end
          end {with, case}
      until p = 0;
      writeln(' end pl/0');
    end {interpret};
This machine was used to run programs compiled by Wirth's PL/0 compiler; PL/0 is a subset of Pascal that was used to teach compiler development.
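The interpreter above ports almost line for line to other languages. The Python sketch below is a translation for experimentation, not Wirth's code: the `Instruction` tuple stands in for his instruction record, the values that `sto` would print are returned as a list, and the sample program is hand-assembled (not produced by the real PL/0 compiler) for roughly `var x; begin x := 0; while x < 3 do x := x + 1 end.`

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    f: str  # function code: lit, opr, lod, sto, cal, int, jmp, jpc
    l: int  # level difference (used by lod, sto, cal)
    a: int  # literal, address, or operator number


# opr 2..13 (7 is unused) pop two operands and push one result.
BINARY = {
    2: lambda x, y: x + y,
    3: lambda x, y: x - y,
    4: lambda x, y: x * y,
    # Pascal 'div' truncates toward zero; Python's // would floor instead.
    5: lambda x, y: abs(x) // abs(y) * (1 if (x >= 0) == (y > 0) else -1),
    8: lambda x, y: int(x == y),
    9: lambda x, y: int(x != y),
    10: lambda x, y: int(x < y),
    11: lambda x, y: int(x >= y),
    12: lambda x, y: int(x > y),
    13: lambda x, y: int(x <= y),
}


def interpret(code: list[Instruction], stacksize: int = 500) -> list[int]:
    """Port of the Pascal procedure; returns the values that sto would writeln."""
    s = [0] * (stacksize + 1)  # s[1..stacksize]; index 0 unused, as in Pascal
    p, b, t = 0, 1, 0          # program, base, and top-of-stack registers
    printed: list[int] = []

    def base(level: int) -> int:
        b1 = b                 # follow the static chain `level` steps down
        for _ in range(level):
            b1 = s[b1]
        return b1

    while True:
        i = code[p]
        p += 1
        if i.f == "lit":
            t += 1
            s[t] = i.a
        elif i.f == "opr":
            if i.a == 0:       # return: restore the caller's t, p, and b
                t = b - 1
                p = s[t + 3]
                b = s[t + 2]
            elif i.a == 1:     # negate
                s[t] = -s[t]
            elif i.a == 6:     # ord(odd(x)); Python's x % 2 is 0 or 1 even if x < 0
                s[t] = s[t] % 2
            else:
                t -= 1
                s[t] = BINARY[i.a](s[t], s[t + 1])
        elif i.f == "lod":
            t += 1
            s[t] = s[base(i.l) + i.a]
        elif i.f == "sto":
            s[base(i.l) + i.a] = s[t]
            printed.append(s[t])
            t -= 1
        elif i.f == "cal":     # generate a new block mark
            s[t + 1] = base(i.l)
            s[t + 2] = b
            s[t + 3] = p
            b = t + 1
            p = i.a
        elif i.f == "int":
            t += i.a
        elif i.f == "jmp":
            p = i.a
        elif i.f == "jpc":
            if s[t] == 0:
                p = i.a
            t -= 1
        if p == 0:             # repeat ... until p = 0
            return printed


# Hand-assembled: var x; begin x := 0; while x < 3 do x := x + 1 end.
program = [
    Instruction("int", 0, 4),   # 0: three link cells plus x
    Instruction("lit", 0, 0),   # 1
    Instruction("sto", 0, 3),   # 2: x := 0
    Instruction("lod", 0, 3),   # 3: loop head
    Instruction("lit", 0, 3),   # 4
    Instruction("opr", 0, 10),  # 5: x < 3
    Instruction("jpc", 0, 12),  # 6: leave loop if false
    Instruction("lod", 0, 3),   # 7
    Instruction("lit", 0, 1),   # 8
    Instruction("opr", 0, 2),   # 9: x + 1
    Instruction("sto", 0, 3),   # 10: x := x + 1
    Instruction("jmp", 0, 3),   # 11
    Instruction("opr", 0, 0),   # 12: return from main block, p becomes 0
]
print(interpret(program))  # [0, 1, 2, 3]
```

As in the Pascal original, execution stops when the main block's `opr 0, 0` loads the initial return address 0 into p.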
See also
- Run-time system
- Compiler
- Interpreter
- Interpreted language
- Joel McCormack
- Microsoft P-Code
- Token threaded code
- UCSD Pascal
Notes
- Nori, K. V.; Ammann, U.; Jensen, K.; Nägeli, H. (1975). The Pascal P Compiler Implementation Notes. Zurich: Eidgenössische Technische Hochschule.
Further reading
- Steven Pemberton and Martin Daniels: Pascal Implementation: The P4 Compiler and Interpreter. ISBN 0-85312-358-6; ISBN 0-13-653031-1
- Steven Pemberton's page on Pascal has Pascal sources of the P4 compiler and interpreter, usage instructions, and the p-code of the compiler (generated by itself).
- The Jefferson Computer Museum's page on the UCSD p-System
- Open Source implementation, including packaging and pre-compiled binaries; a friendly fork of the Klebsch implementation
- Compiling with C# and Java, Pat Terry, 2005, ISBN 0-321-26360-X, 624 pp.
- Algorithms + Data Structures = Programs, Niklaus Wirth, 1976, ISBN 0-13-022418-9
- Compiler Construction, Niklaus Wirth, 1996, ISBN 0-201-40353-6
- The Byte Book of Pascal, Blaise W. Liffick, Editor, 1979, ISBN 0-07-037823-1
- PASCAL - The Language and its Implementation, Edited by D.W. Barron, 1981, ISBN 0-471-27835-1. Especially see the articles Pascal-P Implementation Notes and Pascal-S: A Subset and its Implementation.
Wikimedia Foundation. 2010.