1 files changed, 538 insertions, 0 deletions
diff --git a/share/doc/papers/px/pxin1.n b/share/doc/papers/px/pxin1.n
new file mode 100644
index 0000000..9a2c256
--- /dev/null
+++ b/share/doc/papers/px/pxin1.n
@@ -0,0 +1,538 @@
+.\" Copyright (c) 1979 The Regents of the University of California.
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\" 3. All advertising materials mentioning features or use of this software
+.\"    must display the following acknowledgement:
+.\"	This product includes software developed by the University of
+.\"	California, Berkeley and its contributors.
+.\" 4. Neither the name of the University nor the names of its contributors
+.\"    may be used to endorse or promote products derived from this software
+.\"    without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\"	@(#)pxin1.n	5.2 (Berkeley) 4/17/91
+.\"
+.if !\n(xx .so tmac.p
+.tr _\(ru
+.nr H1 0
+.NH 
+Organization
+.PP
+Most of
+.I px
+is written in the
+.SM "VAX 11/780"
+assembly language, using the
+.UX
+assembler
+.I as.
+Portions of
+.I px
+are also written in the
+.UX
+systems programming language C.
+.I Px
+consists of a main procedure that reads in the interpreter code,
+a main interpreter loop that transfers successively to various
+code segments implementing the abstract machine operations,
+built-in procedures and functions,
+and several routines that support the implementation of the
+Pascal input-output environment.
+.PP
+The interpreter runs at a fraction of the speed of equivalent
+compiled C code, with this fraction varying from 1/5 to 1/15.
+The interpreter occupies 18.5K bytes of instruction space, shared among
+all processes executing Pascal, and has 4.6K bytes of data space (constants,
+error messages, etc.) a copy of which is allocated to each executing process.
+.NH 2
+Format of the object file
+.PP
+.I Px
+normally interprets the code left in an object file by a run of the
+Pascal translator
+.I pi.
+The file where the translator puts the object originally, and the most
+commonly interpreted file, is called
+.I obj.
+In order that all persons using
+.I px
+share a common text image, this executable file is 
+a small process that coordinates with the interpreter to start
+execution.
+The interpreter code is placed
+at the end of a special ``header'' file and the size of the initialized
+data area of this header file is expanded to include this code,
+so that during execution it is located at an
+easily determined address in its data space.
+When executed, the object process creates a
+.I pipe ,
+creates another process by doing a
+.I fork ,
+and arranges that the resulting parent process becomes an instance of
+.I px .
+The child process then writes the interpreter code through 
+the pipe that it has to the
+interpreter process parent.
+When this process is complete, the child exits.
+.PP
+The real advantage of this approach is that it does not require modifications
+to the shell, and that the resultant objects are ``true objects'' not
+requiring special treatment.
+A simpler mechanism would be to determine the name of the file that was
+executed and pass this to the interpreter.
+However it is not possible to determine this name
+in all cases.\*(Dd
+.FS
+\*(dd\ For instance, if the
+.I pxref
+program is placed in the directory
+`/usr/bin'
+then when the user types
+``pxref program.p''
+the first argument to the program, nominally the programs name, is
+``pxref.''
+While it would be possible to search in the standard place,
+i.e. the current directory, and the system directories
+`/bin'
+and
+`/usr/bin'
+for a corresponding object file,
+this would be expensive and not guaranteed to succeed.
+Several shells exist that allow other directories to be searched
+for commands, and there is,
+in general,
+no way to determine what these directories are.
+.FE
+.NH 2
+General features of object code
+.PP
+Pascal object code is relocatable as all addressing references for
+control transfers within the code are relative.
+The code consists of instructions interspersed with inline data.
+All instructions have a length that is an even number of bytes.
+No variables are kept in the object code area.
+.PP
+The first byte of a Pascal interpreter instruction contains an operation
+code.
+This allows a total of 256 major operation codes, and 232 of these are
+in use in the current
+.I px.
+The second byte of each interpreter instruction is called the
+``sub-operation code'',
+or more commonly the
+.I sub-opcode.
+It contains a small integer that may, for example, be used as a
+block-structure level for the associated operation.
+If the instruction can take a longword constant,
+this constant is often packed into the sub-opcode
+if it fits into 8 bits and is not zero.
+A sub-opcode value of zero specifies that the constant would not
+fit and therefore follows in the next word.
+This is a space optimization, the value of zero for flagging
+the longer case being convenient because it is easy to test.
+.PP
+Other instruction formats are used.
+The branching
+instructions take an offset in the following word,
+operators that load constants onto the stack 
+take arbitrarily long inline constant values,
+and many operations deal exclusively with data on the
+interpreter stack, requiring no inline data.
+.NH 2
+Stack structure of the interpreter
+.PP
+The interpreter emulates a stack-structured Pascal machine.
+The ``load'' instructions put values onto the stack, where all
+arithmetic operations take place.
+The ``store'' instructions take values off the stack
+and place them in an address that is also contained on the stack.
+The only way to move data or to compute in the machine is with the stack.
+.PP
+To make the interpreter operations more powerful
+and to thereby increase the interpreter speed,
+the arithmetic operations in the interpreter are ``typed''.
+That is, length conversion of arithmetic values occurs when they are
+used in an operation.
+This eliminates interpreter cycles for length conversion
+and the associated overhead.
+For example, when adding an integer that fits in one byte to one that
+requires four bytes to store, no ``conversion'' operators are required.
+The one byte integer is loaded onto the stack, followed by the four
+byte integer, and then an adding operator is used that has, implicit
+in its definition, the sizes of the arguments.
+.NH 2
+Data types in the interpreter
+.PP
+The interpreter deals with several different fundamental data types.
+In the memory of the machine, 1, 2, and 4 byte integers are supported,
+with only 2 and 4 byte integers being present on the stack.
+The interpreter always converts to 4 byte integers when there is a possibility
+of overflowing the shorter formats.
+This corresponds to the Pascal language definition of overflow in
+arithmetic operations that requires that the result be correct
+if all partial values lie within the bounds of the base integer type:
+4 byte integer values.
+.PP
+Character constants are treated similarly to 1 byte integers for
+most purposes, as are Boolean values.
+All enumerated types are treated as integer values of
+an appropriate length, usually 1 byte.
+The interpreter also has real numbers, occupying 8 bytes of storage,
+and sets and strings of varying length.
+The appropriate operations are included for each data type, such as
+set union and intersection and an operation to write a string.
+.PP
+No special
+.B packed
+data formats are supported by the interpreter.
+The smallest unit of storage occupied by any variable is one byte.
+The built-ins
+.I pack
+and
+.I unpack
+thus degenerate to simple memory to memory transfers with
+no special processing.
+.NH 2
+Runtime environment
+.PP
+The interpreter runtime environment uses a stack data area and a heap
+data area, that are kept at opposite ends of memory
+and grow towards each other.
+All global variables and variables local to procedures and functions
+are kept in the stack area.
+Dynamically allocated variables and buffers for input/output are
+allocated in the heap.
+.PP
+The addressing of block structured variables is done by using
+a fixed display
+that contains the address of its stack frame
+for each statically active block.\*(Dg
+.FS
+\*(dg\ Here ``block'' is being used to mean any
+.I procedure ,
+.I function
+or the main program.
+.FE
+This display is referenced by instructions that load and store
+variables and maintained by the operations for
+block entry and exit, and for non-local
+.B goto
+statements.
+.NH 2
+Dp, lc, loop
+.PP
+Three ``global'' variables in the interpreter, in addition to the
+``display'', are the
+.I dp,
+.I lc,
+and the
+.I loop.
+The
+.I dp
+is a pointer to the display entry for the current block;
+the
+.I lc
+is the abstract machine location counter;
+and the
+.I loop
+is a register that holds the address of the main interpreter
+loop so that returning to the loop to fetch the next instruction is
+a fast operation.
+.NH 2
+The stack frame structure
+.PP
+Each active block
+has a stack frame consisting of three parts:
+a block mark, local variables, and temporary storage for partially
+evaluated expressions.
+The stack in the interpreter grows from the high addresses in memory
+to the low addresses,
+so that those parts of the stack frame that are ``on the top''
+of the stack have the most negative offsets from the display
+entry for the block.
+The major parts of the stack frame are represented in Figure 1.1.
+.so fig1.1.n
+Note that the local variables of each block
+have negative offsets from the corresponding display entry,
+the ``first'' local variable having offset `\-2'.
+.NH 2
+The block mark
+.PP
+The block mark contains the saved information necessary
+to restore the environment when the current block exits.
+It consists of two parts.
+The first and top-most part is saved by the
+.SM CALL
+instruction in the interpreter.
+This information is not present for the main program
+as it is never ``called''.
+The second part of the block mark is created by the
+.SM BEG
+begin block operator that also allocates and clears the
+local variable storage.
+The format of these blocks is represented in Figure 1.2.
+.sp
+.so fig1.2.n
+.PP
+The data saved by the
+.SM CALL
+operator includes the line number
+.I lino
+of the point of call,
+that is printed if the program execution ends abnormally;
+the location counter
+.I lc
+giving the return address;
+and the current display entry address
+.I dp
+at the time of call.
+.PP
+The
+.SM BEG
+begin operator saves the previous display contents at the level
+of this block, so that the display can be restored on block exit.
+A pointer to the beginning line number and the
+name of this block is also saved.
+This information is stored in the interpreter object code in-line after the
+.SM BEG
+operator.
+It is used in printing a post-mortem backtrace.
+The saved file name and buffer reference are necessary because of
+the input/output structure
+(this is discussed in detail in 
+sections 3.3 and 3.4).
+The top of stack reference gives the value the stack pointer should
+have when there are no expression temporaries on the stack.
+It is used for a consistency check in the
+.SM LINO
+line number operators in the interpreter, that occurs before
+each statement executed.
+This helps to catch bugs in the interpreter, that often manifest
+themselves by leaving the stack non-empty between statements.
+.PP
+Note that there is no explicit static link here.
+Thus to set up the display correctly after a non-local
+.B goto
+statement one must ``unwind''
+through all the block marks on the stack to rebuild the display.
+.NH 2
+Arguments and return values
+.PP
+A function returns its value into a space reserved by the calling
+block.
+Arguments to a
+.B function
+are placed on top of this return area.
+For both
+.B procedure
+and
+.B function
+calls, arguments are placed at the end of the expression evaluation area
+of the caller.
+When a
+.B function
+completes, expression evaluation can continue
+after popping the arguments to the
+.B function
+off the stack,
+exactly as if the function value had been ``loaded''.
+The arguments to a
+.B procedure
+are also popped off the stack by the caller
+after its execution ends.
+.KS
+.PP
+As a simple example consider the following stack structure
+for a call to a function
+.I f,
+of the form ``f(a)''.
+.so fig1.3.n
+.KE
+.PP
+If we suppose that
+.I f
+returns a
+.I real
+and that
+.I a
+is an integer,
+the calling sequence for this function would be:
+.DS
+.TS
+lp-2w(8) l.
+PUSH	\-8
+RV4:\fIl	a\fR
+CALL:\fIl	f\fR
+POP	4
+.TE
+.DE
+.ZP
+Here we use the operator
+.SM PUSH
+to clear space for the return value,
+load
+.I a
+on the stack with a ``right value'' operator,
+call the function,
+pop off the argument
+.I a ,
+and can then complete evaluation of the containing expression.
+The operations used here will be explained in section 2.
+.PP
+If the function
+.I f
+were given by
+.LS
+ 10 \*bfunction\fR f(i: integer): real;
+ 11 \*bbegin\fR
+ 12     f := i
+ 13 \*bend\fR;
+.LE
+then
+.I f
+would have code sequence:
+.DS
+.TS
+lp-2w(8) l.
+BEG:2	0
+	11
+	"f"
+LV:\fIl\fR	40
+RV4:\fIl\fR	32
+AS48
+END
+.TE
+.DE
+.ZP
+Here the
+.SM BEG
+operator takes 9 bytes of inline data.
+The first byte specifies the 
+length of the function name.
+The second longword specifies the
+amount of local variable storage, here none.
+The succeeding two lines give the line number of the
+.B begin
+and the name of the block
+for error traceback.
+The
+.SM BEG
+operator places a name pointer in the block mark.
+The body of the
+.B function
+first takes an address of the
+.B function
+result variable
+.I f
+using the address of operator
+.SM LV
+.I a .
+The next operation in the interpretation of this function is the loading
+of the value of
+.I i .
+.I I
+is at the level of the
+.B function
+.I f ,
+here symbolically
+.I l,
+and the first variable in the local variable area.
+The
+.B function
+completes by assigning the 4 byte integer on the stack to the 8 byte
+return location, hence the
+.SM AS48
+assignment operator, and then uses the
+.SM END
+operator to exit the current block.
+.NH 2
+The main interpreter loop
+.PP
+The main interpreter loop is simply:
+.DS
+.mD
+iloop:
+	\fBcaseb\fR	(lc)+,$0,$255
+	<table of opcode interpreter addresses>
+.DE
+.ZP
+The main opcode is extracted from the first byte of the instruction
+and used to index into the table of opcode interpreter addresses.
+Control is then transferred to the specified location.
+The sub-opcode may be used to index the display,
+as a small constant,
+or to specify one of several relational operators.
+In the cases where a constant is needed, but it
+is not small enough to fit in the byte sub-operator,
+a zero is placed there and the constant follows in the next word.
+Zero is easily tested for,
+as the instruction that fetches the
+sub-opcode sets the condition code flags.
+A construction like:
+.DS
+.mD
+_OPER:
+	\fBcvtbl\fR	(lc)+,r0
+	\fBbneq\fR	L1
+	\fBcvtwl\fR	(lc)+,r0
+L1:	...
+.DE
+is all that is needed to effect this packing of data.
+This technique saves space in the Pascal
+.I obj
+object code.
+.PP
+The address of the instruction at
+.I iloop
+is always contained in the register variable 
+.I loop .
+Thus a return to the main interpreter is simply:
+.DS
+	\fBjmp\fR	(loop)
+.DE
+that is both quick and occupies little space.
+.NH 2
+Errors
+.PP
+Errors during interpretation fall into three classes:
+.DS
+1) Interpreter detected errors.
+2) Hardware detected errors.
+3) External events.
+.DE
+.PP
+Interpreter detected errors include I/O errors and
+built-in function errors.  
+These errors cause a subroutine call to an error routine
+with a single parameter indicating the cause of the error.
+Hardware errors such as range errors and overflows are
+fielded by a special routine that determines the opcode
+that caused the error.
+It then calls the error routine with an appropriate error
+parameter.
+External events include interrupts and system limits such
+as available memory.
+They generate a call to the error routine with an 
+appropriate error code.
+The error routine processes the error condition, 
+printing an appropriate error message and usually
+a backtrace from the point of the error.