1 files changed, 923 insertions, 0 deletions
diff --git a/share/doc/papers/px/pxin2.n b/share/doc/papers/px/pxin2.n
new file mode 100644
index 0000000..0a12b90
--- /dev/null
+++ b/share/doc/papers/px/pxin2.n
@@ -0,0 +1,923 @@
+.\" Copyright (c) 1979 The Regents of the University of California.
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\" 3. All advertising materials mentioning features or use of this software
+.\"    must display the following acknowledgement:
+.\"	This product includes software developed by the University of
+.\"	California, Berkeley and its contributors.
+.\" 4. Neither the name of the University nor the names of its contributors
+.\"    may be used to endorse or promote products derived from this software
+.\"    without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\"	@(#)pxin2.n	5.2 (Berkeley) 4/17/91
+.\"
+.if !\n(xx .so tmac.p
+.nr H1 1
+.if n .ND
+.NH
+Operations
+.NH 2
+Naming conventions and operation summary
+.PP
+Table 2.1 outlines the opcode typing convention.
+The expression ``a above b'' means that `a' is on top
+of the stack with `b' below it.
+Table 2.3 describes each of the opcodes.
+The character `*' at the end of a name specifies that
+all operations with the root prefix 
+before the `*'
+are summarized by one entry.
+Table 2.2 gives the codes used 
+to describe the type inline data expected by each instruction.
+.sp 2
+.so table2.1.n
+.sp 2
+.so table2.2.n
+.bp
+.so table2.3.n
+.bp
+.NH 2
+Basic control operations
+.LP
+.SH
+HALT
+.IP
+Corresponds to the Pascal procedure
+.I halt ;
+causes execution to end with a post-mortem backtrace as if a run-time
+error had occurred.
+.SH
+BEG s,W,w,"
+.IP
+Causes the second part of the block mark to be created, and
+.I W
+bytes of local variable space to be allocated and cleared to zero.
+Stack overflow is detected here.
+.I w
+is the first line of the body of this section for error traceback,
+and the inline string (length s) the character representation of its name.
+.SH
+NODUMP s,W,w,"
+.IP
+Equivalent to
+.SM BEG ,
+and used to begin the main program when the ``p''
+option is disabled so that the post-mortem backtrace will be inhibited.
+.SH
+END
+.IP
+Complementary to the operators
+.SM CALL
+and
+.SM BEG ,
+exits the current block, calling the procedure
+.I pclose
+to flush buffers for and release any local files.
+Restores the environment of the caller from the block mark.
+If this is the end for the main program, all files are
+.I flushed,
+and the interpreter is exited.
+.SH
+CALL l,A
+.IP
+Saves the current line number, return address, and active display entry pointer
+.I dp
+in the first part of the block mark, then transfers to the entry point
+given by the relative address
+.I A ,
+that is the beginning of a
+.B procedure
+or
+.B function
+at level
+.I l.
+.SH
+PUSH s
+.IP
+Clears
+.I s
+bytes on the stack.
+Used to make space for the return value of a
+.B function
+just before calling it.
+.SH
+POP s
+.IP
+Pop
+.I s
+bytes off the stack.
+Used after a
+.B function
+or
+.B procedure
+returns to remove the arguments from the stack.
+.SH
+TRA a
+.IP
+Transfer control to relative address
+.I a
+as a local
+.B goto
+or part of a structured statement.
+.SH
+TRA4 A
+.IP
+Transfer control to an absolute address as part of a non-local
+.B goto
+or to branch over procedure bodies.
+.SH
+LINO s
+.IP
+Set current line number to
+.I s.
+For consistency, check that the expression stack is empty
+as it should be (as this is the start of a statement.)
+This consistency check will fail only if there is a bug in the
+interpreter or the interpreter code has somehow been damaged.
+Increment the statement count and if it exceeds the statement limit,
+generate a fault.
+.SH
+GOTO l,A
+.IP
+Transfer control to address
+.I A
+that is in the block at level
+.I l
+of the display.
+This is a non-local
+.B goto.
+Causes each block to be exited as if with
+.SM END ,
+flushing and freeing files with
+.I pclose,
+until the current display entry is at level
+.I l.
+.SH
+SDUP*
+.IP
+Duplicate the word or long on the top of
+the stack.
+This is used mostly for constructing sets.
+See section 2.11.
+.NH 2
+If and relational operators
+.SH
+IF a
+.IP
+The interpreter conditional transfers all take place using this operator
+that examines the Boolean value on the top of the stack.
+If the value is
+.I true ,
+the next code is executed,
+otherwise control transfers to the specified address.
+.SH
+REL* r
+.IP
+These take two arguments on the stack,
+and the sub-operation code specifies the relational operation to
+be done, coded as follows with `a' above `b' on the stack:
+.DS
+.mD
+.TS
+lb lb
+c a.
+Code	Operation
+_
+0	a = b
+2	a <> b
+4	a < b
+6	a > b
+8	a <= b
+10	a >= b
+.TE
+.DE
+.IP
+Each operation does a test to set the condition code
+appropriately and then does an indexed branch based on the
+sub-operation code to a test of the condition here specified,
+pushing a Boolean value on the stack.
+.IP
+Consider the statement fragment:
+.DS
+.mD
+\*bif\fR a = b \*bthen\fR
+.DE
+.IP
+If
+.I a
+and
+.I b
+are integers this generates the following code:
+.DS
+.TS
+lp-2w(8) l.
+RV4:\fIl	a\fR
+RV4:\fIl	b\fR
+REL4	\&=
+IF	\fIElse part offset\fR
+.sp
+.T&
+c s.
+\fI\&... Then part code ...\fR
+.TE
+.DE
+.NH 2
+Boolean operators
+.PP
+The Boolean operators
+.SM AND ,
+.SM OR ,
+and
+.SM NOT
+manipulate values on the top of the stack.
+All Boolean values are kept in single bytes in memory,
+or in single words on the stack.
+Zero represents a Boolean \fIfalse\fP, and one a Boolean \fItrue\fP.
+.NH 2
+Right value, constant, and assignment operators
+.SH
+LRV* l,A
+.br
+RV* l,a
+.IP
+The right value operators load values on the stack.
+They take a block number as a sub-opcode and load the appropriate
+number of bytes from that block at the offset specified
+in the following word onto the stack. As an example, consider
+.SM LRV4 :
+.DS
+.mD
+_LRV4:
+	\fBcvtbl\fR	(lc)+,r0	#r0 has display index
+	\fBaddl3\fR	_display(r0),(lc)+,r1	#r1 has variable address
+	\fBpushl\fR	(r1)	#put value on the stack
+	\fBjmp\fR	(loop)
+.DE
+.IP
+Here the interpreter places the display level in r0.
+It then adds the appropriate display value to the inline offset and
+pushes the value at this location onto the stack.
+Control then returns to the main
+interpreter loop.
+The
+.SM RV* 
+operators have short inline data that
+reduces the space required to address the first 32K of
+stack space in each stack frame.
+The operators
+.SM RV14
+and
+.SM RV24
+provide explicit conversion to long as the data
+is pushed.
+This saves the generation of
+.SM STOI
+to align arguments to 
+.SM C
+subroutines.
+.SH
+CON* r
+.IP
+The constant operators load a value onto the stack from inline code.
+Small integer values are condensed and loaded by the
+.SM CON1
+operator, that is given by
+.DS
+.mD
+_CON1:
+	\fBcvtbw\fR	(lc)+,\-(sp)
+	\fBjmp\fR	(loop)
+.DE
+.IP
+Here note that little work was required as the required constant
+was available at (lc)+.
+For longer constants,
+.I lc
+must be incremented before moving the constant.
+The operator
+.SM CON
+takes a length specification in the sub-opcode and can be used to load
+strings and other variable length data onto the stack.
+The operators 
+.SM CON14
+and
+.SM CON24
+provide explicit conversion to long as the constant is pushed.
+.SH
+AS*
+.IP
+The assignment operators are similar to arithmetic and relational operators
+in that they take two operands, both in the stack,
+but the lengths given for them specify
+first the length of the value on the stack and then the length
+of the target in memory.
+The target address in memory is under the value to be stored.
+Thus the statement
+.DS
+i := 1
+.DE
+.IP
+where
+.I i
+is a full-length, 4 byte, integer,
+will generate the code sequence
+.DS
+.TS
+lp-2w(8) l.
+LV:\fIl	i\fP
+CON1:1
+AS24
+.TE
+.DE
+.IP
+Here
+.SM LV
+will load the address of
+.I i,
+that is really given as a block number in the sub-opcode and an
+offset in the following word,
+onto the stack, occupying a single word.
+.SM CON1 ,
+that is a single word instruction,
+then loads the constant 1,
+that is in its sub-opcode,
+onto the stack.
+Since there are not one byte constants on the stack,
+this becomes a 2 byte, single word integer.
+The interpreter then assigns a length 2 integer to a length 4 integer using
+.SM AS24 \&.
+The code sequence for
+.SM AS24
+is given by:
+.DS
+.mD
+_AS24:
+	\fBincl\fR	lc
+	\fBcvtwl\fR	(sp)+,*(sp)+
+	\fBjmp\fR	(loop)
+.DE
+.IP
+Thus the interpreter gets the single word off the stack,
+extends it to be a 4 byte integer
+gets the target address off the stack,
+and finally stores the value in the target.
+This is a typical use of the constant and assignment operators.
+.NH 2
+Addressing operations
+.SH
+LLV l,W
+.br
+LV l,w
+.IP
+The most common operation done by the interpreter
+is the ``left value'' or ``address of'' operation.
+It is given by:
+.DS
+.mD
+_LLV:
+	\fBcvtbl\fR	(lc)+,r0	#r0 has display index
+	\fBaddl3\fR	_display(r0),(lc)+,\-(sp)	#push address onto the stack
+	\fBjmp\fR	(loop)
+.DE
+.IP
+It calculates an address in the block specified in the sub-opcode
+by adding the associated display entry to the
+offset that appears in the following word.
+The
+.SM LV
+operator has a short inline data that reduces the space
+required to address the first 32K of stack space in each call frame.
+.SH
+OFF s
+.IP
+The offset operator is used in field names.
+Thus to get the address of
+.LS
+p^.f1
+.LE
+.IP
+.I pi
+would generate the sequence
+.DS
+.mD
+.TS
+lp-2w(8) l.
+RV:\fIl	p\fP
+OFF	\fIf1\fP
+.TE
+.DE
+.IP
+where the
+.SM RV
+loads the value of
+.I p,
+given its block in the sub-opcode and offset in the following word,
+and the interpreter then adds the offset of the field
+.I f1
+in its record to get the correct address.
+.SM OFF
+takes its argument in the sub-opcode if it is small enough.
+.SH
+NIL
+.IP
+The example above is incomplete, lacking a check for a
+.B nil
+pointer.
+The code generated would be
+.DS
+.TS
+lp-2w(8) l.
+RV:\fIl	p\fP
+NIL
+OFF	\fIf1\fP
+.TE
+.DE
+.IP
+where the
+.SM NIL
+operation checks for a
+.I nil
+pointer and generates the appropriate runtime error if it is.
+.SH
+LVCON s,"
+.IP
+A pointer to the specified length inline data is pushed
+onto the stack.
+This is primarily used for
+.I printf
+type strings used by 
+.SM WRITEF .
+(see sections 3.6 and 3.8)
+.SH
+INX* s,w,w
+.IP
+The operators
+.SM INX2
+and
+.SM INX4
+are used for subscripting.
+For example, the statement
+.DS
+a[i] := 2.0
+.DE
+.IP
+with
+.I i
+an integer and
+.I a
+an
+``array [1..1000] of real''
+would generate
+.DS
+.TS
+lp-2w(8) l.
+LV:\fIl	a\fP
+RV4:\fIl	i\fP
+INX4:8	1,999
+CON8	2.0
+AS8
+.TE
+.DE
+.IP
+Here the
+.SM LV
+operation takes the address of
+.I a
+and places it on the stack.
+The value of
+.I i
+is then placed on top of this on the stack.
+The array address is indexed by the
+length 4 index (a length 2 index would use
+.SM INX2 )
+where the individual elements have a size of 8 bytes.
+The code for 
+.SM INX4
+is:
+.DS
+.mD
+_INX4:
+	\fBcvtbl\fR	(lc)+,r0
+	\fBbneq\fR	L1
+	\fBcvtwl\fR	(lc)+,r0	#r0 has size of records
+L1:
+	\fBcvtwl\fR	(lc)+,r1	#r1 has lower bound
+	\fBmovzwl\fR	(lc)+,r2	#r2 has upper-lower bound
+	\fBsubl3\fR	r1,(sp)+,r3	#r3 has base subscript
+	\fBcmpl\fR	r3,r2	#check for out of bounds
+	\fBbgtru\fR	esubscr
+	\fBmull2\fR	r0,r3	#calculate byte offset
+	\fBaddl2\fR	r3,(sp)		#calculate actual address
+	\fBjmp\fR	(loop)
+esubscr:
+	\fBmovw\fR	$ESUBSCR,_perrno
+	\fBjbr\fR	error
+.DE
+.IP
+Here the lower bound is subtracted, and range checked against the
+upper minus lower bound.
+The offset is then scaled to a byte offset into the array
+and added to the base address on the stack.
+Multi-dimension subscripts are translated as a sequence of single subscriptings.
+.SH
+IND*
+.IP
+For indirect references through
+.B var
+parameters and pointers,
+the interpreter has a set of indirection operators that convert a pointer
+on the stack into a value on the stack from that address.
+different
+.SM IND
+operators are necessary because of the possibility of different
+length operands.
+The
+.SM IND14
+and
+.SM IND24
+operators do conversions to long
+as they push their data.
+.NH 2
+Arithmetic operators
+.PP
+The interpreter has many arithmetic operators.
+All operators produce results long enough to prevent overflow
+unless the bounds of the base type are exceeded.
+The basic operators available are
+.DS
+Addition:	ADD*, SUCC*
+Subtraction:	SUB*, PRED*
+Multiplication:	MUL*, SQR*
+Division:	DIV*, DVD*, MOD*
+Unary:		NEG*, ABS*
+.DE
+.NH 2
+Range checking
+.PP
+The interpreter has several range checking operators.
+The important distinction among these operators is between values whose
+legal range begins at zero and those that do not begin at zero, 
+for example
+a subrange variable whose values range from 45 to 70.
+For those that begin at zero, a simpler ``logical'' comparison against
+the upper bound suffices.
+For others, both the low and upper bounds must be checked independently,
+requiring two comparisons.
+On the 
+.SM "VAX 11/780"
+both checks are done using a single index instruction
+so the only gain is in reducing the inline data.
+.NH 2
+Case operators
+.PP
+The interpreter includes three operators for
+.B case
+statements that are used depending on the width of the 
+.B case
+label type.
+For each width, the structure of the case data is the same, and
+is represented in figure 2.4.
+.sp 1
+.so fig2.4.n
+.PP
+The
+.SM CASEOP
+case statement operators do a sequential search through the
+case label values.
+If they find the label value, they take the corresponding entry
+from the transfer table and cause the interpreter to branch to the
+specified statement.
+If the specified label is not found, an error results.
+.PP
+The
+.SM CASE
+operators take the number of cases as a sub-opcode
+if possible.
+Three different operators are needed to handle single byte,
+word, and long case transfer table values.
+For example, the
+.SM CASEOP1
+operator has the following code sequence:
+.DS
+.mD
+_CASEOP1:
+	\fBcvtbl\fR	(lc)+,r0
+	\fBbneq\fR	L1
+	\fBcvtwl\fR	(lc)+,r0	#r0 has length of case table
+L1:
+	\fBmovaw\fR	(lc)[r0],r2	#r2 has pointer to case labels
+	\fBmovzwl\fR	(sp)+,r3	#r3 has the element to find
+	\fBlocc\fR	r3,r0,(r2)	#r0 has index of located element
+	\fBbeql\fR	caserr	#element not found
+	\fBmnegl\fR	r0,r0	#calculate new lc
+	\fBcvtwl\fR	(r2)[r0],r1	#r1 has lc offset
+	\fBaddl2\fR	r1,lc
+	\fBjmp\fR	(loop)
+caserr:
+	\fBmovw\fR	$ECASE,_perrno
+	\fBjbr\fR	error
+.DE
+.PP
+Here the interpreter first computes the address of the beginning
+of the case label value area by adding twice the number of case label
+values to the address of the transfer table, since the transfer
+table entries are 2 byte address offsets.
+It then searches through the label values, and generates an ECASE
+error if the label is not found.
+If the label is found, the index of the corresponding entry
+in the transfer table is extracted and that offset is added
+to the interpreter location counter.
+.NH 2
+Operations supporting pxp
+.PP
+The following operations are defined to do execution profiling.
+.SH
+PXPBUF w
+.IP
+Causes the interpreter to allocate a count buffer
+with
+.I w
+four byte counters
+and to clear them to zero.
+The count buffer is placed within an image of the
+.I pmon.out
+file as described in the
+.I "PXP Implementation Notes."
+The contents of this buffer are written to the file
+.I pmon.out
+when the program ends.
+.SH
+COUNT w
+.IP
+Increments the counter specified by
+.I w .
+.SH
+TRACNT w,A
+.IP
+Used at the entry point to procedures and functions,
+combining a transfer to the entry point of the block with
+an incrementing of its entry count.
+.NH 2
+Set operations
+.PP
+The set operations:
+union
+.SM ADDT,
+intersection
+.SM MULT,
+element removal
+.SM SUBT,
+and the set relationals
+.SM RELT
+are straightforward.
+The following operations are more interesting.
+.SH
+CARD s
+.IP
+Takes the cardinality of a set of size
+.I s
+bytes on top of the stack, leaving a 2 byte integer count.
+.SM CARD
+uses the 
+.B ffs
+opcode to successively count the number of set bits in the set.
+.SH
+CTTOT s,w,w
+.IP
+Constructs a set.
+This operation requires a non-trivial amount of work,
+checking bounds and setting individual bits or ranges of bits.
+This operation sequence is slow,
+and motivates the presence of the operator
+.SM INCT
+below.
+The arguments to
+.SM CTTOT
+include the number of elements
+.I s
+in the constructed set,
+the lower and upper bounds of the set,
+the two
+.I w
+values,
+and a pair of values on the stack for each range in the set, single
+elements in constructed sets being duplicated with
+.SM SDUP
+to form degenerate ranges.
+.SH
+IN s,w,w
+.IP
+The operator
+.B in
+for sets.
+The value
+.I s
+specifies the size of the set,
+the two
+.I w
+values the lower and upper bounds of the set.
+The value on the stack is checked to be in the set on the stack,
+and a Boolean value of
+.I true
+or
+.I false
+replaces the operands.
+.SH
+INCT
+.IP
+The operator
+.B in
+on a constructed set without constructing it.
+The left operand of
+.B in
+is on top of the stack followed by the number of pairs in the
+constructed set,
+and then the pairs themselves, all as single word integers.
+Pairs designate runs of values and single values are represented by
+a degenerate pair with both value equal.
+This operator is generated in grammatical constructs such as
+.LS
+\fBif\fR character \fBin\fR [`+', '\-', `*', `/']
+.LE
+.IP
+or
+.LS
+\fBif\fR character \fBin\fR [`a'..`z', `$', `_']
+.LE
+.IP
+These constructs are common in Pascal, and
+.SM INCT
+makes them run much faster in the interpreter,
+as if they were written as an efficient series of
+.B if
+statements.
+.NH 2
+Miscellaneous
+.PP
+Other miscellaneous operators that are present in the interpreter
+are
+.SM ASRT
+that causes the program to end if the Boolean value on the stack is not
+.I true,
+and
+.SM STOI ,
+.SM STOD ,
+.SM ITOD ,
+and
+.SM ITOS
+that convert between different length arithmetic operands for
+use in aligning the arguments in
+.B procedure
+and
+.B function
+calls, and with some untyped built-ins, such as
+.SM SIN
+and
+.SM COS \&.
+.PP
+Finally, if the program is run with the run-time testing disabled, there
+are special operators for
+.B for
+statements
+and special indexing operators for arrays
+that have individual element size that is a power of 2.
+The code can run significantly faster using these operators.
+.NH 2
+Mathematical Functions
+.PP
+The transcendental functions 
+.SM SIN ,
+.SM COS ,
+.SM ATAN ,
+.SM EXP ,
+.SM LN ,
+.SM SQRT ,
+.SM SEED ,
+and
+.SM RANDOM
+are taken from the standard UNIX
+mathematical package.
+These functions take double precision floating point
+values and return the same.
+.PP
+The functions
+.SM EXPO ,
+.SM TRUNC ,
+and
+.SM ROUND
+take a double precision floating point number.
+.SM EXPO
+returns an integer representing the machine 
+representation of its argument's exponent,
+.SM TRUNC
+returns the integer part of its argument, and
+.SM ROUND
+returns the rounded integer part of its argument.
+.NH 2
+System functions and procedures
+.SH
+LLIMIT
+.IP
+A line limit and a file pointer are passed on the stack.
+If the limit is non-negative the line limit is set to the
+specified value, otherwise it is set to unlimited.
+The default is unlimited.
+.SH
+STLIM
+.IP
+A statement limit is passed on the stack. The statement limit 
+is set as specified.
+The default is 500,000.
+No limit is enforced when the ``p'' option is disabled.
+.SH
+CLCK
+.br
+SCLCK
+.IP
+.SM CLCK
+returns the number of milliseconds of user time used by the program;
+.SM SCLCK
+returns the number of milliseconds of system time used by the program.
+.SH
+WCLCK
+.IP
+The number of seconds since some predefined time is
+returned. Its primary usefulness is in determining
+elapsed time and in providing a unique time stamp.
+.sp
+.LP
+The other system time procedures are
+.SM DATE
+and
+.SM TIME
+that copy an appropriate text string into a pascal string array.
+The function
+.SM ARGC
+returns the number of command line arguments passed to the program.
+The procedure
+.SM ARGV
+takes an index on the stack and copies the specified
+command line argument into a pascal string array.
+.NH 2
+Pascal procedures and functions
+.SH
+PACK s,w,w,w
+.br
+UNPACK s,w,w,w
+.IP
+They function as a memory to memory move with several
+semantic checks. 
+They do no ``unpacking'' or ``packing'' in the true sense as the
+interpreter supports no packed data types.
+.SH
+NEW s
+.br
+DISPOSE s
+.IP
+An 
+.SM LV
+of a pointer is passed. 
+.SM NEW
+allocates a record of a specified size and puts a pointer
+to it into the pointer variable.
+.SM DISPOSE
+deallocates the record pointed to by the pointer
+and sets the pointer to 
+.SM NIL .
+.sp
+.LP
+The function
+.SM CHR*
+converts a suitably small integer into an ascii character.
+Its primary purpose is to do a range check.
+The function
+.SM ODD*
+returns 
+.I true
+if its argument is odd and returns
+.I false
+if its argument is even.
+The function 
+.SM UNDEF
+always returns the value
+.I false .