Diffstat (limited to 'share/doc/papers/px/pxin2.n')
-rw-r--r-- | share/doc/papers/px/pxin2.n | 923 |
1 files changed, 923 insertions, 0 deletions
diff --git a/share/doc/papers/px/pxin2.n b/share/doc/papers/px/pxin2.n
new file mode 100644
index 0000000..0a12b90
--- /dev/null
+++ b/share/doc/papers/px/pxin2.n
@@ -0,0 +1,923 @@
+.\" Copyright (c) 1979 The Regents of the University of California.
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\" 3. All advertising materials mentioning features or use of this software
+.\" must display the following acknowledgement:
+.\" This product includes software developed by the University of
+.\" California, Berkeley and its contributors.
+.\" 4. Neither the name of the University nor the names of its contributors
+.\" may be used to endorse or promote products derived from this software
+.\" without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" @(#)pxin2.n 5.2 (Berkeley) 4/17/91
+.\"
+.if !\n(xx .so tmac.p
+.nr H1 1
+.if n .ND
+.NH
+Operations
+.NH 2
+Naming conventions and operation summary
+.PP
+Table 2.1 outlines the opcode typing convention.
+The expression ``a above b'' means that `a' is on top
+of the stack with `b' below it.
+Table 2.3 describes each of the opcodes.
+The character `*' at the end of a name specifies that
+all operations with the root prefix
+before the `*'
+are summarized by one entry.
+Table 2.2 gives the codes used
+to describe the type of inline data expected by each instruction.
+.sp 2
+.so table2.1.n
+.sp 2
+.so table2.2.n
+.bp
+.so table2.3.n
+.bp
+.NH 2
+Basic control operations
+.LP
+.SH
+HALT
+.IP
+Corresponds to the Pascal procedure
+.I halt ;
+causes execution to end with a post-mortem backtrace as if a run-time
+error had occurred.
+.SH
+BEG s,W,w,"
+.IP
+Causes the second part of the block mark to be created, and
+.I W
+bytes of local variable space to be allocated and cleared to zero.
+Stack overflow is detected here.
+.I w
+is the first line of the body of this section for error traceback,
+and the inline string (length s) the character representation of its name.
+.SH
+NODUMP s,W,w,"
+.IP
+Equivalent to
+.SM BEG ,
+and used to begin the main program when the ``p''
+option is disabled so that the post-mortem backtrace will be inhibited.
+.SH
+END
+.IP
+Complementary to the operators
+.SM CALL
+and
+.SM BEG ,
+exits the current block, calling the procedure
+.I pclose
+to flush buffers for and release any local files.
+Restores the environment of the caller from the block mark.
+If this is the end for the main program, all files are
+.I flushed,
+and the interpreter is exited.
+.SH
+CALL l,A
+.IP
+Saves the current line number, return address, and active display entry pointer
+.I dp
+in the first part of the block mark, then transfers to the entry point
+given by the relative address
+.I A ,
+that is the beginning of a
+.B procedure
+or
+.B function
+at level
+.I l.
+.SH
+PUSH s
+.IP
+Clears
+.I s
+bytes on the stack.
+Used to make space for the return value of a
+.B function
+just before calling it.
+.SH
+POP s
+.IP
+Pops
+.I s
+bytes off the stack.
+Used after a
+.B function
+or
+.B procedure
+returns to remove the arguments from the stack.
+.SH
+TRA a
+.IP
+Transfer control to relative address
+.I a
+as a local
+.B goto
+or part of a structured statement.
+.SH
+TRA4 A
+.IP
+Transfer control to an absolute address as part of a non-local
+.B goto
+or to branch over procedure bodies.
+.SH
+LINO s
+.IP
+Set current line number to
+.I s.
+For consistency, check that the expression stack is empty
+as it should be (as this is the start of a statement).
+This consistency check will fail only if there is a bug in the
+interpreter or the interpreter code has somehow been damaged.
+Increment the statement count and if it exceeds the statement limit,
+generate a fault.
+.SH
+GOTO l,A
+.IP
+Transfer control to address
+.I A
+that is in the block at level
+.I l
+of the display.
+This is a non-local
+.B goto.
+Causes each block to be exited as if with
+.SM END ,
+flushing and freeing files with
+.I pclose,
+until the current display entry is at level
+.I l.
+.SH
+SDUP*
+.IP
+Duplicate the word or long on the top of
+the stack.
+This is used mostly for constructing sets.
+See section 2.11.
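The bookkeeping that LINO does at the start of each statement can be modeled outside the interpreter. The Python sketch below is illustrative only and is not taken from px: the class `State`, its field names, and the exception `StatementLimitFault` are hypothetical. The default of 500,000 is the statement limit default given under STLIM, and a limit of `None` is an assumed way of modeling the case where no limit is enforced.

```python
class StatementLimitFault(Exception):
    """Stands in for the fault generated when the statement limit is exceeded."""


class State:
    """Hypothetical interpreter state; the field names are illustrative."""

    def __init__(self, stmt_limit=500000):
        self.line = 0                 # current line number
        self.stmt_count = 0           # statements started so far
        self.stmt_limit = stmt_limit  # None models "no limit enforced"
        self.expr_stack = []          # expression stack, top at the end


def lino(state, line):
    """Model of LINO: set the line number, check the stack, count the statement."""
    state.line = line
    # Consistency check: the expression stack must be empty between statements.
    if state.expr_stack:
        raise RuntimeError("expression stack not empty at statement start")
    state.stmt_count += 1
    if state.stmt_limit is not None and state.stmt_count > state.stmt_limit:
        raise StatementLimitFault(line)
```

With `State(stmt_limit=2)`, two calls to `lino` succeed and the third raises `StatementLimitFault`.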
+.NH 2
+If and relational operators
+.SH
+IF a
+.IP
+The interpreter conditional transfers all take place using this operator
+that examines the Boolean value on the top of the stack.
+If the value is
+.I true ,
+the next code is executed,
+otherwise control transfers to the specified address.
+.SH
+REL* r
+.IP
+These take two arguments on the stack,
+and the sub-operation code specifies the relational operation to
+be done, coded as follows with `a' above `b' on the stack:
+.DS
+.mD
+.TS
+lb lb
+c a.
+Code Operation
+_
+0 a = b
+2 a <> b
+4 a < b
+6 a > b
+8 a <= b
+10 a >= b
+.TE
+.DE
+.IP
+Each operation does a test to set the condition code
+appropriately and then does an indexed branch based on the
+sub-operation code to a test of the condition here specified,
+pushing a Boolean value on the stack.
+.IP
+Consider the statement fragment:
+.DS
+.mD
+\*bif\fR a = b \*bthen\fR
+.DE
+.IP
+If
+.I a
+and
+.I b
+are integers this generates the following code:
+.DS
+.TS
+lp-2w(8) l.
+RV4:\fIl a\fR
+RV4:\fIl b\fR
+REL4 \&=
+IF \fIElse part offset\fR
+.sp
+.T&
+c s.
+\fI\&... Then part code ...\fR
+.TE
+.DE
+.NH 2
+Boolean operators
+.PP
+The Boolean operators
+.SM AND ,
+.SM OR ,
+and
+.SM NOT
+manipulate values on the top of the stack.
+All Boolean values are kept in single bytes in memory,
+or in single words on the stack.
+Zero represents a Boolean \fIfalse\fP, and one a Boolean \fItrue\fP.
+.NH 2
+Right value, constant, and assignment operators
+.SH
+LRV* l,A
+.br
+RV* l,a
+.IP
+The right value operators load values on the stack.
+They take a block number as a sub-opcode and load the appropriate
+number of bytes from that block at the offset specified
+in the following word onto the stack. As an example, consider
+.SM LRV4 :
+.DS
+.mD
+_LRV4:
+ \fBcvtbl\fR (lc)+,r0 #r0 has display index
+ \fBaddl3\fR _display(r0),(lc)+,r1 #r1 has variable address
+ \fBpushl\fR (r1) #put value on the stack
+ \fBjmp\fR (loop)
+.DE
+.IP
+Here the interpreter places the display level in r0.
+It then adds the appropriate display value to the inline offset and
+pushes the value at this location onto the stack.
+Control then returns to the main
+interpreter loop.
+The
+.SM RV*
+operators have short inline data that
+reduces the space required to address the first 32K of
+stack space in each stack frame.
+The operators
+.SM RV14
+and
+.SM RV24
+provide explicit conversion to long as the data
+is pushed.
+This saves the generation of
+.SM STOI
+to align arguments to
+.SM C
+subroutines.
+.SH
+CON* r
+.IP
+The constant operators load a value onto the stack from inline code.
+Small integer values are condensed and loaded by the
+.SM CON1
+operator, that is given by
+.DS
+.mD
+_CON1:
+ \fBcvtbw\fR (lc)+,\-(sp)
+ \fBjmp\fR (loop)
+.DE
+.IP
+Here note that little work was required as the required constant
+was available at (lc)+.
+For longer constants,
+.I lc
+must be incremented before moving the constant.
+The operator
+.SM CON
+takes a length specification in the sub-opcode and can be used to load
+strings and other variable length data onto the stack.
+The operators
+.SM CON14
+and
+.SM CON24
+provide explicit conversion to long as the constant is pushed.
+.SH
+AS*
+.IP
+The assignment operators are similar to arithmetic and relational operators
+in that they take two operands, both in the stack,
+but the lengths given for them specify
+first the length of the value on the stack and then the length
+of the target in memory.
+The target address in memory is under the value to be stored.
+Thus the statement
+.DS
+i := 1
+.DE
+.IP
+where
+.I i
+is a full-length, 4 byte, integer,
+will generate the code sequence
+.DS
+.TS
+lp-2w(8) l.
+LV:\fIl i\fP
+CON1:1
+AS24
+.TE
+.DE
+.IP
+Here
+.SM LV
+will load the address of
+.I i,
+that is really given as a block number in the sub-opcode and an
+offset in the following word,
+onto the stack, occupying a single word.
+.SM CON1 ,
+that is a single word instruction,
+then loads the constant 1,
+that is in its sub-opcode,
+onto the stack.
+Since there are no one byte constants on the stack,
+this becomes a 2 byte, single word integer.
+The interpreter then assigns a length 2 integer to a length 4 integer using
+.SM AS24 \&.
+The code sequence for
+.SM AS24
+is given by:
+.DS
+.mD
+_AS24:
+ \fBincl\fR lc
+ \fBcvtwl\fR (sp)+,*(sp)+
+ \fBjmp\fR (loop)
+.DE
+.IP
+Thus the interpreter gets the single word off the stack,
+extends it to be a 4 byte integer,
+gets the target address off the stack,
+and finally stores the value in the target.
+This is a typical use of the constant and assignment operators.
+.NH 2
+Addressing operations
+.SH
+LLV l,W
+.br
+LV l,w
+.IP
+The most common operation done by the interpreter
+is the ``left value'' or ``address of'' operation.
+It is given by:
+.DS
+.mD
+_LLV:
+ \fBcvtbl\fR (lc)+,r0 #r0 has display index
+ \fBaddl3\fR _display(r0),(lc)+,\-(sp) #push address onto the stack
+ \fBjmp\fR (loop)
+.DE
+.IP
+It calculates an address in the block specified in the sub-opcode
+by adding the associated display entry to the
+offset that appears in the following word.
+The
+.SM LV
+operator has short inline data that reduces the space
+required to address the first 32K of stack space in each call frame.
+.SH
+OFF s
+.IP
+The offset operator is used in field names.
+Thus to get the address of
+.LS
+p^.f1
+.LE
+.IP
+.I pi
+would generate the sequence
+.DS
+.mD
+.TS
+lp-2w(8) l.
+RV:\fIl p\fP
+OFF \fIf1\fP
+.TE
+.DE
+.IP
+where the
+.SM RV
+loads the value of
+.I p,
+given its block in the sub-opcode and offset in the following word,
+and the interpreter then adds the offset of the field
+.I f1
+in its record to get the correct address.
+.SM OFF
+takes its argument in the sub-opcode if it is small enough.
+.SH
+NIL
+.IP
+The example above is incomplete, lacking a check for a
+.B nil
+pointer.
+The code generated would be
+.DS
+.TS
+lp-2w(8) l.
+RV:\fIl p\fP
+NIL
+OFF \fIf1\fP
+.TE
+.DE
+.IP
+where the
+.SM NIL
+operation checks for a
+.I nil
+pointer and generates the appropriate runtime error if it is.
+.SH
+LVCON s,"
+.IP
+A pointer to the specified length inline data is pushed
+onto the stack.
+This is primarily used for
+.I printf
+type strings used by
+.SM WRITEF .
+(see sections 3.6 and 3.8)
+.SH
+INX* s,w,w
+.IP
+The operators
+.SM INX2
+and
+.SM INX4
+are used for subscripting.
+For example, the statement
+.DS
+a[i] := 2.0
+.DE
+.IP
+with
+.I i
+an integer and
+.I a
+an
+``array [1..1000] of real''
+would generate
+.DS
+.TS
+lp-2w(8) l.
+LV:\fIl a\fP
+RV4:\fIl i\fP
+INX4:8 1,999
+CON8 2.0
+AS8
+.TE
+.DE
+.IP
+Here the
+.SM LV
+operation takes the address of
+.I a
+and places it on the stack.
+The value of
+.I i
+is then placed on top of this on the stack.
+The array address is indexed by the
+length 4 index (a length 2 index would use
+.SM INX2 )
+where the individual elements have a size of 8 bytes.
+The code for
+.SM INX4
+is:
+.DS
+.mD
+_INX4:
+ \fBcvtbl\fR (lc)+,r0
+ \fBbneq\fR L1
+ \fBcvtwl\fR (lc)+,r0 #r0 has size of records
+L1:
+ \fBcvtwl\fR (lc)+,r1 #r1 has lower bound
+ \fBmovzwl\fR (lc)+,r2 #r2 has upper-lower bound
+ \fBsubl3\fR r1,(sp)+,r3 #r3 has base subscript
+ \fBcmpl\fR r3,r2 #check for out of bounds
+ \fBbgtru\fR esubscr
+ \fBmull2\fR r0,r3 #calculate byte offset
+ \fBaddl2\fR r3,(sp) #calculate actual address
+ \fBjmp\fR (loop)
+esubscr:
+ \fBmovw\fR $ESUBSCR,_perrno
+ \fBjbr\fR error
+.DE
+.IP
+Here the lower bound is subtracted, and range checked against the
+upper minus lower bound.
+The offset is then scaled to a byte offset into the array
+and added to the base address on the stack.
+Multi-dimension subscripts are translated as a sequence of single subscriptings.
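The bounds check and address arithmetic that INX4 does can be followed more easily in a high-level model. The Python sketch below is illustrative only and is not the interpreter's code: the function name `inx`, the exception `SubscriptError` (standing in for the ESUBSCR error), and the use of plain integers for addresses are all assumptions. The 32-bit mask models the single unsigned comparison (`bgtru`) in the assembly, under which a subscript below the lower bound wraps to a large value and fails the same test.

```python
class SubscriptError(Exception):
    """Stands in for the ESUBSCR run-time error."""


def inx(base_addr, index, elem_size, lower, span):
    """Model of INX*: return the address of element `index`.

    base_addr  address of the array (pushed by LV)
    index      subscript value (pushed by RV*)
    elem_size  size of one element in bytes
    lower      lower bound of the subscript range
    span       upper bound minus lower bound
    """
    offset = index - lower
    # One unsigned compare covers both "too small" and "too large".
    if (offset & 0xFFFFFFFF) > span:
        raise SubscriptError(index)
    return base_addr + offset * elem_size
```

For the ``array [1..1000] of real'' example, `inx(0x1000, 3, 8, 1, 999)` gives `0x1010`, that is, two 8 byte elements past the base address; subscripts 0 and 1001 both raise `SubscriptError`.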
+.SH
+IND*
+.IP
+For indirect references through
+.B var
+parameters and pointers,
+the interpreter has a set of indirection operators that convert a pointer
+on the stack into a value on the stack from that address.
+Different
+.SM IND
+operators are necessary because of the possibility of different
+length operands.
+The
+.SM IND14
+and
+.SM IND24
+operators do conversions to long
+as they push their data.
+.NH 2
+Arithmetic operators
+.PP
+The interpreter has many arithmetic operators.
+All operators produce results long enough to prevent overflow
+unless the bounds of the base type are exceeded.
+The basic operators available are
+.DS
+Addition: ADD*, SUCC*
+Subtraction: SUB*, PRED*
+Multiplication: MUL*, SQR*
+Division: DIV*, DVD*, MOD*
+Unary: NEG*, ABS*
+.DE
+.NH 2
+Range checking
+.PP
+The interpreter has several range checking operators.
+The important distinction among these operators is between values whose
+legal range begins at zero and those that do not begin at zero,
+for example
+a subrange variable whose values range from 45 to 70.
+For those that begin at zero, a simpler ``logical'' comparison against
+the upper bound suffices.
+For others, both the lower and upper bounds must be checked independently,
+requiring two comparisons.
+On the
+.SM "VAX 11/780"
+both checks are done using a single index instruction
+so the only gain is in reducing the inline data.
+.NH 2
+Case operators
+.PP
+The interpreter includes three operators for
+.B case
+statements that are used depending on the width of the
+.B case
+label type.
+For each width, the structure of the case data is the same, and
+is represented in figure 2.4.
+.sp 1
+.so fig2.4.n
+.PP
+The
+.SM CASEOP
+case statement operators do a sequential search through the
+case label values.
+If they find the label value, they take the corresponding entry
+from the transfer table and cause the interpreter to branch to the
+specified statement.
+If the specified label is not found, an error results.
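The sequential search just described can be sketched in a few lines of Python. This is a model for illustration, not the interpreter's code: the function name `caseop` and the exception `CaseError` are hypothetical, and the sketch assumes that the i-th transfer table entry belongs to the i-th label value, since figure 2.4 is not reproduced here.

```python
class CaseError(Exception):
    """Stands in for the error raised when no case label matches."""


def caseop(selector, labels, transfer):
    """Model of CASEOP*: return the transfer table entry for `selector`.

    labels    case label values, in the order they are stored
    transfer  transfer table; transfer[i] is assumed to pair with labels[i]
    """
    for i, label in enumerate(labels):  # sequential search through the labels
        if label == selector:
            return transfer[i]
    raise CaseError(selector)
```

For instance, `caseop(2, [1, 2, 3], [10, 20, 30])` returns 20, while a selector of 9 raises `CaseError`.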
+.PP
+The
+.SM CASE
+operators take the number of cases as a sub-opcode
+if possible.
+Three different operators are needed to handle single byte,
+word, and long case transfer table values.
+For example, the
+.SM CASEOP1
+operator has the following code sequence:
+.DS
+.mD
+_CASEOP1:
+ \fBcvtbl\fR (lc)+,r0
+ \fBbneq\fR L1
+ \fBcvtwl\fR (lc)+,r0 #r0 has length of case table
+L1:
+ \fBmovaw\fR (lc)[r0],r2 #r2 has pointer to case labels
+ \fBmovzwl\fR (sp)+,r3 #r3 has the element to find
+ \fBlocc\fR r3,r0,(r2) #r0 has index of located element
+ \fBbeql\fR caserr #element not found
+ \fBmnegl\fR r0,r0 #calculate new lc
+ \fBcvtwl\fR (r2)[r0],r1 #r1 has lc offset
+ \fBaddl2\fR r1,lc
+ \fBjmp\fR (loop)
+caserr:
+ \fBmovw\fR $ECASE,_perrno
+ \fBjbr\fR error
+.DE
+.PP
+Here the interpreter first computes the address of the beginning
+of the case label value area by adding twice the number of case label
+values to the address of the transfer table, since the transfer
+table entries are 2 byte address offsets.
+It then searches through the label values, and generates an ECASE
+error if the label is not found.
+If the label is found, the index of the corresponding entry
+in the transfer table is extracted and that offset is added
+to the interpreter location counter.
+.NH 2
+Operations supporting pxp
+.PP
+The following operations are defined to do execution profiling.
+.SH
+PXPBUF w
+.IP
+Causes the interpreter to allocate a count buffer
+with
+.I w
+four byte counters
+and to clear them to zero.
+The count buffer is placed within an image of the
+.I pmon.out
+file as described in the
+.I "PXP Implementation Notes."
+The contents of this buffer are written to the file
+.I pmon.out
+when the program ends.
+.SH
+COUNT w
+.IP
+Increments the counter specified by
+.I w .
+.SH
+TRACNT w,A
+.IP
+Used at the entry point to procedures and functions,
+combining a transfer to the entry point of the block with
+an incrementing of its entry count.
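The three profiling operations above fit together as a small counter table. The following Python sketch is a rough model under stated assumptions, not the px implementation: the class `Profile` and its method names are hypothetical, counters are indexed from zero here (the text does not say how they are numbered), and the layout of the pmon.out image is left out entirely.

```python
class Profile:
    """Hypothetical model of the count buffer used for execution profiling."""

    def __init__(self):
        self.counters = []

    def pxpbuf(self, n):
        """PXPBUF: allocate n counters, cleared to zero."""
        self.counters = [0] * n

    def count(self, w):
        """COUNT: increment counter w."""
        self.counters[w] += 1

    def tracnt(self, w, entry):
        """TRACNT: bump the entry count, then return the address to transfer to."""
        self.count(w)
        return entry
```

A short run shows the intended behavior: after `pxpbuf(3)`, two calls to `count(1)`, and one call to `tracnt(2, 400)`, the counters hold `[0, 2, 1]` and `tracnt` has returned 400 as the branch target.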
+.NH 2
+Set operations
+.PP
+The set operations:
+union
+.SM ADDT,
+intersection
+.SM MULT,
+element removal
+.SM SUBT,
+and the set relationals
+.SM RELT
+are straightforward.
+The following operations are more interesting.
+.SH
+CARD s
+.IP
+Takes the cardinality of a set of size
+.I s
+bytes on top of the stack, leaving a 2 byte integer count.
+.SM CARD
+uses the
+.B ffs
+opcode to successively count the number of set bits in the set.
+.SH
+CTTOT s,w,w
+.IP
+Constructs a set.
+This operation requires a non-trivial amount of work,
+checking bounds and setting individual bits or ranges of bits.
+This operation sequence is slow,
+and motivates the presence of the operator
+.SM INCT
+below.
+The arguments to
+.SM CTTOT
+include the number of elements
+.I s
+in the constructed set,
+the lower and upper bounds of the set,
+the two
+.I w
+values,
+and a pair of values on the stack for each range in the set, single
+elements in constructed sets being duplicated with
+.SM SDUP
+to form degenerate ranges.
+.SH
+IN s,w,w
+.IP
+The operator
+.B in
+for sets.
+The value
+.I s
+specifies the size of the set,
+the two
+.I w
+values the lower and upper bounds of the set.
+The value on the stack is checked to be in the set on the stack,
+and a Boolean value of
+.I true
+or
+.I false
+replaces the operands.
+.SH
+INCT
+.IP
+The operator
+.B in
+on a constructed set without constructing it.
+The left operand of
+.B in
+is on top of the stack followed by the number of pairs in the
+constructed set,
+and then the pairs themselves, all as single word integers.
+Pairs designate runs of values and single values are represented by
+a degenerate pair with both values equal.
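The stack layout just described for INCT can be modeled with a Python list whose last item is the top of the stack. This sketch is illustrative only: the function name `inct` is hypothetical, the order of the two bounds within a pair is not given in the text (so either order is accepted), and leaving a 1 or 0 on the stack as the Boolean result is an assumption carried over from the description of IN.

```python
def inct(stack):
    """Model of INCT; `stack` is a list with the top of stack at the end.

    Pops the value being tested, the pair count, and every pair,
    then pushes 1 (true) or 0 (false).
    """
    value = stack.pop()
    npairs = stack.pop()
    found = 0
    for _ in range(npairs):
        a = stack.pop()
        b = stack.pop()
        # A single element appears as a degenerate pair with a == b.
        if min(a, b) <= value <= max(a, b):
            found = 1
        # No early exit: every pair must still be removed from the stack.
    stack.append(found)
    return stack
```

Testing `$` (36) against three pairs covering `a`..`z` (97..122), `$` (36) and `_` (95): `inct([95, 95, 36, 36, 122, 97, 3, 36])` leaves `[1]` on the stack.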
+This operator is generated in grammatical constructs such as
+.LS
+\fBif\fR character \fBin\fR [`+', `\-', `*', `/']
+.LE
+.IP
+or
+.LS
+\fBif\fR character \fBin\fR [`a'..`z', `$', `_']
+.LE
+.IP
+These constructs are common in Pascal, and
+.SM INCT
+makes them run much faster in the interpreter,
+as if they were written as an efficient series of
+.B if
+statements.
+.NH 2
+Miscellaneous
+.PP
+Other miscellaneous operators that are present in the interpreter
+are
+.SM ASRT
+that causes the program to end if the Boolean value on the stack is not
+.I true,
+and
+.SM STOI ,
+.SM STOD ,
+.SM ITOD ,
+and
+.SM ITOS
+that convert between different length arithmetic operands for
+use in aligning the arguments in
+.B procedure
+and
+.B function
+calls, and with some untyped built-ins, such as
+.SM SIN
+and
+.SM COS \&.
+.PP
+Finally, if the program is run with the run-time testing disabled, there
+are special operators for
+.B for
+statements
+and special indexing operators for arrays
+whose individual element size is a power of 2.
+The code can run significantly faster using these operators.
+.NH 2
+Mathematical functions
+.PP
+The transcendental functions
+.SM SIN ,
+.SM COS ,
+.SM ATAN ,
+.SM EXP ,
+.SM LN ,
+.SM SQRT ,
+.SM SEED ,
+and
+.SM RANDOM
+are taken from the standard UNIX
+mathematical package.
+These functions take double precision floating point
+values and return the same.
+.PP
+The functions
+.SM EXPO ,
+.SM TRUNC ,
+and
+.SM ROUND
+take a double precision floating point number.
+.SM EXPO
+returns an integer representing the machine
+representation of its argument's exponent,
+.SM TRUNC
+returns the integer part of its argument, and
+.SM ROUND
+returns the rounded integer part of its argument.
+.NH 2
+System functions and procedures
+.SH
+LLIMIT
+.IP
+A line limit and a file pointer are passed on the stack.
+If the limit is non-negative the line limit is set to the
+specified value, otherwise it is set to unlimited.
+The default is unlimited.
+.SH
+STLIM
+.IP
+A statement limit is passed on the stack. The statement limit
+is set as specified.
+The default is 500,000.
+No limit is enforced when the ``p'' option is disabled.
+.SH
+CLCK
+.br
+SCLCK
+.IP
+.SM CLCK
+returns the number of milliseconds of user time used by the program;
+.SM SCLCK
+returns the number of milliseconds of system time used by the program.
+.SH
+WCLCK
+.IP
+The number of seconds since some predefined time is
+returned. Its primary usefulness is in determining
+elapsed time and in providing a unique time stamp.
+.sp
+.LP
+The other system time procedures are
+.SM DATE
+and
+.SM TIME
+that copy an appropriate text string into a Pascal string array.
+The function
+.SM ARGC
+returns the number of command line arguments passed to the program.
+The procedure
+.SM ARGV
+takes an index on the stack and copies the specified
+command line argument into a Pascal string array.
+.NH 2
+Pascal procedures and functions
+.SH
+PACK s,w,w,w
+.br
+UNPACK s,w,w,w
+.IP
+These operators function as a memory to memory move with several
+semantic checks.
+They do no ``unpacking'' or ``packing'' in the true sense as the
+interpreter supports no packed data types.
+.SH
+NEW s
+.br
+DISPOSE s
+.IP
+An
+.SM LV
+of a pointer is passed.
+.SM NEW
+allocates a record of a specified size and puts a pointer
+to it into the pointer variable.
+.SM DISPOSE
+deallocates the record pointed to by the pointer
+and sets the pointer to
+.SM NIL .
+.sp
+.LP
+The function
+.SM CHR*
+converts a suitably small integer into an ASCII character.
+Its primary purpose is to do a range check.
+The function
+.SM ODD*
+returns
+.I true
+if its argument is odd and returns
+.I false
+if its argument is even.
+The function
+.SM UNDEF
+always returns the value
+.I false .