diff options
Diffstat (limited to 'contrib/perl5/lib/unicode/NamesList.html')
-rw-r--r-- | contrib/perl5/lib/unicode/NamesList.html | 226 |
1 files changed, 0 insertions, 226 deletions
diff --git a/contrib/perl5/lib/unicode/NamesList.html b/contrib/perl5/lib/unicode/NamesList.html deleted file mode 100644 index 0bfc5db..0000000 --- a/contrib/perl5/lib/unicode/NamesList.html +++ /dev/null @@ -1,226 +0,0 @@ -<html> - -<head> -<meta name="GENERATOR" content="Microsoft FrontPage 3.0"> -<title>Unicode 3.0 NamesList File Structure</title> -</head> - -<body> - -<h3>Unicode NamesList File Format</h3> - -<p>Last updated: 1999-07-06</p> - -<h3>1.0 Introduction</h3> - -<p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used -to drive the layout of the character code charts in the Unicode Standard. The information -in this file is a combination of several fields from the UnicodeData.txt and Blocks.txt files, -together with additional annotations for many characters. This document describes the -syntax rules for the file format, but also gives brief information on how each construct -is rendered when laid out for the book. Some of the syntax elements were used in -preparation of the drafts of the book and may not be present in the final, released form -of the NamesList.txt file.</p> - -<p>The same input file can be used to do the draft preparation for ISO/IEC 10646 (referred -below as ISO-style). This necessitates the presence of some information in the name list -file that is not needed (and in fact removed during parsing) for the Unicode book.</p> - -<p>With access to the layout program (unibook.exe) it is a simple matter of creating -name lists for the purpose of formatting working drafts containing proposed characters.</p> - -<h3>1.1 NamesList File Overview</h3> - -<p>The *.lst files are plain text files which in their most simple form look like this</p> - -<p>@@<tab>0020<tab>BASIC LATIN<tab>007F<br> -; this is a file comment (ignored)<br> -0020<tab>SPACE<br> -0021<tab>EXCLAMATION MARK<br> -0022<tab>QUOTATION MARK<br> -. . . <br> -007F<tab>DELETE</p> - -<p>The semicolon (as first character), @ and <tab> characters are used by the file -syntax and must be provided as shown. Hexadecimal digits must be in UPPER CASE). A double -@@ introduces a block header, with the title, and start and ending code of the block -provided as shown.</p> - -<p>For an ISO-style, minimal name list, only the NAME_LINE and BLOCKHEADER and their -constituent syntax elements are needed.</p> - -<p>The full syntax with all the options is provided in the following sections.</p> - -<h3>1.2 NamesList File Structure</h3> - -<p>This section gives defines the overall file structure</p> - -<pre><strong>NAMELIST: TITLE_PAGE* BLOCK* -</strong> -<strong>TITLE_PAGE: TITLE - | TITLE_PAGE SUBTITLE - | TITLE_PAGE SUBHEADER - | TITLE_PAGE IGNORED_LINE - | TITLE_PAGE EMPTY_LINE - | TITLE_PAGE COMMENTLINE - | TITLE_PAGE NOTICE - | TITLE_PAGE PAGEBREAK -</strong> -<strong>BLOCK: BLOCKHEADER - | BLOCK CHAR_ENTRY - | BLOCK SUBHEADER - | BLOCK NOTICE - | BLOCK EMPTY_LINE - | BLOCK IGNORED_LINE - | BLOCK PAGEBREAK - -CHAR_ENTRY: NAME_LINE | RESERVED_LINE - | CHAR_ENTRY ALIAS_LINE - | CHAR_ENTRY COMMENT_LINE - | CHAR_ENTRY CROSS_REF - | CHAR_ENTRY DECOMPOSITION - | CHAR_ENTRY COMPAT_MAPPING - | CHAR_ENTRY IGNORED_LINE - | CHAR_ENTRY EMPTY_LINE - | CHAR_ENTRY NOTICE -</strong></pre> - -<p>In other words:<br> -<br> -Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER. </p> - -<p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, and IGNORED_LINE may -occur before the first BLOCKHEADER.</p> - -<p>Directly following either a NAME_LINE or a RESERVED_LINE an uninterrupted sequence of -the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE, -CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, NOTICE, EMPTY_LINE and IGNORED_LINE.</p> - -<p>Except for EMPTY_LINE, NOTICE and IGNORED_LINE, none of these lines may occur in any other -place. </p> - -<p>Note: A NOTICE displays differently depending on whether it follows a header or title -or is part of a CHAR_ENTRY.</p> - -<h3>1.3 NamesList File Elements</h3> - -<p>This section provides the details of the syntax for the individual elements.</p> - -<pre><small><strong>ELEMENT SYNTAX</strong> // How rendered</small></pre> - -<pre><small><strong>NAME_LINE: CHAR <tab> LINE -</strong> // the CHAR and the corresponding image are echoed, - // followed by the name as given in LINE - -<strong> CHAR TAB NAME COMMENT LF -</strong> // Names may have a comment, which is stripped off - // unless the file is parsed for an ISO style list - -<strong>RESERVED_LINE: CHAR TAB <reserved> -</strong> // the CHAR is echoed followed by an icon for the - // reserved character and a fixed string e.g. <reserved> - -<strong>COMMMENT_LINE: <tab> "*" SP EXPAND_LINE -</strong> // * is replaced by BULLET, output line as comment - <strong><tab> EXPAND_LINE</strong> - // output line as comment - -<strong>ALIAS_LINE: <tab> "=" SP LINE -</strong> // replace = by itself, output line as alias - -<strong>CROSS_REF: <tab> "X" SP EXPAND_LINE -</strong> // X is replaced by a right arrow -<strong> <tab> "X" SP "(" STRING SP "-" SP CHAR ")" -</strong> // X is replaced by a right arrow - // the "(", "-", ")" are removed, the - // order of CHAR and STRING is reversed - // i.e. both inputs result in the same output - -<strong>IGNORED_LINE: <tab> ";" EXPAND_LINE -EMPTY_LINE: LF -</strong> // empty lines and file comments are ignored - -<strong>DECOMPOSITION: <tab> ":" EXPAND_LINE -</strong> // replace ':' by EQUIV, expand line into - // decomposition - -<strong>COMPAT_MAPPING: <tab> "#" SP EXPAND_LINE -</strong> // replace '#' by APPROX, output line as mapping - -<strong>NOTICE: "@+" <tab> LINE -</strong> // skip '@+', output text as notice -<strong> "@+" TAB * SP LINE -</strong> // skip '@', output text as notice - // "*" expands to a bullet character - // Notices following a character code apply to the - // character and are indented. Notices not following - // a character code apply to the page/block/column - // and are italicized, but not indented - -<strong>SUBTITLE: "@@@+" <tab> LINE -</strong> // skip "@@@+", output text as subtitle - -<strong>SUBHEADER: "@" <tab> LINE -</strong> // skip '@', output line as text as column header - -<strong>BLOCKHEADER: "@@" <tab> BLOCKSTART <tab> BLOCKNAME <tab> BLOCKEND -</strong> // skip "@@", cause a page break and optional - // blank page, then output one or more charts - // followed by the list of character names. - // use BLOCKSTART and BLOCKEND to define the - // what characters belong to a block - // use blockname in page and table headers - <strong> "@@" <tab> BLOCKSTART <tab> BLOCKNAME COMMENT <tab> BLOCKEND - </strong>// if a comment is present it replaces the blockname - // when an ISO-style namelist is laid out - -<strong>BLOCKSTART: CHAR</strong> // first character position in block -<strong>BLOCKEND: CHAR</strong> // last character position in block -<strong>PAGE_BREAK: "@@"</strong> // insert a (column) break - -<strong>TITLE: "@@@" <tab> LINE</strong> - // skip "@@@", output line as text - // Title is used in page headers - -<strong>EXPAND_LINE: {CHAR | STRING}+ LF </strong> - // all instances of CHAR *) are replaced by - // CHAR NBSP x NBSP where x is the single Unicode - // character corresponding to char - // If character is combining, it is replaced with - // CHAR NBSP <circ> x NBSP where <circ> is the - // dotted circle</small> -</pre> - -<h3><strong>1.4 NamesList File Primitives</strong></h3> - -<p>The following are the primitives and terminals for the NamesList syntax.</p> - -<pre><small><strong>LINE: STRING LF -COMMENT: "(" NAME ")" - "(" NAME ")" "*" -</strong> -<strong>NAME</strong>: <sequence of ASCII characters, except "(" or ")" > -<strong>STRING</strong>: <sequence of Latin-1 characters> -<strong>CHAR</strong>: <strong>X X X X</strong> - <strong>| X X X X X X X X X</strong></small> -<small><strong>X: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F" -<tab>:</strong> <sequence of one or more ASCII tab characters 0x09> -<strong>SP</strong>: <ASCII 0x20> -<strong>LF</strong>: <any sequence of ASCII 0x0A and 0x0D> -</small></pre> - -<p><strong>Notes:</strong> - -<ul> - <li>Special lookahead logic prevents a mention of a 4 digit standard, such as ISO 9999 from - being misinterpreted as ISO CHAR.</li> - <li>Use of Latin-1 is supported in unibook.exe, but not portably, unless the file is encoded as - UTF-16LE.</li> - <li>The final LF in the file must be present</li> - <li>A CHAR inside ' or " is expanded, but only its glyph image is printed, the - code value is not echoed</li> - <li>Straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules. - Apostrophes are supported, but nested quotes are not.</li> -</ul> -</body> -</html> |