diff options
Diffstat (limited to 'subversion/libsvn_fs_base/notes')
-rw-r--r-- | subversion/libsvn_fs_base/notes/TODO | 137 | ||||
-rw-r--r-- | subversion/libsvn_fs_base/notes/fs-history | 270 | ||||
-rw-r--r-- | subversion/libsvn_fs_base/notes/structure | 1086 |
3 files changed, 1493 insertions, 0 deletions
diff --git a/subversion/libsvn_fs_base/notes/TODO b/subversion/libsvn_fs_base/notes/TODO new file mode 100644 index 0000000..c72af03 --- /dev/null +++ b/subversion/libsvn_fs_base/notes/TODO @@ -0,0 +1,137 @@ +What's happening now + +The filesystem needs some path validation stuffs independent of the +SVN path utilities. A filesystem path is a well-defined Thing that +should be held a safe distance away from future changes to SVN's +general path library. + + +Incorrectnesses + +We must ensure that node numbers are never reused. If we open a node, +svn_fs_delete it, and then create new nodes, what happens when the +original node structure suddenly comes to refer to an entirely +different node? Files become directories? + +We should convert filenames to some canonical Unicode form, for +comparison. + +Does everyone call svn_fs__check_fs who should? + +svn_fs_delete will actually delete non-empty directories, if they're +not cloned. This is inconsistent; should it be fixed? + +Does every operation on a deleted node or completed transaction fail +gracefully? + +Produce helpful error messages when filename paths contain null +characters. + + +Uglinesses + +Fix up comments in svn_fs.h for transactions. + +Add `public name' member to filesystem structure, to use to identify +the filesystem in error messages. When driven by DAV, this could be a +URL. + +When a dag function signals an error, it has no idea what the path of +the relevant node was. But node revision ID's are pretty useless to +the user. tree.c should probably rewrap some errors. + +svn_fs__getsize shouldn't rely on a maximum value for detecting +overflow. + +The use of svn_fs__getsize in svn_fs__parse_id is ugly --- what if +svn_vernum_t and apr_size_t aren't the same size? + +Consider some macros or accessory functions for referencing the pieces +of the NODE-REVISION skel (instead of seeing stuff like +node->children->next->next and such other unreadable rubbish) + + +Slownesses + +We don't store older node revisions as deltas yet. + +The delta algorithm walks the whole tree using a single pool, so the +memory used is proportional to the size of the target tree. Instead, +it should use a separate subpool every time it recurses into a new +directory, and free that subpool as soon as it's done processing that +subdirectory, so the memory used is proportional to the depth of the +tree. + +We should move as much real content out of the NODE-REVISION skel as +possible; the skels should be holding only small stuff (node kind, +flags). +- File contents and deltas should be moved out to a `contents' table. + The NODE-REVISION skel should simply contain a key into that table. +- Directory contents should be moved out to a `directories' table, + with a separate table entry for each directory entry. Keys into the + table should be of the form `NODE-ID ENTRY-NAME NODE-REVISION', and + values should be node revision ID's, or the word `deleted'; to look + up an entry named E in a directory whose node revision is N.R, + search for the entry `N E x', where x is the largest number present + <= R. +- Property lists should be moved out to a table `properties', indexed + similarly to the above. We could deltify property contents the + same way we do file contents. + + +Amenities + +Extend svn_fs_copy to handle mutable nodes. + +Long term ideas: + +- directory entry cache: + Create a cache mapping a node revision id X plus a filename component + N onto a new node revision id Y, meaning that X is a directory in + which the name N is bound to ID Y. If everything were in the cache, + this function could run with no I/O except for the final node. + + Since node revisions never change, we wouldn't have to worry about + invalidating the cache. Mutable node objects will need special + handling, of course. + +- fulltext cache: + If we've recently computed a node's fulltext, we might want to keep + that around in case we need to compute one of its nearby ancestors' + fulltext, too. This could be a waste, though --- the access + patterns are a mix of linear scan (backwards to reconstruct a given + revision) and random (who knows what node we'll hit next), so it's + not clear what cache policy would be effective. Best to record some + data on how many delta applications a given cache would avoid before + implementing it. + +- delta cache: + As people update, we're going to be recomputing text deltas for the + most recently changed files pretty often. It might be worthwhile to + cache the deltas for a little while. + +- Handle Unicode canonicalization for directory and property names + ourselves. People should be able to hand us any valid UTF-8 + sequence, perhaps with precomposed characters or non-spacing marks + in a non-canonical order, and find the appropriate matches, given + the rules defined by the Unicode standard. + +Keeping repositories alive in the long term: Berkeley DB is infamous +for changing its file format from one revision to the next. If someone +saves a Subversion 1.0 repository on a CD somewhere, and then tries to +read it seven years later, their chance of being able to read it with +the latest revision of Subversion is nil. The solution: + +- Define a simply XML repository dump format for the complete + repository data. This should be the same format we use for CVS + repository conversion. We'll have an import function. + +- Write a program that is simple and self-contained --- does not use + Berkeley DB, no fancy XML tools, uses nothing but POSIX read and + seek --- that can dump a Subversion repository in that format. + +- For each revision of Subversion, make a sample repository, and + archive a copy of it away as test data. + +- Write a test suite that verifies that the repository dump program + can handle all of the archived formats. diff --git a/subversion/libsvn_fs_base/notes/fs-history b/subversion/libsvn_fs_base/notes/fs-history new file mode 100644 index 0000000..e4f51a6 --- /dev/null +++ b/subversion/libsvn_fs_base/notes/fs-history @@ -0,0 +1,270 @@ + -*- text -*- + + Subversion Filesystem History + (a love song for libsvn_fs, by C. Michael Pilato) + + +The Subversion filesystem can be your best friend, or your worst +enemy, usually depending on which side of the public API you are +working on. Callers of the libsvn_fs interfaces do their work in a +world pleasantly addressed by roots (the name given to a revision or +transaction snapshot of the versioned directory tree) and paths under +those roots. But once you swim beneath the surface, you quickly +realize that there is a beast both beautiful and dangerous lying in +wait. What looks to the outside world as a sort of coordinate system +with axes for "Time" and "Location" is, in fact, a complicated DAG +subsystem, with nodes that represent revisions of versioned locations +all interconnected in various relationships with each other. + +The goal of this document is straightforward: to relay knowledge about +how to untangle that DAG subsystem -- knowledge which likely lives +only in the minds of a few developers -- so that the Few might become +the Many. + + + +Node-Revisions: The Nodes of the DAG + +When working outside the filesystem API, people generally talk about +their versioned resources in terms of the paths of those resources, +and the global revisions (or revisions-to-be) in which those paths +are located. But inside the filesystem, paths are broken down and +stored as a hierarchical linked-list of path components. Each of +these path components has its own historical lineage (because +Subversion versions directories, too!), and each revision of that +lineage is referred to as a "node-revision". It is these +node-revisions which are the nodes of the DAG subsystem, or "DAG +nodes". + +DAG nodes are identified by unique keys called "node-revision IDs", +and are inter-connected in a variety of ways. A DAG node that +represents a directory stores information about which other DAG nodes +represent the children of that directory. A DAG node contains +information about which other DAG node is its historical predecessor. +By tracing these links from node to node, we can effectively traverse +both space and time, both the geography and the chronology of the +filesystem landscape. + +For example, the path "/trunk/src/main.c" in revision 4 of the +filesystem consumes four DAG nodes: one for "/", one for "/trunk", one +for "/trunk/src", and one for "/trunk/src/main.c". The DAG node for +"/" contains a list of the names and node-revision IDs of its +children, among which is the node-revision ID for the child named +"trunk". Similar links are found in "/trunk" (for "src") and +"/trunk/src" (for "main.c"). Additionally, if these paths existed in +different forms in previous revisions of the filesystem, their DAG +nodes would store the node-revision IDs of their respective +predecessor nodes. + +Whenever someone uses the public API to query for information about a +versioned path under a particular root, the typical course of action +under-the-hood is as follows: + + 1. The root refers to a particular snapshot of the DAG node tree, + and from this we can learn the node-revision ID of the node + which represents the root directory ("/") as it appears in that + snapshot. Given this node-revision ID, it's all DAG from here. + + 2. The path is split into components and traversed, beginning with + the root node, and walking down toward the full path. Each + intermediate node-revision is read, its entries list parsed, and + the next component looked up in that entries list to find the + next node-revision ID along the traversal path. + + 3. Finally, we wind up with a node-revision ID for our original + path. We use it and its associated node-revision to answer the + query. + +Seems pretty easy, doesn't it? Keep reading. + + + +All About Node-Revision IDs + +As previously mentioned, each node-revision in the filesystem has a +unique key, referred to as the node-revision ID. This key is +typically represented as a string which looks like a period-separated +list of its three components: + + 1. node ID: This key is unique to the members of a single + historical lineage. Differing versions of the same versioned + resource, irrespective of the paths and revision in which those + versions are located, all share this ID. If two node-revisions + have different node IDs, their are historically unrelated. + + 2. copy ID: This key uniquely identifies a copy operation, and is + sometimes referred to (or at least thought of) as a "branch ID." + If two node-revisions have the same copy ID, they are said to be + on the same branch. The only exception to this is in the key + "0", a special key that means "not branched". + + 3. txn ID: This key uniquely identifies the Subversion transaction + in which this node-revision came into existence. + +Whenever someone uses the public API to *modify* a versioned resource, +these actions are much the same as those used when querying. But +there are important differences. + + 1. The path is traversed in the same manner is described in the + previous section. The result is an in-memory linked-list of + information about the node-revisions which comprise the + components of the path. + + 2. But before any changes can be made to a path, its node-revision + and those of its parent directories must first be cloned so that + changes to them don't affect previous incarnations of those + node-revisions. This process is called "making the path + mutable". If previous operations under this transaction caused + one or more of the parent directories to be made mutable + already, they are not again cloned. + + 3. Once the path and all its parents are mutable, the desired + changes can be made to the cloned node-revision, and they in no + way affect prior history. + +To clone a node-revision means to literally make a duplicate of it +which is granted its own unique node-revision ID. The new +node-revision ID consists of the same node ID as the node-revision +that was cloned (since this is just another point along the historical +lineage of this versioned resource), a copy ID (which will be +discussed later), and the txn ID in which this modification is +occuring. + +There are some cool things we can read between the lines above. Since +the only time a node-revision comes into existence is when it is brand +new or a fresh clone, and we never do cloning except during a +modification, then we can use the txn ID as a sort of mutability flag. +Mutability of a node-revision is determined by comparing the txn ID of +the node-revision with the ID of the Subversion transaction being used +to modify the filesystem -- if, and only if, they are the same, the node +is allowed to be changed inside that transaction. + +So, we know how txn IDs come into existence now. And the origin of +node IDs hardly warrants its own paragraph: brand new lines of history +(introduced with svn_fs_make_file() and svn_fs_make_dir()) get new +unique node IDs, and every other node-revision that is created simply +keeps the same node ID as the node-revision on which it is based. + +So what about those copy IDs? + +Copy IDs are assigned to nodes-revisions to denote on which "branch" +of a line of history that node-revision resides. (They are called +copy IDs for political reasons, really -- Subversion doesn't expose a +branching API per se, instead promoting the idea that branches are +just forks in the development of a line of history that can just as +easily be represented using path semantics.) New copy IDs are +allocated whenever a branching operation occurs. New node-revisions +can inherit the copy IDs of their predecessors (in the trivial cloning +case), inherit the copy-IDs of one of their parents (by nature of +their parent being copied), or inherit new copy-IDs. In the absence +of any branching, node-revisions are assigned the special copy ID "0". + + + +Copies and Copy IDs + +Currently there are two kinds of copy operation. The first is a +"real" copy, and is the direct result of a call to svn_fs_copy(). +When a real copy is made, the node-revision of the copy source is +cloned, and earns its own brand new unique node-revision ID. This +node-revision ID is constructed from the original node ID, a brand new +copy ID, and (as always) the txn ID of the transaction in which the +copy is being made. + +The Subversion filesystem uses a "cheap copy/lazy migration" model. +This means that when a directory node-revision is copied via +svn_fs_copy(), only the node-revision of the top of the copied "tree" +is cloned (again, earning a new copy ID), not every child of that +tree. This makes the svn_fs_copy() operation quite fast, at least +compared to the alternative. From that point, any children of the +copy target are lazily migrated. The first time they are themselves +modified after the original copy, they are cloned from their original +source location into the new target location. This lazy migration +procedure costs about the same as a regular cloning operation, which +keeps the "cheap copy" cheap, even the morning after. + +Copies of files behave no differently than copies of directories. But +files never have children, so effectively the "tree" being copied is +exactly one node-revision. This node-revision is explicitly cloned at +the time of the copy, and there is nothing to lazily migrate +afterwards. + +The second type of copy operation is a "soft" copy. These types of +copies are not explicitly triggered via the filesystem API, but are +implementation artifacts of other filesystem operations. A soft copy +happens whenever a node-revision exists in a different branch than +that of its parent, and the parent is again branched. Huh?! Let's +see if an example will help explain this a bit. + +Say you have a directory, "/trunk". Now, into "/trunk" you copy a +file "README" from some other part of the tree. You have now +effectively branched the original "README"'s history -- part of it +will live on in the original location, but part of it now thrives in +its new "/trunk/README" location. The copy operation assigned a brand +new copy ID to the new node-revision for "/trunk/README", which is +necessarily different from the copy ID assigned to the node-revision +for "/trunk". + +Later, you decide to copy "/trunk" to "/branches/mine". So the new +"/branches/mine" also gets a brand new copy ID, since it is now a +historical branch of "/trunk". But what happens when +"/branches/mine/README" is cloned later as part of some edits you are +making? What copy ID does the new clone get? Because "/trunk/README" +was on a different historical branch than "/trunk", our copy of +"/trunk" causes (in "README") a branch of a branch. So +"/branches/mine/README" gets a brand new copy ID, and the filesystem +remembers that the copy operation associated with that ID was a soft +copy. + + [### Right about here, C-Mike's memory starts getting fuzzy ###] + +The following is the copy ID inheritance algorithm, used to calculate +what copy ID a node revision will use when cloned for mutability. +Remember that a node revision is never cloned until its parent is +first cloned. + + 1. If the node revision is already mutable, its copy ID never + changes. + + 2. If the node revision has a copy ID of "0" (which is to say, it's + never been itself copied or cloned as a child of a copied + parent), then it inherits whatever copy ID its parent winds up + with. + + 3. If the node revision is on the same branch as its parent before + cloning, it will remain on the same branch as its parent after + cloning. A node revision can be said to be on the same branch + as its parent if: + + a) their copy IDs are the same, or + + b) the node revision is not a branch point (meaning, it was + not the node revision created by the copy associated with + its copy ID), or + + c) the node revision is a branch point which being accessed via + its copy destination path. + + 4. If, however, the node revision is *not* on the same branch as + its parent before cloning, it cannot be on the same branch as + its parent after cloning. This breaks down to two cases: + + a) If the node revision was the target of the copy operation + whose ID it holds, then it gets to keep its same copy ID. + + b) Otherwise, the node revision is the unedited child of some + parent that was copied, and wasn't on the same branch as + that parent before the copy. In this special case, the + cloned node revision will get a brand new copy ID which + points to one of those "soft copy" things we've been + talking about. + +The initial root directory's node revision, created when the +filesystem is initialized, begins life with a magical "0" copy ID. +Afterward, any new nodes (as in, freshly created files and +directories) begin life with the same copy ID as their parent. + + +Traversing History + + ### todo: put the history harvesting algorithm here diff --git a/subversion/libsvn_fs_base/notes/structure b/subversion/libsvn_fs_base/notes/structure new file mode 100644 index 0000000..8e2159f --- /dev/null +++ b/subversion/libsvn_fs_base/notes/structure @@ -0,0 +1,1086 @@ +Subversion on Berkeley DB -*- text -*- + +There are many different ways to implement the Subversion filesystem +interface. You could implement it directly using ordinary POSIX +filesystem operations; you could build it using an SQL server as a +back end; you could build it on RCS; and so on. + +This implementation of the Subversion filesystem interface is built on +top of Berkeley DB (http://www.sleepycat.com). Berkeley DB supports +transactions and recoverability, making it well-suited for Subversion. + + + +Nodes and Node Revisions + +In a Subversion filesystem, a `node' corresponds roughly to an +`inode' in a Unix filesystem: + + * A node is either a file or a directory. + + * A node's contents change over time. + + * When you change a node's contents, it's still the same node; it's + just been changed. So a node's identity isn't bound to a specific + set of contents. + + * If you rename a node, it's still the same node, just under a + different name. So a node's identity isn't bound to a particular + filename. + +A `node revision' refers to a node's contents at a specific point in +time. Changing a node's contents always creates a new revision of that +node. Once created, a node revision's contents never change. + +When we create a node, its initial contents are the initial revision of +the node. As users make changes to the node over time, we create new +revisions of that same node. When a user commits a change that deletes +a file from the filesystem, we don't delete the node, or any revision +of it --- those stick around to allow us to recreate prior revisions of +the filesystem. Instead, we just remove the reference to the node +from the directory. + + + +ID's + +Within the database, we refer to nodes and node revisions using a +string of three unique identifiers (the "node ID", the "copy ID", and +the "txn ID"), separated by periods. + + node_revision_id ::= node_id '.' copy_id '.' txn_id + +The node ID is unique to a particular node in the filesystem across +all of revision history. That is, two node revisions who share +revision history (perhaps because they are different revisions of the +same node, or because one is a copy of the other, e.g.) have the same +node ID, whereas two node revisions who have no common revision +history will not have the same node ID. + +The copy ID is a key into the `copies' table (see `Copies' below), and +identifies that a given node revision, or one of its ancestors, +resulted from a unique filesystem copy operation. + +The txn ID is just an identifier that is unique to a single filesystem +commit. All node revisions created as part of a commit share this txn +ID (which, incidentally, gets its name from the fact that this id is +the same id used as the primary key of Subversion transactions; see +`Transactions' below). + +A directory entry identifies the file or subdirectory it refers to +using a node revision ID --- not a node ID. This means that a change +to a file far down in a directory hierarchy requires the parent +directory of the changed node to be updated, to hold the new node +revision ID. Now, since that parent directory has changed, its parent +needs to be updated, and so on to the root. We call this process +"bubble-up". + +If a particular subtree was unaffected by a given commit, the node +revision ID that appears in its parent will be unchanged. When +doing an update, we can notice this, and ignore that entire +subtree. This makes it efficient to find localized changes in +large trees. + + + +A Word About Keys + +Some of the Subversion database tables use base-36 numbers as their +keys. Some debate exists about whether the use of base-36 (as opposed +to, say, regular decimal values) is either necessary or good. It is +outside the scope of this document to make a claim for or against this +usage. As such, the reader will please note that for the majority of +the document, the use of the term "number" when referring to keys of +database tables should be interpreted to mean "a monotonically +increasing unique key whose order with respect to other keys in the +table is irrelevant". :-) + +To determine the actual type currently in use for the keys of a given +table, you are invited to check out the "Appendix: Filesystem +structure summary" section of this document. + + + +NODE-REVISION: how we represent a node revision + +We represent a given revision of a file or directory node using a list +skel (see include/private/svn_skel.h for an explanation of skels). +A node revision skel has the form: + + (HEADER PROP-KEY KIND-SPECIFIC ...) + +where HEADER is a header skel, whose structure is common to all nodes, +PROP-KEY is the key of the representation that contains this node's +properties list, and the KIND-SPECIFIC elements carry data dependent +on what kind of node this is --- file, directory, etc. + +HEADER has the form: + + (KIND CREATED-PATH [PRED-ID [PRED-COUNT [HAS-MERGEINFO MERGEINFO-COUNT]]]) + +where: + + * KIND indicates what sort of node this is. It must be one of the + following: + - "file", indicating that the node is a file (see FILE below). + - "dir", indicating that the node is a directory (see DIR below). + + * CREATED-PATH is the canonicalized absolute filesystem path at + which this node was created. + + * PRED-ID, if present, indicates the node revision which is the + immediate ancestor of this node. + + * PRED-COUNT, if present, indicates the number of predecessors the + node revision has (recursively). + + * HAS-MERGEINFO and MERGEINFO-COUNT, if present, indicate ... + ### TODO + +Note that a node cannot change its kind from one revision to the next. +A directory node is always a directory; a file node is always a file; +etc. The fact that the node's kind is stored in each node revision, +rather than in some revision-independent place, might suggest that +it's possible for a node to change kinds from revision to revision, but +Subversion does not allow this. + +PROP-KEY is a key into the `representations' table (see REPRESENTATIONS +below), whose value is a representation pointing to a string +(see `strings' table) that is a PROPLIST skel. + +The KIND-SPECIFIC portions are discussed below. + + + +PROPLIST: a property list is a list skel of the form: + + (NAME1 VALUE1 NAME2 VALUE2 ...) + +where each NAMEi is the name of a property, and VALUEi is the value of +the property named NAMEi. Every valid property list has an even +number of elements. + + + +FILE: how files are represented. + +If a NODE-REVISION's header's KIND is "file", then the node-revision +skel represents a file, and has the form: + + (HEADER PROP-KEY DATA-INFO [EDIT-DATA-KEY]) + +where + + DATA-INFO ::= DATA-KEY | (DATA-KEY DATA-KEY-UNIQID) + +and DATA-KEY identifies the representation for the file's current +contents, and EDIT-DATA-KEY identifies the representation currently +available for receiving new contents for the file. + +DATA-KEY-UNIQID ... +### TODO + +See discussion of representations later. + + + +DIR: how directories are represented. + +If the header's KIND is "dir", then the node-revision skel +represents a directory, and has the form: + + (HEADER PROP-KEY ENTRIES-KEY) + +where ENTRIES-KEY identifies the representation for the directory's +entries list (see discussion of representations later). An entries +list has the form + + (ENTRY ...) + +where each entry is + + (NAME ID) + +where: + + * NAME is the name of the directory entry, in UTF-8, and + + * ID is the ID of the node revision to which this entry refers + + + +REPRESENTATIONS: where and how Subversion stores your data. + +Some parts of a node revision are essentially constant-length: for +example, the KIND field and the REV. Other parts can have +arbitrarily varying length: property lists, file contents, and +directory entry lists. This variable-length data is often similar +from one revision to the next, so Subversion stores just the deltas +between them, instead of successive fulltexts. + +The HEADER portion of a node revision holds the constant-length stuff, +which is never deltified. The rest of a node revision just points to +data stored outside the node revision proper. This design makes the +repository code easier to maintain, because deltification and +undeltification are confined to a layer separate from node revisions, +and makes the code more efficient, because Subversion can retrieve +just the parts of a node it needs for a given operation. + +Deltifiable data is stored in the `strings' table, as mediated by the +`representations' table. Here's how it works: + +The `strings' table stores only raw bytes. A given string could be +any one of these: + + - a file's contents + - a delta that reconstructs file contents, or part of a file's contents + - a directory entry list skel + - a delta that reconstructs a dir entry list skel, or part of same + - a property list skel + - a delta that reconstructs a property list skel, or part of same + +There is no way to tell, from looking at a string, what kind of data +it is. A directory entry list skel is indistinguishable from file +contents that just happen to look exactly like the unparsed form of a +directory entry list skel. File contents that just happen to look +like svndiff data are indistinguishable from delta data. + +The code is able to interpret a given string because Subversion + + a) knows whether to be looking for a property list or some + kind-specific data, + + b) knows the `kind' of the node revision in question, + + c) always goes through the `representations' table to discover if + any undeltification or other transformation is needed. + +The `representations' table is an intermediary between node revisions +and strings. Node revisions never refer directly into the `strings' +table; instead, they always refer into the `representations' table, +which knows whether a given string is a fulltext or a delta, and if it +is a delta, what it is a delta against. That, combined with the +knowledge in (a) and (b) above, allows Subversion to retrieve the data +and parse it appropriately. A representation has the form: + + (HEADER KIND-SPECIFIC) + +where HEADER is + + (KIND TXN [MD5 [SHA1]]) + +The KIND is "fulltext" or "delta". TXN is the txn ID for the txn in +which this representation was created. MD5 is a checksum of the +representation's contents, that is, what the representation produces, +regardless of whether it is stored deltified or as fulltext. (For +compatibility with older versions of Subversion, MD5 may be +absent, in which case the filesystem behaves as though the checksum is +there and is correct.) An additional kind of checksum, SHA1, is present +in newer formats, starting with version ... +### TODO + +The TXN also serves as a kind of mutability flag: if txn T tries to +change a representation's contents, but the rep's TXN is not T, then +something has gone horribly wrong and T should leave the rep alone +(and probably error). Of course, "change a representation" here means +changing what the rep's consumer sees. Switching a representation's +storage strategy, for example from fulltext to deltified, wouldn't +count as a change, since that wouldn't affect what the rep produces. + +KIND-SPECIFIC varies considerably depending on the kind of +representation. Here are the two forms currently recognized: + + (("fulltext" TXN [MD5 [SHA1]]) STRING-KEY) + The data is at STRING-KEY in the `strings' table. + + (("delta" TXN [MD5 [SHA1]]) (OFFSET WINDOW) ...) + Each OFFSET indicates the point in the fulltext that this + element reconstructs, and WINDOW says how to reconstruct it: + + WINDOW ::= (DIFF SIZE REP-KEY [REP-OFFSET]) ; + DIFF ::= ("svndiff" VERSION STRING-KEY) + + Notice that a WINDOW holds only metadata. REP-KEY says what + the window should be applied against, or none if this is a + self-compressed delta; SIZE says how much data this window + reconstructs; VERSION says what version of the svndiff format + is being used (currently only version 0 is supported); and + STRING-KEY says which string contains the actual svndiff data + (there is no diff data held directly in the representations + table, of course). + + Note also that REP-KEY might refer to a representation that + itself requires undeltification. We use a delta combiner to + combine all the deltas needed to reproduce the fulltext from + some stored plaintext. + + Branko says this is what REP-OFFSET is for: + > The offsets embedded in the svndiff are stored in a string; + > these offsets would be in the representation. The point is that + > you get all the information you need to select the appropriate + > windows from the rep skel -- without touching a single + > string. This means a bit more space used in the repository, but + > lots less memory used on the server. + + We'll see if it turns out to be necessary. + +In the future, there may be other representations, for example +indicating that the text is stored elsewhere in the database, or +perhaps in an ordinary Unix file. + +Let's work through an example node revision: + + (("file" REV COUNT) PROP-KEY "2345") + +The entry for key "2345" in `representations' is: + + (("delta" TXN CHECKSUM) (0 (("svndiff" 0 "1729") 65 "2343"))) + +and the entry for key "2343" in `representations' is: + + (("fulltext" TXN CHECKSUM) "1001") + +while the entry for key "1729" in `strings' is: + + <some unprintable glob of svndiff data> + +which, when applied to the fulltext at key "1001" in strings, results +in this new fulltext: + + "((some text) (that looks) (deceptively like) (directory entries))" + +Et voila! Subversion knew enough, via the `representations' and +`strings' tables, to undeltify and get that fulltext; and knew enough, +because of the node revision's "file" type, to interpret the result as +file contents, not as a directory entry list. + +(Note that the `strings' table stores multiple DB values per key. +That is, although it's accurate to say there is one string per key, +the string may be divided into multiple consecutive blocks, all +sharing that key. You use a Berkeley DB cursor to find the desired +value[s], when retrieving a particular offset+len in a string.) + +Representations know nothing about ancestry -- the `representations' +table never refers to node revision id's, only to strings or to other +representations. In other words, while the `nodes' table allows +recovery of ancestry information, the `representations' and `strings' +tables together handle deltification and undeltification +*independently* of ancestry. At present, Subversion generally stores +the youngest strings in "fulltext" form, and older strings as "delta"s +against them (unless the delta would save no space compared to the +fulltext). However, there's nothing magic about that particular +arrangement. Other interesting alternatives: + + * We could store the N most recently accessed strings as fulltexts, + letting access patterns determine the most appropriate + representation for each revision. + + * We could occasionally store deltas against the N'th younger + revision, storing larger jumps with a frequency inverse to the + distance covered, yielding a tree-structured history. + +Since the filesystem interface doesn't expose these details, we can +change the representation pretty much as we please to optimize +whatever parameter we care about --- storage size, speed, robustness, +etc. + +Representations never share strings - every string is referred to by +exactly one representation. This is so that when we change a +representation to a different form (e.g. during deltification), we can +delete the strings containing the old form, and know that we're not +messing up any other reps by doing so. + + +Further Notes On Deltifying: +---------------------------- + +When a representation is deltified, it is changed in place. +New strings are created containing the new delta, the representation +is changed to refer to the new strings, and the original (usually +fulltext) string or strings are deleted. + +The node revisions referring to that representation will not be +changed; instead, the same rep key will now be associated with +different value. That way, we get reader locking for free: if someone +is reading a file while Subversion is deltifying that file, one of the +two sides will get a DB_DEADLOCK and svn_fs__retry_txn() will retry. + +### todo: add a note about cycle-checking here, too. + + + +The Berkeley DB "nodes" table + +The database contains a table called "nodes", which is a btree indexed +by node revision ID's, mapping them onto REPRESENTATION skels. Node 0 +is always the root directory, and node revision ID 0.0.0 is always the +empty directory. We use the value of the key 'next-key' to indicate +the next unused node ID. + +Assuming that we store the most recent revision on every branch as +fulltext, and all other revisions as deltas, we can retrieve any node +revision by searching for the last revision of the node, and then +walking backwards to specific revision we desire, applying deltas as +we go. + + + +REVISION: filesystem revisions, and the Berkeley DB "revisions" table + +We represent a filesystem revision using a skel of the form: + + ("revision" TXN) + +where TXN is the key into the `transactions' table (see 'Transactions' below) +whose value is the transaction that was committed to create this revision. + +The database contains a table called "revisions", which is a +record-number table mapping revision numbers onto REVISION skels. +Since Berkeley DB record numbers start with 1, whereas Subversion +filesystem revision numbers start at zero, revision V is stored as +record number V+1 in the `revisions' table. Filesystem revision zero +always has node revision 0.0.0 as its root directory; that node +revision is guaranteed to be an empty directory. + + + +Transactions + +Every transaction ends when it is either successfully committed, or +aborted. We call a transaction which has been either committed or +aborted "finished", and one which hasn't "unfinished". + +Transactions are identified by unique numbers, called transaction +ID's. Currently, transaction ID's are never reused, though this is +not mandated by the schema. In the database, we always represent a +transaction ID in its shortest ASCII form. + +The Berkeley DB `transactions' table records both unfinished and +committed transactions. Every key in this table is a transaction ID. +Unfinished transactions have values that are skels of one of the +following forms: + + ("transaction" ROOT-ID BASE-ID PROPLIST COPIES) + ("dead" ROOT-ID BASE-ID PROPLIST COPIES) + +where: + + * ROOT-ID is the node revision ID of the transaction's root + directory. This is of the form 0.0.THIS-TXN-ID. + + * BASE-ID is the node revision ID of the root of the transaction's + base revision. This is of the form 0.0.BASE-TXN-ID - the base + transaction is, of course, the transaction of the base revision. + + * PROPLIST is a skel giving the revision properties for the + transaction. + + * COPIES contains a list of keys into the `copies' table, + referencing all the filesystem copies created inside of this + transaction. If the transaction is aborted, these copies get + removed from the `copies' table. + + * A "dead" transaction is one that has been requested to be + destroyed, and should never, ever, be committed. + +Committed transaction, however, have values that are skels of the form: + + ("committed" ROOT-ID REV PROPLIST COPIES) + +where: + + * ROOT-ID is the node revision ID of the committed transaction's (or + revision's) root node. + + * REV represents the revision that was created when the + transaction was committed. + + * PROPLIST is a skel giving the revision properties for the + committed transaction. + + * COPIES contains a list of keys into the `copies' table, + referencing all the filesystem copies created by this committed + transaction. Nothing currently uses this information for + committed transactions, but it could be useful in the future. + +As the sole exception to the rule above, the `transactions' table +always has one entry whose key is `next-key', and whose value is the +lowest transaction ID that has never yet been used. We use this entry +to allocate ID's for new transactions. + +The `transactions' table is a btree, with no particular sort order. + + + +Changes + +As modifications are made (files and dirs added or removed, text and +properties changed, etc.) on Subversion transaction trees, the +filesystem tracks the basic change made in the Berkeley DB `changes' +table. + +The `changes' table is a btree with Berkeley's "duplicate keys" +functionality (and with no particular sort order), and maps the +one-to-many relationship of a transaction ID to a "change" item. +Change items are skels of the form: + + ("change" PATH ID CHANGE-KIND TEXT-MOD PROP-MOD) + +where: + + * PATH is the path that was operated on to enact this change. + + * ID is the node revision ID of the node changed. The precise + meaning varies based on the kind of the change: + - "add" or "modify": a new node revision created in the current + txn. + - "delete": a node revision from a previous txn. + - "replace": a replace operation actually acts on two node + revisions, one being deleted, one being added. Only the added + node-revision ID is recorded in the `changes' table - this is + a design flaw. + - "reset": no node revision applies. A zero atom is used as a + placeholder. + + * CHANGE-KIND is one of the following: + + - "add" : PATH/ID was added to the filesystem. + - "delete" : PATH/ID was removed from the filesystem. + - "replace" : PATH/ID was removed, then re-added to the filesystem. + - "modify" : PATH/ID was otherwise modified. + - "reset" : Ignore any previous changes for PATH/ID in this txn. + This kind is no longer created by Subversion 1.3.0 + and later, and can probably be removed at the next + schema bump. + + * TEXT-MOD is a bit specifying whether or not the contents of + this node was modified. + + * PROP-MOD is a bit specifying whether or not the properties of + this node where modified. + +In order to fully describe the changes made to any given path as part +of a single transaction, one must read all the change items associated +with the transaction's ID, and "collapse" multiple entries that refer +to that path. + + + +Copies + +Each time a filesystem copy operation is performed, Subversion records +meta-data about that copy. + +Copies are identified by unique numbers called copy ID's. Currently, +copy ID's are never reused, though this is not mandated by the schema. +In the database, we always represent a copy ID in its shortest ASCII +form. + +The Berkeley DB `copies' table records all filesystem copies. Every +key in this table is copy ID, and every value is a skel of one of the +following forms: + + ("copy" SRC-PATH SRC-TXN DST-NODE-ID) + ("soft-copy" SRC-PATH SRC-TXN DST-NODE-ID) + +where: + + * "copy" indicates an explicitly requested copy, and "soft-copy" + indicates a node that was cloned internally as part of an + explicitly requested copy of some parent directory. See the + section "Copies and Copy IDs" in the file <fs-history> for + details. + + * SRC-PATH and SRC-TXN are the canonicalized absolute path and + transaction ID, respectively, of the source of the copy. + + * DST-NODE-ID represents the new node revision created as a result + of the copy. + +As the sole exception to the rule above, the `copies' table always has +one entry whose key is `next-key', and whose value is the lowest copy ID +that has never yet been used. We use this entry to allocate new +copy ID's. + +The `copies' table is a btree, with no particular sort order. + + + +Locks + +When a caller locks a file -- reserving an exclusive right to modify +or delete it -- an lock object is created in a `locks' table. + +The `locks' table is a btree whose key is a UUID string known as +a "lock-token", and whose value is a skel representing a lock. The +fields in the skel mirror those of an svn_lock__t (see svn_types.h): + + ("lock" PATH TOKEN OWNER COMMENT XML-P CREATION-DATE EXPIRATION-DATE) + +where: + + * PATH is the absolute filesystem path reserved by the lock. + + * TOKEN is the universally unique identifier of the lock, known + as the lock-token. This is the same as the row's key. + + * OWNER is the authenticated username that "owns" the lock. + + * COMMENT is a string describing the lock. It may be empty, or it + might describe the rationale for locking. + + * XML-P is a boolean (either 0 or 1) indicating whether the COMMENT + field is wrapped in an XML tag. (This is something only used by + the DAV layer, for webdav interoperabliity.) + + * CREATION-DATE is a string representation of the date/time when + the lock was created. (see svn_time_to_cstring()) + + * EXPIRATION-DATE is a string representation of the date/time when + the lock will cease to be valid. (see svn_time_to_cstring()) + +In addition to creating a lock in the `locks' table, a new row is +created in a `lock-tokens' table. The `lock-tokens' table is a btree +whose key is an absolute path in the filesystem. The value of each +key is a lock-token (which is a key into the `locks' table.) + +To test if a path is locked, simply check if the path is a key in the +`lock-tokens' table. To see if a certain directory has any locked +children below, we ask BerkeleyDB to do a "greater or equal match" on +the directory path, and see if any results come back. If they do, +then at least one of the directory's children is locked, and thus the +directory cannot be deleted without further investigation. + +Locks are ephemeral things, not historied in any way. They are +potentially created and deleted quite often. When a lock is +destroyed, the appropriate row is removed from the `locks' table. +Additionally, the locked-path is removed from the `lock-tokens' table. + + + + + +Merge rules + +The Subversion filesystem must provide the following characteristics: + +- clients can submit arbitrary rearrangements of the tree, to be + performed as atomic changes to the filesystem tree +- multiple clients can submit non-overlapping changes at the same time, + without blocking +- readers must never block other readers or writers +- writers must never block readers +- writers may block writers + +Merging rules: + + The general principle: a series of changes can be merged iff the + final outcome is independent of the order you apply them in. + +Merging algorithm: + + For each entry NAME in the directory ANCESTOR: + + Let ANCESTOR-ENTRY, SOURCE-ENTRY, and TARGET-ENTRY be the IDs of + the name within ANCESTOR, SOURCE, and TARGET respectively. + (Possibly null if NAME does not exist in SOURCE or TARGET.) + + If ANCESTOR-ENTRY == SOURCE-ENTRY, then: + No changes were made to this entry while the transaction was in + progress, so do nothing to the target. + + Else if ANCESTOR-ENTRY == TARGET-ENTRY, then: + A change was made to this entry while the transaction was in + process, but the transaction did not touch this entry. Replace + TARGET-ENTRY with SOURCE-ENTRY. + + Else: + Changes were made to this entry both within the transaction and + to the repository while the transaction was in progress. They + must be merged or declared to be in conflict. + + If SOURCE-ENTRY and TARGET-ENTRY are both null, that's a + double delete; if one of them is null, that's a delete versus + a modification. In any of these cases, flag a conflict. + + If any of the three entries is of type file, declare a conflict. + + If either SOURCE-ENTRY or TARGET-ENTRY is not a direct + modification of ANCESTOR-ENTRY (determine by comparing the + node-id fields), declare a conflict. A replacement is + incompatible with a modification or other replacement--even + an identical replacement. + + Direct modifications were made to the directory ANCESTOR-ENTRY + in both SOURCE and TARGET. Recursively merge these + modifications. + + For each leftover entry NAME in the directory SOURCE: + + If NAME exists in TARGET, declare a conflict. Even if SOURCE and + TARGET are adding exactly the same thing, two additions are not + auto-mergeable with each other. + + Add NAME to TARGET with the entry from SOURCE. + + Now that we are done merging the changes from SOURCE into the + directory TARGET, update TARGET's predecessor to be SOURCE. + +The following algorithm was used when the Subversion filesystem was +initially written, but has been replaced with the simpler and more +performant algorithm above: + + Merging two nodes, A and B, with respect to a common ancestor + ANCESTOR: + + - First, the merge fails unless A, B, and ANCESTOR are all the same + kind of node. + - If A and B are text files: + - If A is an ancestor of B, then B is the merged result. + - If A is identical to B, then B (arbitrarily) is the merged + result. + - Otherwise, the merge fails. + - If A and B are both directories: + - For every directory entry E in either A, B, or ANCESTOR, here + are the cases: + - E exists in neither ANCESTOR nor A. + - E doesn't exist in ANCESTOR, and has been added to A. + - E exists in ANCESTOR, but has been deleted from A. + - E exists in both ANCESTOR and A ... + - but refers to different nodes. + - but refers to different revisions of the same node. + - and refers to the same node revision. + + The same set of possible relationships with ANCESTOR holds for B, + so there are thirty-six combinations. The matrix is symmetrical + with A and B reversed, so we only have to describe one triangular + half, including the diagonal --- 21 combinations. + + - (6) E exists in neither ANCESTOR nor A: + - (1) E exists in neither ANCESTOR nor B. Can't occur, by + assumption that E exists in either A, B, or ancestor. + - (1) E has been added to B. Add E in the merged result. *** + - (1) E has been deleted from B. Can't occur, by assumption + that E doesn't exist in ANCESTOR. + - (3) E exists in both ANCESTOR and B. Can't occur, by + assumption that E doesn't exist in ancestor. + - (5) E doesn't exist in ANCESTOR, and has been added to A. + - (1) E doesn't exist in ANCESTOR, and has been added to B. + Conflict. + - (1) E exists in ANCESTOR, but has been deleted from B. + Can't occur, by assumption that E doesn't exist in + ANCESTOR. + - (3) E exists in both ANCESTOR and B. Can't occur, by + assumption that E doesn't exist in ANCESTOR. + - (4) E exists in ANCESTOR, but has been deleted from A. + - (1) E exists in ANCESTOR, but has been deleted from B. If + neither delete was a result of a rename, then omit E from + the merged tree. *** Otherwise, conflict. + - E exists in both ANCESTOR and B ... + - (1) but refers to different nodes. Conflict. + - (1) but refers to different revisions of the same node. + Conflict. + - (1) and refers to the same node revision. Omit E from + the merged tree. *** + - (3) E exists in both ANCESTOR and A, but refers to different + nodes. + - (1) E exists in both ANCESTOR and B, but refers to + different nodes. Conflict. + - (1) E exists in both ANCESTOR and B, but refers to + different revisions of the same node. Conflict. + - (1) E exists in both ANCESTOR and B, and refers to the same + node revision. Replace E with A's node revision. *** + - (2) E exists in both ANCESTOR and A, but refers to different + revisions of the same node. + - (1) E exists in both ANCESTOR and B, but refers to + different revisions of the same node. Try to merge A/E and + B/E, recursively. *** + - (1) E exists in both ANCESTOR and B, and refers to the same + node revision. Replace E with A's node revision. *** + - (1) E exists in both ANCESTOR and A, and refers to the same + node revision. + - (1) E exists in both ANCESTOR and B, and refers to the same + node revision. Nothing has happened to ANCESTOR/E, so no + change is necessary. + + *** == something actually happens + + +Non-Historical Properties + +[[Yes, do tell.]] + + +UUIDs: Universally Unique Identifiers + +Every filesystem has a UUID. This is represented as record #1 in the +`uuids' table. + + +Layers + +In previous structurings of the code, I had trouble keeping track of +exactly who has implemented which promises, based on which other +promises from whom. + +I hope the arrangement below will help me keep things straight, and +make the code more reliable. The files are arranged in order from +low-level to high-level: each file depends only on services provided +by the files before it. + +skel.c, id.c, dbt.c, convert-size.c + + Low-level utility functions. + +fs_skels.c Routines for marshaling between skels and native FS types. + +fs.c Creating and destroying filesystem objects. + +err.c Error handling. + +nodes-table.c, txn-table.c, rev-table.c, reps-table.c, strings-table.c + + Create and open particular database tables. + Responsible for intra-record consistency. + +node-rev.c Creating, reading, and writing node revisions. + Responsible for deciding what gets deltified when. + +reps-strings.c + Retrieval and storage of represented strings. + This will handle delta-based storage, + +dag.c Operations on the DAG filesystem. "DAG" because the + interface exposes the filesystem's sharing structure. + Enforce inter-record consistency. + +tree.c Operations on the tree filesystem. This layer is + built on top of dag.c, but transparently distinguishes + virtual copies, making the underlying DAG look like a + real tree. This makes incomplete transactions behave + like ordinary mutable filesystems. + +delta.c Computing deltas. + + + +Appendix: Filesystem structure summary +====================================== + +Berkeley DB tables +------------------ + + "nodes" : btree(ID -> NODE-REVISION, "next-key" -> NODE-ID) + "revisions" : recno(REVISION) + "transactions" : btree(TXN -> TRANSACTION, "next-key" -> TXN) + "changes" : btree(TXN -> CHANGE) + "copies" : btree(CPY -> COPY, "next-key" -> CPY) + "strings" : btree(STR -> STRING, "next-key" -> STR) + "representations" : btree(REP -> REPRESENTATION, "next-key" -> REP) + "uuids" : recno(UUID) + "locks" : btree(TOKEN -> LOCK) + "lock-tokens" : btree(PATH -> TOKEN) + "node-origins" : btree(NODE-ID -> ID) + "checksum-reps" : btree(SHA1SUM -> REP, "next-key" -> number-36) + "miscellaneous" : btree(STRING -> STRING) + +Syntactic elements +------------------ + +Table keys: + + ID ::= NODE-REV-ID ; + TXN ::= number-36 ; + CPY ::= number-36 ; + STR ::= number-36 ; + REP ::= number-36 ; + TOKEN ::= uuid ; + +Property lists: + + PROPLIST ::= (PROP ...) ; + PROP ::= atom atom ; + + +Filesystem revisions: + + REVISION ::= ("revision" TXN) ; + + +Transactions: + + TRANSACTION ::= UNFINISHED-TXN | COMMITTED-TXN | DEAD-TXN + UNFINISHED-TXN ::= ("transaction" ROOT-ID BASE-ID PROPLIST COPIES) ; + COMMITTED-TXN ::= ("committed" ROOT-ID REV PROPLIST COPIES) ; + DEAD-TXN ::= ("dead" ROOT-ID BASE-ID PROPLIST COPIES) ; + ROOT-ID ::= NODE-REV-ID ; + BASE-ID ::= NODE-REV-ID ; + COPIES ::= (CPY ...) ; + REV ::= number ; + + +Changes: + + CHANGE ::= ("change" PATH ID CHANGE-KIND TEXT-MOD PROP-MOD) ; + CHANGE-KIND ::= "add" | "delete" | "replace" | "modify" | "reset"; + TEXT-MOD ::= atom ; + PROP-MOD ::= atom ; + + +Copies: + + COPY ::= REAL-COPY | SOFT-COPY + REAL-COPY ::= ("copy" SRC-PATH SRC-TXN DST-NODE-ID) + SOFT-COPY ::= ("soft-copy" SRC-PATH SRC-TXN DST-NODE-ID) + SRC-PATH ::= atom ; + SRC-TXN ::= TXN ; + DST-NODE-ID ::= NODE-REV-ID ; + + +Entries lists: + + ENTRIES ::= (ENTRY ...) ; + ENTRY ::= (NAME ID) ; + NAME ::= atom ; + + +Node revisions: + + NODE-REVISION ::= FILE | DIR ; + FILE ::= (HEADER PROP-KEY DATA-INFO [EDIT-DATA-KEY]) ; + DIR ::= (HEADER PROP-KEY ENTRIES-KEY) ; + HEADER ::= (KIND CREATED-PATH + [PRED-ID [PRED-COUNT + [HAS-MERGEINFO MERGEINFO-COUNT]]]) ; + KIND ::= "file" | "dir" ; + PRED-ID ::= NODE-REV-ID | ""; + PRED-COUNT ::= number | "" ; + CREATED-PATH ::= atom ; + PROP-KEY ::= atom ; + DATA-INFO ::= DATA-KEY | (DATA-KEY DATA-KEY-UNIQID) + DATA-KEY ::= atom ; + DATA-KEY-UNIQID ::= atom ; + EDIT-DATA-KEY ::= atom ; + HAS-MERGEINFO ::= "0" | "1" ; + MERGEINFO-COUNT ::= number ; + + +Representations: + + REPRESENTATION ::= FULLTEXT | DELTA ; + FULLTEXT ::= (HEADER STRING-KEY) ; + DELTA ::= (HEADER (OFFSET WINDOW) ...) ; + WINDOW ::= (DIFF SIZE REP-KEY [REP-OFFSET]) ; + DIFF ::= ("svndiff" VERSION STRING-KEY) ; + VERSION ::= number ; + REP-KEY ::= atom ; + STRING-KEY ::= atom ; + OFFSET ::= number ; + REP-OFFSET ::= number ; + + HEADER ::= (KIND TXN [MD5 [SHA1]]) ; + KIND ::= "fulltext" | "delta" ; + + SIZE ::= number ; + MD5 ::= ("md5" MD5SUM) ; + SHA1 ::= ("sha1" SHA1SUM) ; + MD5SUM ::= atom ; + SHA1SUM ::= atom ; + + +Strings: + + STRING ::= RAWTEXT | LISTTEXT | DIFFTEXT + RAWTEXT ::= /{anything.class}*/ ; + LISTTEXT ::= list ; + DIFFTEXT ::= /{anything.class}*/ ; + + +Node revision IDs: + + NODE-REV-ID ::= NODE-ID '.' CPY '.' TXN ; + NODE-ID ::= number ; + +UUIDs: + UUID ::= uuid ; + + +Locks: + + LOCK ::= ("lock" PATH TOKEN OWNER + COMMENT XML-P CR-DATE [X-DATE]); + PATH ::= atom ; + OWNER ::= atom ; + COMMENT ::= atom ; + XML-P ::= "0" | "1" ; + CR-DATE ::= atom ; + X-DATE ::= atom ; + +Lock tokens: + + (the value is just a lock-token, which is a uuid) + + +Node origins: + + NODE-ID ::= NODE-REV-ID ; + + +Lexical elements +---------------- + +UUIDs: + + uuid ::= hexits-32 '-' hexits-16 '-' hexits-16 '-' + hexits-16 '-' hexits-48 ; + +Numbers: + + number ::= /{digit.class}+/ ; + number-36 ::= /{base36.class}+/ ; + hexits-32 ::= /{base16.class}{8}/ ; + hexits-16 ::= /{base16.class}{4}/ ; + hexits-48 ::= /{base16.class}{12}/ ; + +(Note: the following are described in skel.h) +Skels: + + skel ::= atom | list; + list ::= list.head list.body.opt list.tail ; + atom ::= atom.imp-len | atom.exp-len ; + + list.head ::= '(' spaces.opt ; + list.tail ::= spaces.opt ')' ; + list.body.opt ::= | list.body ; + list.body ::= skel | list.body spaces.opt skel ; + + atom.imp-len ::= /{name.class}[^\(\){ws.class}]*/ ; + atom.exp-len ::= /({digit.class}+){ws.class}.{\1}/ ; + + spaces.opt ::= /{ws.class}*/ ; + + +Character classes: + + ws.class ::= [\t\n\f\r\ ] ; + digit.class ::= [0-9] ; + name.class ::= [A-Za-z] ; + base16.class ::= [0-9a-f] + base36.class ::= [a-z0-9] + anything.class ::= anything at all ; + + + +Appendix: 'miscellaneous' table contents +====================================== + +The 'miscellaneous' table contains string keys mapped to string +values. Here is a table of the supported keys, the descriptions of +their values, and the filesystem format version in which they were +introduced. + + Fmt Key Value + --- ------------------ ------------------------------------ + 4 forward-delta-rev Youngest revision in the repository as of + the moment when it was upgraded to support + forward deltas. |