Diffstat (limited to 'subversion/libsvn_fs_base/notes/TODO')
-rw-r--r-- | subversion/libsvn_fs_base/notes/TODO | 137 |
1 files changed, 137 insertions, 0 deletions
diff --git a/subversion/libsvn_fs_base/notes/TODO b/subversion/libsvn_fs_base/notes/TODO
new file mode 100644
index 0000000..c72af03
--- /dev/null
+++ b/subversion/libsvn_fs_base/notes/TODO
@@ -0,0 +1,137 @@
+What's happening now
+
+The filesystem needs some path validation stuffs independent of the
+SVN path utilities. A filesystem path is a well-defined Thing that
+should be held a safe distance away from future changes to SVN's
+general path library.
+
+
+Incorrectnesses
+
+We must ensure that node numbers are never reused. If we open a node,
+svn_fs_delete it, and then create new nodes, what happens when the
+original node structure suddenly comes to refer to an entirely
+different node? Files become directories?
+
+We should convert filenames to some canonical Unicode form, for
+comparison.
+
+Does everyone call svn_fs__check_fs who should?
+
+svn_fs_delete will actually delete non-empty directories, if they're
+not cloned. This is inconsistent; should it be fixed?
+
+Does every operation on a deleted node or completed transaction fail
+gracefully?
+
+Produce helpful error messages when filename paths contain null
+characters.
+
+
+Uglinesses
+
+Fix up comments in svn_fs.h for transactions.
+
+Add `public name' member to filesystem structure, to use to identify
+the filesystem in error messages. When driven by DAV, this could be a
+URL.
+
+When a dag function signals an error, it has no idea what the path of
+the relevant node was. But node revision ID's are pretty useless to
+the user. tree.c should probably rewrap some errors.
+
+svn_fs__getsize shouldn't rely on a maximum value for detecting
+overflow.
+
+The use of svn_fs__getsize in svn_fs__parse_id is ugly --- what if
+svn_vernum_t and apr_size_t aren't the same size?
+
+Consider some macros or accessory functions for referencing the pieces
+of the NODE-REVISION skel (instead of seeing stuff like
+node->children->next->next and such other unreadable rubbish)
+
+
+Slownesses
+
+We don't store older node revisions as deltas yet.
+
+The delta algorithm walks the whole tree using a single pool, so the
+memory used is proportional to the size of the target tree. Instead,
+it should use a separate subpool every time it recurses into a new
+directory, and free that subpool as soon as it's done processing that
+subdirectory, so the memory used is proportional to the depth of the
+tree.
+
+We should move as much real content out of the NODE-REVISION skel as
+possible; the skels should be holding only small stuff (node kind,
+flags).
+- File contents and deltas should be moved out to a `contents' table.
+  The NODE-REVISION skel should simply contain a key into that table.
+- Directory contents should be moved out to a `directories' table,
+  with a separate table entry for each directory entry. Keys into the
+  table should be of the form `NODE-ID ENTRY-NAME NODE-REVISION', and
+  values should be node revision ID's, or the word `deleted'; to look
+  up an entry named E in a directory whose node revision is N.R,
+  search for the entry `N E x', where x is the largest number present
+  <= R.
+- Property lists should be moved out to a table `properties', indexed
+  similarly to the above. We could deltify property contents the
+  same way we do file contents.
+
+
+Amenities
+
+Extend svn_fs_copy to handle mutable nodes.
+
+Long term ideas:
+
+- directory entry cache:
+  Create a cache mapping a node revision id X plus a filename component
+  N onto a new node revision id Y, meaning that X is a directory in
+  which the name N is bound to ID Y. If everything were in the cache,
+  this function could run with no I/O except for the final node.
+
+  Since node revisions never change, we wouldn't have to worry about
+  invalidating the cache. Mutable node objects will need special
+  handling, of course.
+
+- fulltext cache:
+  If we've recently computed a node's fulltext, we might want to keep
+  that around in case we need to compute one of its nearby ancestors'
+  fulltext, too. This could be a waste, though --- the access
+  patterns are a mix of linear scan (backwards to reconstruct a given
+  revision) and random (who knows what node we'll hit next), so it's
+  not clear what cache policy would be effective. Best to record some
+  data on how many delta applications a given cache would avoid before
+  implementing it.
+
+- delta cache:
+  As people update, we're going to be recomputing text deltas for the
+  most recently changed files pretty often. It might be worthwhile to
+  cache the deltas for a little while.
+
+- Handle Unicode canonicalization for directory and property names
+  ourselves. People should be able to hand us any valid UTF-8
+  sequence, perhaps with precomposed characters or non-spacing marks
+  in a non-canonical order, and find the appropriate matches, given
+  the rules defined by the Unicode standard.
+
+Keeping repositories alive in the long term: Berkeley DB is infamous
+for changing its file format from one revision to the next. If someone
+saves a Subversion 1.0 repository on a CD somewhere, and then tries to
+read it seven years later, their chance of being able to read it with
+the latest revision of Subversion is nil. The solution:
+
+- Define a simply XML repository dump format for the complete
+  repository data. This should be the same format we use for CVS
+  repository conversion. We'll have an import function.
+
+- Write a program that is simple and self-contained --- does not use
+  Berkeley DB, no fancy XML tools, uses nothing but POSIX read and
+  seek --- that can dump a Subversion repository in that format.
+
+- For each revision of Subversion, make a sample repository, and
+  archive a copy of it away as test data.
+
+- Write a test suite that verifies that the repository dump program
+  can handle all of the archived formats.
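
Note on the `directories' table lookup proposed under Slownesses: the
rule "search for the entry `N E x', where x is the largest number
present <= R" is a floor search over ordered keys. The following is a
minimal sketch of that rule only, not Subversion code. It assumes an
in-memory sorted list standing in for an ordered database table, and
the names DirectoriesTable, put, lookup and DELETED are hypothetical,
chosen for illustration.

```python
import bisect

# Marker value the TODO calls the word `deleted' (representation assumed).
DELETED = "deleted"


class DirectoriesTable:
    """Hypothetical in-memory stand-in for the proposed `directories' table."""

    def __init__(self):
        # Sorted list of (node_id, entry_name, node_revision) keys.
        self._keys = []
        # Maps each key to a node revision ID string, or DELETED.
        self._values = {}

    def put(self, node_id, entry_name, node_revision, value):
        """Record that entry_name is bound to value as of node_revision."""
        key = (node_id, entry_name, node_revision)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def lookup(self, node_id, entry_name, node_revision):
        """Return the value at `N E x' for the largest x <= R, else None."""
        probe = (node_id, entry_name, node_revision)
        # Index of the first key strictly greater than the probe; the key
        # just before it is the greatest key <= probe.
        i = bisect.bisect_right(self._keys, probe)
        if i == 0:
            return None
        candidate = self._keys[i - 1]
        # The floor key may belong to a different directory or entry name.
        if candidate[:2] != (node_id, entry_name):
            return None
        value = self._values[candidate]
        return None if value == DELETED else value
```

With entries for README recorded at revisions 1 and 4 of node 3, a
lookup at revision 3 finds the revision 1 binding. If README is marked
deleted at revision 6, any lookup at revision 6 or later returns None.
An ordered on-disk table would presumably answer the same query by
positioning a cursor rather than by bisecting a list.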