diff --git a/subversion/libsvn_fs_base/notes/TODO b/subversion/libsvn_fs_base/notes/TODO
new file mode 100644
index 0000000..c72af03
--- /dev/null
+++ b/subversion/libsvn_fs_base/notes/TODO
@@ -0,0 +1,137 @@
+What's happening now
+
+The filesystem needs its own path validation, independent of the
+SVN path utilities. A filesystem path is a well-defined Thing that
+should be held a safe distance away from future changes to SVN's
+general path library.
+
+
+Incorrectnesses
+
+We must ensure that node numbers are never reused. If we open a node,
+svn_fs_delete it, and then create new nodes, what happens when the
+original node structure suddenly comes to refer to an entirely
+different node? Files become directories?
+
+We should convert filenames to some canonical Unicode form, for
+comparison.
+
+Does everyone call svn_fs__check_fs who should?
+
+svn_fs_delete will actually delete non-empty directories, if they're
+not cloned. This is inconsistent; should it be fixed?
+
+Does every operation on a deleted node or completed transaction fail
+gracefully?
+
+Produce helpful error messages when filename paths contain null
+characters.
+
+
+Uglinesses
+
+Fix up comments in svn_fs.h for transactions.
+
+Add `public name' member to filesystem structure, to use to identify
+the filesystem in error messages. When driven by DAV, this could be a
+URL.
+
+When a dag function signals an error, it has no idea what the path of
+the relevant node was. But node revision IDs are pretty useless to
+the user. tree.c should probably rewrap some errors.
+
+svn_fs__getsize shouldn't rely on a maximum value for detecting
+overflow.
+
+The use of svn_fs__getsize in svn_fs__parse_id is ugly --- what if
+svn_vernum_t and apr_size_t aren't the same size?
+
+Consider some macros or accessor functions for referencing the pieces
+of the NODE-REVISION skel (instead of seeing stuff like
+node->children->next->next and such other unreadable rubbish).
+
+
+Slownesses
+
+We don't store older node revisions as deltas yet.
+
+The delta algorithm walks the whole tree using a single pool, so the
+memory used is proportional to the size of the target tree. Instead,
+it should use a separate subpool every time it recurses into a new
+directory, and free that subpool as soon as it's done processing that
+subdirectory, so the memory used is proportional to the depth of the
+tree.
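A toy model of the pool discipline described above, written in Python rather than C. Tracker, Pool, make_tree, and both walkers are hypothetical stand-ins, not APR or Subversion APIs, and the sketch assumes each directory entry costs exactly one allocation.

```python
class Tracker:
    """Records live allocations and their high-water mark."""
    def __init__(self):
        self.live = 0
        self.peak = 0


class Pool:
    """Toy stand-in for a memory pool (assumption: not the APR API)."""
    def __init__(self, tracker):
        self.tracker = tracker
        self.held = 0

    def alloc(self):
        self.held += 1
        self.tracker.live += 1
        self.tracker.peak = max(self.tracker.peak, self.tracker.live)

    def destroy(self):
        # Freeing a pool releases everything allocated from it at once.
        self.tracker.live -= self.held
        self.held = 0


def walk_single_pool(tree, pool):
    """Every entry in the whole tree allocates from one shared pool."""
    for child in tree.values():
        pool.alloc()
        if isinstance(child, dict):
            walk_single_pool(child, pool)


def walk_with_subpools(tree, tracker):
    """Each directory gets its own pool, destroyed when the directory is done."""
    pool = Pool(tracker)
    try:
        for child in tree.values():
            pool.alloc()
            if isinstance(child, dict):
                walk_with_subpools(child, tracker)
    finally:
        pool.destroy()


def make_tree(width, depth):
    """Build nested dicts: `width` subdirectories per directory, `depth` levels."""
    if depth == 0:
        return {}
    return {f"d{i}": make_tree(width, depth - 1) for i in range(width)}
```

For a tree built with make_tree(3, 4), the single-pool walk peaks at 120 live allocations (one per entry), while the subpool walk peaks at 12 (fanout times depth) and ends with nothing live.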
+
+We should move as much real content out of the NODE-REVISION skel as
+possible; the skels should be holding only small stuff (node kind,
+flags).
+- File contents and deltas should be moved out to a `contents' table.
+ The NODE-REVISION skel should simply contain a key into that table.
+- Directory contents should be moved out to a `directories' table,
+ with a separate table entry for each directory entry. Keys into the
+ table should be of the form `NODE-ID ENTRY-NAME NODE-REVISION', and
+ values should be node revision IDs, or the word `deleted'; to look
+ up an entry named E in a directory whose node revision is N.R,
+ search for the entry `N E x', where x is the largest number present
+ <= R.
+- Property lists should be moved out to a table `properties', indexed
+ similarly to the above. We could deltify property contents the
+ same way we do file contents.
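The `directories' lookup rule above (find `N E x' with the largest x <= R) can be sketched as a search over sorted keys. This is only an illustration in Python: the in-memory sorted list stands in for an ordered table, lookup_entry is a hypothetical name, and node IDs and revisions are assumed to be plain integers.

```python
import bisect


def lookup_entry(keys, values, node_id, name, revision):
    """Find the binding of `name` in directory node `node_id` as of `revision`.

    `keys` is a sorted list of (node_id, entry_name, node_revision) tuples and
    `values` holds the matching node revision IDs or the word "deleted".
    Returns None if the entry does not exist at that revision.
    """
    # Position just past the largest key <= (node_id, name, revision).
    i = bisect.bisect_right(keys, (node_id, name, revision))
    if i == 0:
        return None
    found_node, found_name, _ = keys[i - 1]
    if (found_node, found_name) != (node_id, name):
        return None  # nearest key belongs to a different directory or name
    value = values[i - 1]
    return None if value == "deleted" else value


# Illustrative table contents (made-up IDs).
rows = sorted([
    ((1, "bar", 2), "9.1"),
    ((1, "foo", 1), "5.1"),
    ((1, "foo", 4), "5.2"),
    ((1, "foo", 6), "deleted"),
    ((2, "foo", 3), "11.1"),
])
keys = [k for k, _ in rows]
values = [v for _, v in rows]
```

With this data, looking up `foo' in node 1 at revision 5 yields the value stored under revision 4, and the same lookup at revision 7 finds the `deleted' marker and reports no entry.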
+
+
+Amenities
+
+Extend svn_fs_copy to handle mutable nodes.
+
+Long term ideas:
+
+- directory entry cache:
+ Create a cache mapping a node revision id X plus a filename component
+ N onto a new node revision id Y, meaning that X is a directory in
+ which the name N is bound to ID Y. If everything were in the cache,
+ a path lookup could run with no I/O except for the final node.
+
+ Since node revisions never change, we wouldn't have to worry about
+ invalidating the cache. Mutable node objects will need special
+ handling, of course.
+
+- fulltext cache:
+ If we've recently computed a node's fulltext, we might want to keep
+ that around in case we need to compute one of its nearby ancestors'
+ fulltext, too. This could be a waste, though --- the access
+ patterns are a mix of linear scan (backwards to reconstruct a given
+ revision) and random (who knows what node we'll hit next), so it's
+ not clear what cache policy would be effective. Best to record some
+ data on how many delta applications a given cache would avoid before
+ implementing it.
+
+- delta cache:
+ As people update, we're going to be recomputing text deltas for the
+ most recently changed files pretty often. It might be worthwhile to
+ cache the deltas for a little while.
+
+- Handle Unicode canonicalization for directory and property names
+ ourselves. People should be able to hand us any valid UTF-8
+ sequence, perhaps with precomposed characters or non-spacing marks
+ in a non-canonical order, and find the appropriate matches, given
+ the rules defined by the Unicode standard.
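As a rough illustration of the canonicalization item above, the sketch below compares names after Unicode normalization. It uses Python's standard unicodedata module and picks NFC as the canonical form; both choices are assumptions for illustration, since the notes do not say which form or library the filesystem would use, and same_name is a hypothetical helper.

```python
import unicodedata


def canonical(name):
    """Return `name` in NFC (assumed canonical form for this sketch)."""
    return unicodedata.normalize("NFC", name)


def same_name(a, b):
    """True if two names are canonically equivalent under Unicode rules."""
    return canonical(a) == canonical(b)


# Precomposed e-acute versus "e" followed by a combining acute accent.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# The same two combining marks (dot below, acute) in different orders.
marks_a = "a\u0323\u0301"
marks_b = "a\u0301\u0323"
```

Here same_name(precomposed, decomposed) and same_name(marks_a, marks_b) are both true, even though each pair differs byte for byte in UTF-8.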
+
+Keeping repositories alive in the long term: Berkeley DB is infamous
+for changing its file format from one revision to the next. If someone
+saves a Subversion 1.0 repository on a CD somewhere, and then tries to
+read it seven years later, their chance of being able to read it with
+the latest revision of Subversion is nil. The solution:
+
+- Define a simple XML repository dump format for the complete
+ repository data. This should be the same format we use for CVS
+ repository conversion. We'll have an import function.
+
+- Write a program that is simple and self-contained --- does not use
+ Berkeley DB, no fancy XML tools, uses nothing but POSIX read and
+ seek --- that can dump a Subversion repository in that format.
+
+- For each revision of Subversion, make a sample repository, and
+ archive a copy of it away as test data.
+
+- Write a test suite that verifies that the repository dump program
+ can handle all of the archived formats.