diff options
Diffstat (limited to 'lib/libc/locale/DESIGN.xlocale')
-rw-r--r-- | lib/libc/locale/DESIGN.xlocale | 159 |
1 files changed, 159 insertions, 0 deletions
diff --git a/lib/libc/locale/DESIGN.xlocale b/lib/libc/locale/DESIGN.xlocale new file mode 100644 index 0000000..5d998d3 --- /dev/null +++ b/lib/libc/locale/DESIGN.xlocale @@ -0,0 +1,159 @@ +$FreeBSD$ + +Design of xlocale +================= + +The xlocale APIs come from Darwin, although a subset is now part of POSIX 2008. +They fall into two broad categories: + +- Manipulation of per-thread locales (POSIX) +- Locale-aware functions taking an explicit locale argument (Darwin) + +This document describes the implementation of these APIs for FreeBSD. + +Goals +----- + +The overall goal of this implementation is to be compatible with the Darwin +version. Additionally, it should include minimal changes to the existing +locale code. A lot of the existing locale code originates with 4BSD or earlier +and has had over a decade of testing. Replacing this code, unless absolutely +necessary, gives us the potential for more bugs without much benefit. + +With this in mind, various libc-private functions have been modified to take a +locale_t parameter. This causes a compiler error if they are accidentally +called without a locale. This approach was taken, rather than adding _l +variants of these functions, to make it harder for accidental uses of the +global-locale versions to slip in. + +Locale Objects +-------------- + +A locale is encapsulated in a `locale_t`, which is an opaque type: a pointer to +a `struct _xlocale`. The name `_xlocale` is unfortunate, as it does not fit +well with existing conventions, but is used because this is the name the Darwin +implementation gives to this structure and so may be used by existing (bad) code. + +This structure should include all of the information corresponding to a locale. +A locale_t is almost immutable after creation. There are no functions that modify it, +and it can therefore be used without locking. It is the responsibility of the +caller to ensure that a locale is not deallocated during a call that uses it. + +Each locale contains a number of components, one for each of the categories +supported by `setlocale()`. These are likewise immutable after creation. This +differs from the Darwin implementation, which includes a deprecated +`setinvalidrune()` function that can modify the rune locale. + +The exception to these mutability rules is a set of `mbstate_t` flags stored +with each locale. These are used by various functions that previously had a +static local `mbstate_t` variable. + +The components are reference counted, and so can be aliased between locale +objects. This makes copying locales very cheap. + +The Global Locale +----------------- + +All locales and locale components are reference counted. The global locale, +however, is special. It, and all of its components, are static and so no +malloc() memory is required when using a single locale. + +This means that threads using the global locale are subject to the same +constraints as with the pre-xlocale libc. Calls to any locale-aware functions +in threads using the global locale, while modifying the global locale, have +undefined behaviour. + +Because of this, we have to ensure that we always copy the components of the +global locale, rather than alias them. + +It would be cleaner to simply remove the special treatment of the global locale +and have a locale_t lazily allocated for the global context. This would cost a +little more `malloc()` memory, so is not done in the initial version. + +Caching +------- + +The existing locale implementation included several ad-hoc caching layers. +None of these were thread safe. Caching is only really of use for supporting +the pattern where the locale is briefly changed to something and then changed +back. + +The current xlocale implementation removes the caching entirely. This pattern +is not one that should be encouraged. If you need to make some calls with a +modified locale, then you should use the _l suffix versions of the calls, +rather than switch the global locale. If you do need to temporarily switch the +locale and then switch it back, `uselocale()` provides a way of doing this very +easily: It returns the old locale, which can then be passed to a subsequent +call to `uselocale()` to restore it, without the need to load any locale data +from the disk. + +If, in the future, it is determined that caching is beneficial, it can be added +quite easily in xlocale.c. Given, however, that any locale-aware call is going +to be a preparation for presenting data to the user, and so is invariably going +to be part of an I/O operation, this seems like a case of premature +optimisation. + +localeconv +---------- + +The `localeconv()` function is an exception to the immutable-after-creation +rule. In the classic implementation, this function returns a pointer to some +global storage, which is initialised with the data from the current locale. +This is not possible in a multithreaded environment, with multiple locales. + +Instead, each locale contains a `struct lconv` that is lazily initialised on +calls to `localeconv()`. This is not protected by any locking, however this is +still safe on any machine where word-sized stores are atomic: two concurrent +calls will write the same data into the structure. + +Explicit Locale Calls +--------------------- + +A large number of functions have been modified to take an explicit `locale_t` +parameter. The old APIs are then reimplemented with a call to `__get_locale()` +to supply the `locale_t` parameter. This is in line with the Darwin public +APIs, but also simplifies the modifications to these functions. The +`__get_locale()` function is now the only way to access the current locale +within libc. All of the old globals have gone, so there is now a linker error +if any functions attempt to use them. + +The ctype.h functions are a little different. These are not implemented in +terms of their locale-aware versions, for performance reasons. Each of these +is implemented as a short inline function. + +Differences to Darwin APIs +-------------------------- + +`strtoq_l()` and `strtouq_l() `are not provided. These are extensions to +deprecated functions - we should not be encouraging people to use deprecated +interfaces. + +Locale Placeholders +------------------- + +The pointer values 0 and -1 have special meanings as `locale_t` values. Any +public function that accepts a `locale_t` parameter must use the `FIX_LOCALE()` +macro on it before using it. For efficiency, this can be emitted in functions +which *only* use their locale parameter as an argument to another public +function, as the callee will do the `FIX_LOCALE()` itself. + +Potential Improvements +---------------------- + +Currently, the current rune set is accessed via a function call. This makes it +fairly expensive to use any of the ctype.h functions. We could improve this +quite a lot by storing the rune locale data in a __thread-qualified variable. + +Several of the existing FreeBSD locale-aware functions appear to be wrong. For +example, most of the `strto*()` family should probably use `digittoint_l()`, +but instead they assume ASCII. These will break if using a character encoding +that does not put numbers and the letters A-F in the same location as ASCII. +Some functions, like `strcoll()` only work on single-byte encodings. No +attempt has been made to fix existing limitations in the libc functions other +than to add support for xlocale. + +Intuitively, setting a thread-local locale should ensure that all locale-aware +functions can be used safely from that thread. In fact, this is not the case +in either this implementation or the Darwin one. You must call `duplocale()` +or `newlocale()` before calling `uselocale()`. This is a bit ugly, and it +would be better if libc ensure that every thread had its own locale object. |