summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorphk <phk@FreeBSD.org>1996-04-13 08:30:21 +0000
committerphk <phk@FreeBSD.org>1996-04-13 08:30:21 +0000
commit26030597389c8d6a6c8c1f0c7601fd9bdf0584f5 (patch)
tree1fd09c21d5269ca254cb617520ccaf0b1cff2e86
parent71735d6644bce690478b4cd10efed8a56497202d (diff)
downloadFreeBSD-src-26030597389c8d6a6c8c1f0c7601fd9bdf0584f5.zip
FreeBSD-src-26030597389c8d6a6c8c1f0c7601fd9bdf0584f5.tar.gz
A little paper about phkmalloc.
-rw-r--r--share/doc/papers/Makefile4
-rw-r--r--share/doc/papers/malloc/Makefile13
-rw-r--r--share/doc/papers/malloc/abs.ms35
-rw-r--r--share/doc/papers/malloc/alternatives.ms45
-rw-r--r--share/doc/papers/malloc/conclusion.ms48
-rw-r--r--share/doc/papers/malloc/implementation.ms222
-rw-r--r--share/doc/papers/malloc/intro.ms74
-rw-r--r--share/doc/papers/malloc/kernel.ms56
-rw-r--r--share/doc/papers/malloc/malloc.ms72
-rw-r--r--share/doc/papers/malloc/performance.ms113
-rw-r--r--share/doc/papers/malloc/problems.ms54
11 files changed, 734 insertions, 2 deletions
diff --git a/share/doc/papers/Makefile b/share/doc/papers/Makefile
index cd10886..8eec538 100644
--- a/share/doc/papers/Makefile
+++ b/share/doc/papers/Makefile
@@ -1,6 +1,6 @@
-# $Id: Makefile,v 1.1 1995/01/11 04:09:56 wollman Exp $
+# $Id: Makefile,v 1.2 1995/01/11 19:51:41 wollman Exp $
-SUBDIR= beyond4.3 diskperf fsinterface kernmalloc kerntune memfs \
+SUBDIR= beyond4.3 diskperf fsinterface kernmalloc kerntune malloc memfs \
newvm nqnfs px relengr sysperf
.include <bsd.subdir.mk>
diff --git a/share/doc/papers/malloc/Makefile b/share/doc/papers/malloc/Makefile
new file mode 100644
index 0000000..01d8a19
--- /dev/null
+++ b/share/doc/papers/malloc/Makefile
@@ -0,0 +1,13 @@
+# From: @(#)Makefile 6.3 (Berkeley) 6/8/93
+# $Id: Makefile,v 1.2 1995/01/11 04:10:01 wollman Exp $
+
+VOLUME= papers
+DOC= malloc
+SRCS= abs.ms intro.ms kernel.ms malloc.ms problems.ms alternatives.ms
+SRCS+= performance.ms implementation.ms conclusion.ms
+MACROS= -ms
+
+edit:
+ vi ${SRCS}
+
+.include <bsd.doc.mk>
diff --git a/share/doc/papers/malloc/abs.ms b/share/doc/papers/malloc/abs.ms
new file mode 100644
index 0000000..beb418c
--- /dev/null
+++ b/share/doc/papers/malloc/abs.ms
@@ -0,0 +1,35 @@
+.\"
+.\" ----------------------------------------------------------------------------
+.\" "THE BEER-WARE LICENSE" (Revision 42):
+.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you
+.\" can do whatever you want with this stuff. If we meet some day, and you think
+.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
+.\" ----------------------------------------------------------------------------
+.\"
+.\" $Id$
+.\"
+.if n .ND
+.TL
+Malloc(3) in modern Virtual Memory environments.
+.sp
+Revised
+Fri Apr 5 12:50:07 1996
+.AU
+Poul-Henning Kamp
+.AI
+<phk@FreeBSD.org>
+Den Andensidste Viking
+Valbygaardsvej 8
+DK-4200 Slagelse
+Denmark
+.AB
+Malloc/free is one of the oldest part of the C language environment
+and obviously the world has changed a bit since it was first made.
+The fact that most UNIX kernels have changed from a swap/segment to
+a virtual memory/page based memory management has not been sufficiently
+reflected in the implementations of the malloc/free API.
+.PP
+A new implementation was designed, written, tested and bench-marked
+with an eye on the workings and performance characteristics of modern
+Virtual Memory systems.
+.AE
diff --git a/share/doc/papers/malloc/alternatives.ms b/share/doc/papers/malloc/alternatives.ms
new file mode 100644
index 0000000..b336208
--- /dev/null
+++ b/share/doc/papers/malloc/alternatives.ms
@@ -0,0 +1,45 @@
+.\"
+.\" ----------------------------------------------------------------------------
+.\" "THE BEER-WARE LICENSE" (Revision 42):
+.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you
+.\" can do whatever you want with this stuff. If we meet some day, and you think
+.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
+.\" ----------------------------------------------------------------------------
+.\"
+.\" $Id$
+.\"
+.ds RH Alternative implementations
+.NH
+Alternative implementations
+.PP
+These problems were actually the inspiration for the first alternative
+malloc implementations.
+Since their main aim was debugging, they would often use techniques
+like allocating a guard zone before and after the chunk,
+and possibly fill these guard zones
+with some pattern, so accesses outside the allocated chunk can be detected
+with some decent probability.
+Another widely used technique is to use tables to keep track of what
+chunks were actually in what state and so on.
+.PP
+This class of debugging has been taken to its practical extreme by
+the product "Purify" which does the entire memory-colouring exercise
+and not only keeps track of what is in use and what isn't, but also
+detects if the first reference is a read (which would return undefined
+values) and other such violations.
+.PP
+Later actual complete implementations of malloc arrived, but many of
+these still based their workings on the basic schema mentioned previously,
+disregarding that in the meantime virtual memory and paging have
+become the standard environment.
+.PP
+The most widely used "alternative" malloc is undoubtedly ``gnumalloc''
+which have received wide acclaim and certainly runs faster than
+most stock mallocs. It does however tend to fare badly in a
+cases where paging is the norm rather than the exception.
+.PP
+The particular malloc that prompted this work basically didn't bother
+reusing storage until the kernel forced it to do so by refusing
+further allocations with sbrk(2).
+That may make sense if you work alone on your own personal mainframe,
+but as a general policy it is less than optimal.
diff --git a/share/doc/papers/malloc/conclusion.ms b/share/doc/papers/malloc/conclusion.ms
new file mode 100644
index 0000000..afd64fc
--- /dev/null
+++ b/share/doc/papers/malloc/conclusion.ms
@@ -0,0 +1,48 @@
+.\"
+.\" ----------------------------------------------------------------------------
+.\" "THE BEER-WARE LICENSE" (Revision 42):
+.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you
+.\" can do whatever you want with this stuff. If we meet some day, and you think
+.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
+.\" ----------------------------------------------------------------------------
+.\"
+.\" $Id$
+.\"
+.ds RH Conclusion and experience.
+.NH
+Conclusion and experience.
+.PP
+In general the performance differences between gnumalloc and this
+malloc are not that big.
+The major difference comes when primary storage is seriously
+over-committed, in which case gnumalloc
+wastes time paging in pages it's not going to use.
+In such cases as much as a factor of five in wall-clock time has
+been seen in difference.
+Apart from that gnumalloc and this implementation are pretty
+much head-on performance wise.
+.PP
+Several legacy programs in the BSD 4.4 Lite distribution had
+code that depended on the memory returned from malloc to
+be zeroed, in a couple of cases free(3) was called more than
+once for the same allocation and a few cases even called free(3)
+with pointers to objects in the data section or on the stack.
+.PP
+A couple of users have reported that using this malloc on other
+platforms yielded "pretty impressive results", but no hard benchmarks
+have been made.
+.ds RH Acknowledgements & references.
+.NH
+Acknowledgements & references.
+.PP
+The first implementation of this algorithm was actually a file system,
+done in assembler using 5-hole ``Baudot'' paper tape for a drum storage
+device attached to a 20 bit germanium transistor computer with 2000 words
+of memory, but that was many years ago.
+.PP
+Peter Wemm <peter@FreeBSD.org> came up with the idea to store the
+page-directory in mmap(2)'ed memory instead of in the heap.
+This has proven to be a good move.
+.PP
+Lars Fredriksen <fredriks@mcs.com> found and identified a
+fence-post bug in the code.
diff --git a/share/doc/papers/malloc/implementation.ms b/share/doc/papers/malloc/implementation.ms
new file mode 100644
index 0000000..3aba2c4
--- /dev/null
+++ b/share/doc/papers/malloc/implementation.ms
@@ -0,0 +1,222 @@
+.\"
+.\" ----------------------------------------------------------------------------
+.\" "THE BEER-WARE LICENSE" (Revision 42):
+.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you
+.\" can do whatever you want with this stuff. If we meet some day, and you think
+.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
+.\" ----------------------------------------------------------------------------
+.\"
+.\" $Id$
+.\"
+.ds RH Implementation
+.NH
+Implementation
+.PP
+A new malloc(3) implementation was written to meet the goals,
+and to the extent possible to address the shortcomings listed previously.
+.PP
+The source is 1218 lines of C code, and can be found in FreeBSD 2.2
+(and probably later versions as well) as src/lib/libc/stdlib/malloc.c.
+.PP
+The main data structure is the
+.I page-directory
+which contains a
+.B void*
+for each page we have control over.
+The value can be one of:
+.IP
+.B MALLOC_NOT_MINE
+Another part of the code may call brk(2) to get a piece of the cake.
+Consequently we cannot rely on the memory we get from the kernel to
+be one consequtive piece of memory and therefore we need a way to
+mark such pages as "untouchable".
+.IP
+.B MALLOC_FREE
+This is a free page.
+.IP
+.B MALLOC_FIRST
+This is the first page in a (multi-)page allocation.
+.IP
+.B MALLOC_FOLLOW
+This is a subsequent page in a multi-page allocation.
+.IP
+.B
+struct pginfo*
+.R
+A pointer to a structure describing a partitioned page.
+.PP
+In addition there exist a linked list of small data structures that
+describe the free space as runs of free pages.
+.PP
+Notice that these structures are not part of the free pages themselves,
+but rather allocated with malloc so that the free pages themselves
+are never referenced while they are free.
+.PP
+When a request for storage comes in, it will be treated as a ``page''
+allocation if it is bigger than half a page.
+The freelist will be searched and the first run of free pages that
+can satisfy the request is used. The first page gets set to
+.B MALLOC_FIRST
+status, if more than that one page is needed the rest of them gets
+.B MALLOC_FOLLOW
+status in the page-directory.
+.PP
+If there were no pages on the free-list, brk(2) will be called, and
+the pages will get added to the page-directory with status
+.B MALLOC_FREE
+and the search restarts.
+.PP
+Freeing a number of pages is done by changing their state in the
+page directory to MALLOC_FREE, and then traverse the free-pages list to
+find the right place for this run of pages, possibly collapsing
+with the two neighbouring runs into one run and, if it is possible,
+release some memory back to the kernel by calling brk(2).
+.PP
+If the request is less than or equal to half of a page, its size will be
+rounded up to the nearest power of two before being processed
+and if the request is less than some minimum size, it is rounded up to
+that size.
+.PP
+These sub-page allocations are served from pages which are split up
+into some number of equal size chunks.
+For each of these pages a
+.B
+struct pginfo
+.R
+describes the size of the chunks on this page, how many there are,
+how many are free and so on.
+The description consist of a bitmap of used chunks, and various counters
+and numbers used to keep track of the stuff in the page.
+.PP
+For each size of sub-page allocation, the pginfo structures for the
+pages that have free chunks in them form a list.
+The head of these lists are stored in predetermined slots at
+the beginning of the page directory to make access fast.
+.PP
+To allocate a chunk of some size, the head of the list for the
+corresponding size is examined, and a free chunk found, the number
+of free chunks on that page is decreased by one and if zero the
+pginfo structure is unlinked from the list.
+.PP
+To free a chunk, the page is derived from the pointer, the page table
+for that page contains a pointer to the pginfo structure, where the
+free bit is set for the chunk, the number of free chunks increased by
+one, and if equal to one, the pginfo structure is linked into the
+proper place on the list for this size of chunks.
+If the count increases to match the number of chunks on the page, the
+pginfo structure is unlinked from the list and free(3)'ed and the
+actual page itself is free(3)'ed too.
+.PP
+To be 100% correct performance-wise these lists should be ordered
+according to the recent number of accesses to that page. This
+information is not available and it would essentially mean a reordering
+of the list on every memory reference to keep it up-to-date.
+Instead they are ordered according to the address of the pages.
+Interestingly enough, in practice this comes out to almost the same
+thing performance wise.
+.PP
+It's not that surprising after all, it's the difference between
+following the crowd or actively directing where it can go, in both
+ways you can end up in the middle of it all.
+.PP
+The sideffect of this compromise is that it also uses less storage,
+and the list never has to be reordered, all the ordering happens when
+pages are added or deleted.
+.PP
+It is an interesting twist to the implementation that the
+.B
+struct pginfo
+.R
+Is allocated with malloc.
+That is, "as with malloc" to be painfully correct.
+The code knows the special case where the first (couple) of allocations on
+the page is actually the pginfo structure and deals with it accordingly.
+This avoids some silly "chicken and egg" issues.
+.ds RH Bells and whistles.
+.NH
+Bells and whistles.
+.PP
+brk(2) is actually not a very fast system call when you ask for storage.
+This is mainly because of the need by the kernel to zero the pages before
+handing them over, so therefore this implementation does not release
+back heap-pages, until there is a large chunk to release back to the kernel.
+Chances are pretty good that we will need it again pretty soon anyway.
+Since these pages are not accessed at all, they will soon be paged out
+and don't affect anything but swap-space usage.
+.PP
+The page directory is actually kept in a mmap(2)'ed piece of
+anonymous memory. This avoids some rather silly cases that
+we would otherwise have to be handled when the page directory
+has to be extended.
+.PP
+One particular nice feature is that all pointers passed to free(3)
+and realloc(3) can be checked conclusively for validity:
+First the pointer is masked to find the page. The page directory
+is then examined, it must contain either MALLOC_FIRST, in which
+case the pointer must point exactly at the page, or it can contain
+a struct pginfo*, in which case the pointer must point to a one of
+the chunks described by that structure.
+Warnings will be printed on stderr and nothing will be done with
+the pointer in case it is found to be invalid.
+.PP
+An environment variable
+.B MALLOC_OPTIONS
+allows the user some control over the behaviour of malloc.
+Some of the more interesting options are:
+.IP
+.B Abort
+If malloc fails to allocate storage, core-dump the process with
+a message rather than expect it handle this correctly.
+It's amazing how few programs actually handle this condition correctly,
+and consequently the havoc they can create is the more creative or
+destructive.
+.IP
+.B Realloc
+Always do a free and malloc when realloc(3) is called. The default
+is to leave things alone if the size of the allocation is still in
+the same size-class.
+For programs doing garbage collect using realloc(3) this make the
+heap collapse faster. Since the malloc will reallocate from the
+lowest available address.
+.IP
+.B Junk
+will explicitly fill the allocated area with a particular value
+to try to detect if programs rely on it being zero.
+.IP
+.B Zero
+will explicitly zero out the allocated chunk of memory, while any
+space after the allocation in the chunk will be filled with the
+junk value to try to catch out of the chunk references.
+.ds RH The road not taken.
+.NH
+The road yet not taken.
+.PP
+A couple of avenues were explored that could be interesting in some
+set of circumstances.
+.PP
+Using mmap(2) instead of brk(2) was actually slower, since brk(2)
+knows a lot of the things that mmap has to find out first.
+.PP
+A system call where we could tell the kernel that "we don't
+need the contents of this page anymore" would allow us to
+return the pages on the free list to the kernel and to instruct
+the kernel that it doesn't need to page it out nor in.
+It would save some page-out events, and the page-in would be replaced
+by a zero-fill page.
+This is, according to the VM goods in the FreeBSD camp, "easy",
+and it will probably be attempted at some point in the future.
+.PP
+In general there is little room for further improvement of the
+time-overhead of the malloc, further improvements will have to
+be in the area of improving paging behaviour.
+.PP
+It is still under consideration to add a feature such that
+if realloc is called with two zero arguments, the internal
+allocations will be reallocated to perform a garbage collect.
+This could be used in certain types of programs to collapse
+the memory use, but so far it doesn't seem to be worth the effort.
+.PP
+Malloc/Free can be a significant point of contention in multi-threaded
+programs. Low-grain locking of the data-structures inside the
+implementation should be implemented to avoid excessive spin-waiting.
+
diff --git a/share/doc/papers/malloc/intro.ms b/share/doc/papers/malloc/intro.ms
new file mode 100644
index 0000000..b0301f8
--- /dev/null
+++ b/share/doc/papers/malloc/intro.ms
@@ -0,0 +1,74 @@
+.\"
+.\" ----------------------------------------------------------------------------
+.\" "THE BEER-WARE LICENSE" (Revision 42):
+.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you
+.\" can do whatever you want with this stuff. If we meet some day, and you think
+.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
+.\" ----------------------------------------------------------------------------
+.\"
+.\" $Id$
+.\"
+.ds RH Introduction
+.NH
+Introduction
+.PP
+Most programs need to allocate storage dynamically in addition
+to whatever static storage the compiler reserved at compile-time.
+To C programmers this fact is rather obvious, but for many years
+this was not an accepted and recognized fact, and many languages
+still used today doesn't support this notion adequately.
+.PP
+The classic UNIX kernel provides two very simple and powerful
+mechanisms for obtaining dynamic storage, the execution stack
+and the heap.
+The stack is usually put at the far upper end of the address-space,
+from where it grows down as far as needed, though this may depend on
+the CPU design.
+The heap starts at the end of the
+.B bss
+segment and grows upwards as needed.
+.PP
+There isn't really a kernel-interface to the stack as such.
+The kernel will allocate some amount of memory for it,
+not even telling the process the exact size.
+If the process needs more space than that, it will simply try to access
+it, hoping that the kernel will detect that access have been
+attempted outside the allocated memory, and try to extend it.
+If the kernel fails to extend the stack, this could be because of lack
+of resources or permissions or because it may just be impossible
+to do in the first place, the process will usually be shot down by the
+kernel.
+.PP
+In the C language, there exists a little used interface to the stack,
+.B alloca(3) ,
+which will explicitly allocate space on the stack.
+This is not a interface to the kernel, but merely an adjustment
+done to the stack-pointer such that space will be available and
+unharmed by any subroutine calls yet to be made while the context
+of the current subroutine is intact.
+.PP
+Due to the nature of normal use of the stack, there is no corresponding
+"free" operator, but instead the space is returned when the current
+function returns to its caller and the stack frame is dismanteled.
+This is the cause of much grief, and probably the single most important
+reason that alloca(3) is not, and should not be, used widely.
+.PP
+The heap on the other hand has an explicit kernel-interface in the
+system call
+.B brk(2) .
+The argument to brk(2) is a pointer to where the process wants the
+heap to end.
+There is also a interface called
+.B sbrk(2)
+taking an increment to the current end of the heap, but this is merely a
+.B libc
+front for brk(2).
+.PP
+In addition to these two memory resources, modern virtual memory kernels
+provide the mmap(2)/mmunmap(2) interface which allows almost complete
+control over any bit of virtual memory in the process address room.
+.PP
+Because of the generality of the mmap(2) interface and the way the
+data structures representing the regions are laid out, sbrk(2) is actually
+faster in use than the equivalent mmap(2) call, simply because the
+mmap(2) has to search for information that is implicit in the sbrk(2) call.
diff --git a/share/doc/papers/malloc/kernel.ms b/share/doc/papers/malloc/kernel.ms
new file mode 100644
index 0000000..1a4e967
--- /dev/null
+++ b/share/doc/papers/malloc/kernel.ms
@@ -0,0 +1,56 @@
+.\"
+.\" ----------------------------------------------------------------------------
+.\" "THE BEER-WARE LICENSE" (Revision 42):
+.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you
+.\" can do whatever you want with this stuff. If we meet some day, and you think
+.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
+.\" ----------------------------------------------------------------------------
+.\"
+.\" $Id$
+.\"
+.ds RH The kernel and memory
+.NH
+The kernel and memory
+.PP
+Brk(2) isn't a particular convenient interface,
+it was probably made more to fit the memory model of the
+hardware being used, than to fill the needs of the programmers.
+.PP
+Before paged and/or virtual memory systems became
+common, the most popular memory management facility used for
+UNIX was segments.
+This was also very often the only vehicle for imposing protection on
+various parts of memory.
+Depending on the hardware, segments can be anything, and consequently
+how the kernels exploited them varied a lot from UNIX to UNIX and from
+machine to machine.
+.PP
+Typically a process would have one segment for the text section, one
+for the data and bss section combined and one for the stack.
+On some systems the text shared a segment with the data and bss, and was
+consequently just as writable as them.
+.PP
+In this setup all the the brk(2) system call have to do is to find the
+right amount of free storage, possibly moving things around in physical
+memory, maybe even swapping out a segment or two to make space,
+and change the upper limit on the data segment according to the address given.
+.PP
+In a more modern page based virtual memory implementation this is still
+pretty much the situation, except that the granularity is now pages:
+The kernel finds the right number of free pages, possibly paging some
+pages out to free them up, and then plug them into the page-table of
+the process.
+.PP
+As such the difference is very small, the real difference is that in
+the old world of swapping, either the entire process was in primary
+storage (or it wouldn't be selected to be run) in a modern VM kernel,
+a process might only have a subset of its pages in primary memory,
+the rest will be paged in, if and when the process tries to access them.
+.PP
+Only very few programs deal with the brk(2) interface directly, the
+few that does usually have their own memory management facilities.
+LISP or FORTH interpreters are good examples.
+Most other programs use the
+.B malloc(3)
+interface instead, and leave it to the malloc implementation to
+use brk(2) to get storage allocated from the kernel.
diff --git a/share/doc/papers/malloc/malloc.ms b/share/doc/papers/malloc/malloc.ms
new file mode 100644
index 0000000..0deb7ec
--- /dev/null
+++ b/share/doc/papers/malloc/malloc.ms
@@ -0,0 +1,72 @@
+.\"
+.\" ----------------------------------------------------------------------------
+.\" "THE BEER-WARE LICENSE" (Revision 42):
+.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you
+.\" can do whatever you want with this stuff. If we meet some day, and you think
+.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
+.\" ----------------------------------------------------------------------------
+.\"
+.\" $Id$
+.\"
+.ds RH Malloc and free
+.NH
+Malloc and free
+.PP
+The job of malloc(3) is to turn the rather simple
+brk(2) facility into a service programs can
+actually use without getting hurt.
+.PP
+The archetypical malloc(3) implementation keeps track of the memory between
+the end of the bss section, as defined by the
+.B _end
+symbol, and the current brk(2) point using a linked list of chunks of memory.
+Each item on the list has a status as either free or used, a pointer
+to the next entry and in most cases to the previous as well, to speed
+up inserts and deletes in the list.
+.PP
+When a malloc(3) request comes in, the list is traversed from the
+front and if a free chunk big enough to hold the request is found,
+it is returned, if the free chunk is bigger than the size requested,
+a new free chunk is made from the excess and put back on the list.
+.PP
+When a chunk is
+.B free(3) 'ed,
+the chunk is found in the list, its status
+is changed to free and if one or both of the surrounding chunks
+are free, they are collapsed to one.
+.PP
+A third kind of request,
+.B realloc(3)
+exists, it will resize
+a chunk, trying to avoid copying the contents if possible.
+It is seldom used, and has only had a significant impact on performance
+in a few special situations.
+The typical pattern of use is to malloc(3) a chunk of the maximum size
+needed, read in the data and adjust the size of the chunk to match the
+size of the data read using realloc(3).
+.PP
+For reasons of efficiency, the original implementation of malloc(3)
+put the small structure used to contain the next and previous pointers
+plus the state of the chunk right before the chunk itself.
+.PP
+As a matter of fact, the canonical malloc(3) implementation can be
+studied in the ``Old testament'', chapter 8 verse 7 [Kernigan & Richie]
+.PP
+Various optimisations can be applied to the above basic algorithm:
+.IP
+If in freeing a chunk, we end up with the last chunk on the list being
+free, we can return that to the kernel by calling brk(2) with the first
+address of that chunk and then make the previous chunk the last on the
+chain by terminating its ``next'' pointer.
+.IP
+A best-fit algorithm can be used instead of first-fit at an expense
+of memory, because statistically fewer chances to brk(2) backwards will
+present themselves.
+.IP
+Splitting the list in two, once for used and one for free chunks to
+speed the searching.
+.IP
+Putting free chunks on one of several free-list depending on the size
+to speed allocation.
+.IP
+\&...
diff --git a/share/doc/papers/malloc/performance.ms b/share/doc/papers/malloc/performance.ms
new file mode 100644
index 0000000..bb023ea
--- /dev/null
+++ b/share/doc/papers/malloc/performance.ms
@@ -0,0 +1,113 @@
+.\"
+.\" ----------------------------------------------------------------------------
+.\" "THE BEER-WARE LICENSE" (Revision 42):
+.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you
+.\" can do whatever you want with this stuff. If we meet some day, and you think
+.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
+.\" ----------------------------------------------------------------------------
+.\"
+.\" $Id$
+.\"
+.ds RH Performance
+.NH
+Performance
+.PP
+Performance for a malloc(3) implementation comes as two variables:
+.IP
+A: How much time does it use for searching and manipulating data structures.
+We will refer to this as ``overhead time''.
+.IP
+B: How well does it manage the storage.
+This rather vague metric we call ``quality of allocation''.
+.PP
+The overhead time is easy to measure, just to a lot of malloc/free calls
+of various kinds and combination, and compare the results.
+.PP
+The quality of allocation is not quite as simple as that.
+One measure of quality is the size of the process, that should obviously
+be minimized.
+Another measure is the execution time of the process.
+This is not an obvious indicator of quality, but people will generally
+agree that it should be minimized as well, and if malloc(3) can do
+anything to do so, it should.
+Explanation why it is still a good metric follows:
+.PP
+In a traditional segment/swap kernel, the desirable behaviour of a process
+is to keep the brk(2) as low as possible, thus minimizing the size of the
+data/bss/heap segment, which in turn translates to a smaller process and
+a smaller probability of the process being swapped out, qed: faster
+execution time as an average.
+.PP
+In a paging environment this is not a bad choice for a default, but
+a couple of details needs to be looked at much more carefully.
+.PP
+First of all, the size of a process becomes a more vague concept since
+only the pages that are actually used needs to be in primary storage
+for execution to progress, and they only need to be there when used.
+That implies that many more processes can fit in the same amount of
+primary storage, since most processes have a high degree of locality
+of reference and thus only need some fraction of their pages to actually
+do their job.
+.PP
+From this it follows that the interesting size of the process, is some
+subset of the total amount of virtual memory occupied by the process.
+This number isn't a constant, it varies depending on the whereabouts
+of the process, and it may indeed fluctuate wildly over the lifetime
+of the process.
+.PP
+One of the names for this vague concept is ``current working set''.
+It has been defined many different ways over the years, mostly to
+satisfy and support claims in marketing or benchmark contexts.
+.PP
+For now we can simply say that it is the number of pages the process
+needs in order to run at a sufficiently low paging rate in a congested
+primary storage.
+(If primary storage isn't congested, this is not really important
+of course, but most systems would be better of using the pages for
+disk-cache or similar functions, so from that perspective it will
+always be congested.)
+If this number of pages is to small, the process will wait for the its
+pages to be read from secondary storage much of the time, if it's too
+big, the space could be used better for something else.
+.PP
+From the view of any single process, this number of pages is
+"all of my pages", but from the point of view of the OS is should
+be tuned to maximise the total throughput of all the processes on
+the machine at the time.
+This is usually done using various kinds of least-recently-used
+replacement algorithms to select page candidates for replacement.
+.PP
+With this knowledge, can we decide what the performance goal is for
+a modern malloc(3) ?
+Well, it's almost as simple as it used to be:
+.B
+Minimize the number of pages accessed.
+.R
+.PP
+This really is the core of it all.
+If the number of accessed pages is small, then locality of reference is
+higher, and all kinds of caches (which essentially is what the
+primary storage is in a VM system) works better.
+.PP
+It's interesting to notice that the classical malloc fails on this one
+because the information about free chunks are kept with the free
+chunks themselves. In some of the benchmarks this came out as all the
+pages were paged in every time a malloc were made, because malloc
+had to traverse the free-list to find a suitable chunk for the allocation.
+If memory is not in use, then you shouldn't access it.
+.PP
+The secondary goal is more evident:
+.B
+Try to work in pages.
+.R
+.PP
+That makes it easier for the kernel, and wastes less virtual memory.
+Most modern implementations does this when they interact with the
+kernel, but few try to avoid objects spanning pages.
+.PP
+If an objects size
+is less or equal to a page, there is no reason for it to span two pages.
+Having objects span pages means that two pages must be
+paged in, if that object is accessed.
+.PP
+With this analysis in the luggage, we can start coding.
diff --git a/share/doc/papers/malloc/problems.ms b/share/doc/papers/malloc/problems.ms
new file mode 100644
index 0000000..c566b6c
--- /dev/null
+++ b/share/doc/papers/malloc/problems.ms
@@ -0,0 +1,54 @@
+.\"
+.\" ----------------------------------------------------------------------------
+.\" "THE BEER-WARE LICENSE" (Revision 42):
+.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you
+.\" can do whatever you want with this stuff. If we meet some day, and you think
+.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
+.\" ----------------------------------------------------------------------------
+.\"
+.\" $Id$
+.\"
+.ds RH The problems
+.NH
+The problems
+.PP
+Even though malloc(3) is a lot simpler to use
+than the raw brk(2)/sbrk(2) interface
+or maybe exactly because
+of that,
+a lot of problems arise from its use.
+.IP
+Writing to memory outside the allocated chunk.
+The most likely result being that the data structure used to hold
+the links and flags about this chunk or the next one gets thrashed.
+.IP
+Freeing a pointer to memory not allocated by malloc.
+This is often a pointer that points to an object on the stack or in the
+data-section, in newer implementations of C it may even be in the text-
+section where it is likely to be readonly.
+Some malloc implementations detect this, some don't.
+.IP
+Freeing a modified pointer. This is a very common mistake, freeing
+not the pointer malloc(3) returned, but rather some offset from it.
+Some mallocs will handle this correctly if the offset is positive.
+.IP
+Freeing the same pointer more than once.
+.IP
+Accessing memory in a chunk after it has been free(3)'ed.
+.PP
+The handling of these problems have traditionally been weak.
+A core-dump was the most common form for "handling", but in rare
+cases one could experience the famous "malloc: corrupt arena."
+message before the core-dump.
+Even worse though, very often the program will just continue,
+possibly giving wrong results.
+.PP
+An entirely different form for problem is that
+the memory returned by malloc(3) can contain any value.
+Unfortunately most kernels, correctly, zero out the storage they
+provide with brk(2), and thus the storage malloc returns will be zeroed
+in many cases as well, so programmers are not particular apt to notice
+that their code depend on malloc'ed storage to be zeroed.
+.PP
+With problems this big and error handling this weak, it is not
+surprising that problems are hard and time consuming to find and fix.
OpenPOWER on IntegriCloud