1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
|
.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)mmap.2 8.4 (Berkeley) 5/11/95
.\" $FreeBSD$
.\"
.Dd May 11, 1995
.Dt MMAP 2
.Os BSD 4
.Sh NAME
.Nm mmap
.Nd map files or devices into memory
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.Fd #include <sys/types.h>
.Fd #include <sys/mman.h>
.Ft void *
.Fn mmap "void * addr" "size_t len" "int prot" "int flags" "int fd" "off_t offset"
.Sh DESCRIPTION
The
.Fn mmap
function causes the pages starting at
.Fa addr
and continuing for at most
.Fa len
bytes to be mapped from the object described by
.Fa fd ,
starting at byte offset
.Fa offset .
If
.Fa len
is not a multiple of the pagesize, the mapped region may extend past the
specified range.
Any such extension beyond the end of the mapped object will be zero-filled.
.Pp
If
.Fa addr
is non-zero, it is used as a hint to the system.
(As a convenience to the system, the actual address of the region may differ
from the address supplied.)
If
.Fa addr
is zero, an address will be selected by the system.
The actual starting address of the region is returned.
A successful
.Fa mmap
deletes any previous mapping in the allocated address range.
.Pp
The protections (region accessibility) are specified in the
.Fa prot
argument by
.Em or Ns 'ing
the following values:
.Pp
.Bl -tag -width MAP_FIXEDX
.It Dv PROT_EXEC
Pages may be executed.
.It Dv PROT_READ
Pages may be read.
.It Dv PROT_WRITE
Pages may be written.
.El
.Pp
The
.Fa flags
parameter specifies the type of the mapped object, mapping options and
whether modifications made to the mapped copy of the page are private
to the process or are to be shared with other references.
Sharing, mapping type and options are specified in the
.Fa flags
argument by
.Em or Ns 'ing
the following values:
.Pp
.Bl -tag -width MAP_FIXEDX
.It Dv MAP_ANON
Map anonymous memory not associated with any specific file.
The file descriptor used for creating
.Dv MAP_ANON
must be \-1.
The
.Fa offset
parameter is ignored.
.\".It Dv MAP_FILE
.\"Mapped from a regular file or character-special device memory.
.It Dv MAP_FIXED
Do not permit the system to select a different address than the one
specified.
If the specified address cannot be used,
.Fn mmap
will fail.
If MAP_FIXED is specified,
.Fa addr
must be a multiple of the pagesize.
Use of this option is discouraged.
.It Dv MAP_HASSEMAPHORE
Notify the kernel that the region may contain semaphores and that special
handling may be necessary.
.It Dv MAP_INHERIT
Permit regions to be inherited across
.Xr execve 2
system calls.
.It Dv MAP_PRIVATE
Modifications are private.
.It Dv MAP_SHARED
Modifications are shared.
.It Dv MAP_STACK
This option is only available if your system has been compiled with
VM_STACK defined when compiling the kernel.
This is the default for
i386 only.
Consider adding -DVM_STACK to COPTFLAGS in your /etc/make.conf
to enable this option for other architechures.
MAP_STACK implies
MAP_ANON, and
.Fa offset
of 0.
.Fa fd
must be -1 and
.Fa prot
must include at least PROT_READ and PROT_WRITE. This option creates
a memory region that grows to at most
.Fa len
bytes in size, starting from the stack top and growing down. The
stack top is the starting address returned by the call, plus
.Fa len
bytes. The bottom of the stack at maximum growth is the starting
address returned by the call.
.It Dv MAP_NOSYNC
Causes data dirtied via this VM map to be flushed to physical media
only when necessary (usually by the pager) rather then gratuitously.
Typically this prevents the update daemons from flushing pages dirtied
through such maps and thus allows efficient sharing of memory across
unassociated processes using a file-backed shared memory map. Without
this option any VM pages you dirty may be flushed to disk every so often
(every 30-60 seconds usually) which can create performance problems if you
do not need that to occur (such as when you are using shared file-backed
mmap regions for IPC purposes). Note that VM/filesystem coherency is
maintained whether you use MAP_NOSYNC or not. This option is not portable
across UNIX platforms (yet), though some may implement the same behavior
by default.
.Pp
WARNING!
Extending a file with
.Xr ftruncate 2 ,
thus creating a big hole, and then filling the hole by modifying a shared
.Fn mmap
can lead to severe file fragmentation.
In order to avoid such fragmentation you should always pre-allocate the
file's backing store by
.Fn write Ns ing
zero's into the newly extended area prior to modifying the area via your
.Fn mmap .
The fragmentation problem is especially sensitive to
.Dv MAP_NOSYNC
pages, because pages may be flushed to disk in a totally random order.
.Pp
The same applies when using
.Dv MAP_NOSYNC
to implement a file-based shared memory store.
It is recommended that you create the backing store by
.Fn write Ns ing
zero's to the backing file rather then
.Fn ftruncate Ns ing
it.
You can test file fragmentation by observing the KB/t (kilobytes per
transfer) results from an
.Dq Li iostat 1
while reading a large file sequentially, e.g. using
.Dq Li dd if=filename of=/dev/null bs=32k .
.Pp
The
.Xr fsync 2
function will flush all dirty data and metadata associated with a file,
including dirty NOSYNC VM data, to physical media. The
.Xr sync 8
command and
.Xr sync 2
system call generally do not flush dirty NOSYNC VM data.
The
.Xr msync 2
system call is obsolete since
.Bx
implements a coherent filesystem buffer cache. However, it may be
used to associate dirty VM pages with filesystem buffers and thus cause
them to be flushed to physical media sooner rather then later.
.It Dv MAP_NOCORE
Region is not included in a core file.
.El
.Pp
The
.Xr close 2
function does not unmap pages, see
.Xr munmap 2
for further information.
.Pp
The current design does not allow a process to specify the location of
swap space.
In the future we may define an additional mapping type,
.Dv MAP_SWAP ,
in which
the file descriptor argument specifies a file or device to which swapping
should be done.
.Sh RETURN VALUES
Upon successful completion,
.Fn mmap
returns a pointer to the mapped region.
Otherwise, a value of MAP_FAILED is returned and
.Va errno
is set to indicate the error.
.Sh ERRORS
.Fn Mmap
will fail if:
.Bl -tag -width Er
.It Bq Er EACCES
The flag
.Dv PROT_READ
was specified as part of the
.Fa prot
parameter and
.Fa fd
was not open for reading.
The flags
.Dv MAP_SHARED
and
.Dv PROT_WRITE
were specified as part of the
.Fa flags
and
.Fa prot
parameters and
.Fa fd
was not open for writing.
.It Bq Er EBADF
.Fa Fd
is not a valid open file descriptor.
.It Bq Er EINVAL
.Dv MAP_FIXED
was specified and the
.Fa addr
parameter was not page aligned, or part of the desired address space
resides out of the valid address space for a user process.
.It Bq Er EINVAL
.Fa Len
was negative.
.It Bq Er EINVAL
.Dv MAP_ANON
was specified and the
.Fa fd
parameter was not -1.
.It Bq Er EINVAL
.Dv MAP_ANON
has not been specified and
.Fa fd
did not reference a regular or character special file.
.It Bq Er EINVAL
.Fa Offset
was not page-aligned. (See BUGS below.)
.It Bq Er ENOMEM
.Dv MAP_FIXED
was specified and the
.Fa addr
parameter wasn't available, or the system has reached the per-process mmap
limit specified in the vm.max_proc_mmap sysctl.
.Dv MAP_ANON
was specified and insufficient memory was available.
.Sh "SEE ALSO"
.Xr madvise 2 ,
.Xr mincore 2 ,
.Xr mlock 2 ,
.Xr mprotect 2 ,
.Xr msync 2 ,
.Xr munlock 2 ,
.Xr munmap 2 ,
.Xr getpagesize 3
.Sh BUGS
.Fa len
is limited to 2GB. Mmapping slightly more than 2GB doesn't work, but
it is possible to map a window of size (filesize % 2GB) for file sizes
of slightly less than 2G, 4GB, 6GB and 8GB.
.Pp
The limit is imposed for a variety of reasons.
Most of them have to do
with
.Fx
not wanting to use 64 bit offsets in the VM system due to
the extreme performance penalty.
So
.Fx
uses 32bit page indexes and
this gives
.Fx
a maximum of 8TB filesizes.
It's actually bugs in
the filesystem code that causes the limit to be further restricted to
1TB (loss of precision when doing blockno calculations).
.Pp
Another reason for the 2GB limit is that filesystem metadata can
reside at negative offsets.
.Pp
We currently can only deal with page aligned file offsets.
|