mmap() and do_mmap(): Creating an Address Interval
The do_mmap() function is used by the kernel to create a new linear address interval. Saying that this function creates a new VMA is not technically correct, because if the created address interval is adjacent to an existing address interval, and if they share the same permissions, the two intervals are merged into one. If this is not possible, a new VMA is created. In any case, do_mmap() is the function used to add an address interval to a process's address space, whether that means expanding an existing memory area or creating a new one. The do_mmap() function is declared in <linux/mm.h>:
unsigned long do_mmap(struct file *file, unsigned long addr,
                      unsigned long len, unsigned long prot,
                      unsigned long flag, unsigned long offset)
This function maps the file specified by file at offset offset for length len. The file parameter can be NULL and offset can be zero, in which case the mapping is not backed by a file and is called an anonymous mapping. If a file and offset are provided, the mapping is called a file-backed mapping. The addr parameter optionally specifies the initial address from which to start the search for a free interval. The prot parameter specifies the access permissions for pages in the memory area. The possible permission flags are defined in <asm/mman.h> and are unique to each supported architecture, although in practice each architecture defines the flags listed in Table 14.2.
Flag | Effect on the Pages in the New Interval |
---|---|
PROT_READ | Corresponds to VM_READ |
PROT_WRITE | Corresponds to VM_WRITE |
PROT_EXEC | Corresponds to VM_EXEC |
PROT_NONE | Page cannot be accessed |
The flag parameter specifies the type of the mapping and modifies its behavior. The possible values are listed in the following table.
Flag | Effect on the New Interval |
---|---|
MAP_SHARED | The mapping can be shared |
MAP_PRIVATE | The mapping cannot be shared |
MAP_FIXED | The new interval must start at the given address addr |
MAP_ANONYMOUS | The mapping is not file-backed, but is anonymous |
MAP_GROWSDOWN | Corresponds to VM_GROWSDOWN |
MAP_DENYWRITE | Corresponds to VM_DENYWRITE |
MAP_EXECUTABLE | Corresponds to VM_EXECUTABLE |
MAP_LOCKED | Corresponds to VM_LOCKED |
MAP_NORESERVE | No need to reserve space for the mapping |
MAP_POPULATE | Populate (prefault) page tables |
MAP_NONBLOCK | Do not block on I/O |
The mmap() System Call
The do_mmap() functionality is exported to user space via the mmap() system call. The mmap() system call is defined as follows:
void *mmap2(void *start,
            size_t length,
            int prot,
            int flags,
            int fd,
            off_t pgoff)
This system call is named mmap2() because it is the second variant of mmap(). The original mmap() took an offset in bytes as the last parameter; the current mmap2() receives the offset in pages. This enables larger files with larger offsets to be mapped. The original mmap(), as specified by POSIX, is available from the C library as mmap(), but is no longer implemented in the kernel proper, whereas the new version is available as mmap2(). Both library calls use the mmap2() system call, with the original mmap() converting the offset from bytes to pages.