MTI 6.S081 LabMmap

MIT 6.S081 Mmap

  • mmap/munmap
  • Experimental task
  • Hints
  • solution
    • Prepare
      • VMA
    • mmap
    • Page fault handling
    • munmap
    • Write back shared files after unmapping
    • exit unmaps all mapped files
    • fork copies the VMA of the parent process

mmap/munmap

The mmap and munmap system calls allow UNIX programs detailed control over their address spaces. They can be used to share memory between processes, to map files into process address spaces, and as part of user-level page fault scenarios, such as the garbage collection algorithms discussed in the lecture. In this lab, you’ll add mmap and munmap to xv6, focusing on memory-mapped files.

The man page (run man 2 mmap ) shows mmap’s declaration:

void *mmap(void *addr, size_t length, int prot, int flags,
           int fd, off_t offset);

mmap can be invoked in a variety of ways, but this experiment only requires a subset of its features related to memory-mapped files. You can assume that addr is always zero, which means the kernel should decide the virtual address of the mapped file. mmap returns that address, or 0xffffffffff if it fails. length is the number of bytes to map; it may be different from the length of the file. prot indicates whether the memory should be mapped as readable, writable and/or executable; you can assume prot is PROT_READ or PROT_WRITE, or both. The flag will be MAP_SHARED, meaning modifications to mapped memory should be written back to the file, or MAP_PRIVATE, meaning they should not be written back. You don’t have to implement any other bits in the flags. fd is the open file descriptor of the file to map. You can assume the offset is zero (it’s the start of the mapping in the file).

It is also ok if processes mapping the same MAP_SHARED file do not share physical pages.

munmap(addr, length) should remove mmap mappings within the indicated address range. If the process has modified memory and mapped it to MAP_SHARED, the modifications should be written to the file first. A munmap call may only cover part of an mmap-ed region, but you can assume it will unmap at the beginning, end, or the entire region (but not punch holes in the middle of the region).

Experimental tasks

Experimental task: You should implement enough mmap and munmap functionality for the mmaptest test program to work. If mmaptest does not use the mmap function, it does not need to implement that function.

When complete, you should see the following output:

$ mmaptest
mmap_test starting
test mmap f
test mmap f: OK
test mmap private
test mmap private: OK
test mmap read-only
test mmap read-only: OK
test mmap read/write
test mmap read/write: OK
test mmap dirty
test mmap dirty: OK
test not-mapped unmap
test not-mapped unmap: OK
test mmap two files
test mmap two files: OK
mmap_test: ALL OK
fork_test starting
fork_test OK
mmaptest: all tests succeeded
$ usertests -q
user tests starting
...
ALL TESTS PASSED

Hints

  • First, add _mmaptest to UPROGS, along with the mmap and munmap syscalls, in order for user/maptest.c to compile. Now, only errors in mmap and munmap are returned. We define PROT_READ etc. for you in kernel/fcntl.h. Run mmaptest, it will fail on the first mmap call.
  • Lazy allocation of pages in response to page faults. That is, mmap should not allocate physical memory or read files. Instead, do this in the page error handling code in (or called by) the usertrap, as in the lazy page allocation lab. The reason for lazy is to ensure that the mmap of large files is fast, and that it is possible to mmap files larger than physical memory.
  • Keep track of what mmap maps for each process. Define a structure corresponding to the VMA (Virtual Memory Area) described in Lecture 15, recording the address, length, permissions, files, etc. of the virtual memory range created by mmap. Since there is no memory allocator in the xv6 kernel, it is possible to declare a fixed-size array of VMAs and allocate from that array as needed. Size 16 should be enough.
  • Implement mmap: find an unused area in the process’s address space to map the file, and add the VMA to the process’s mapped area table. VMA should contain a pointer to the structure file of the file being mapped; mmap should increment the reference count of the file so that the structure doesn’t disappear when the file is closed (hint: see filedup). Run mmaptest: the first mmap should succeed, but the first access to mmap-ed memory will cause a page fault and terminate mmaptest.
  • Add code that causes a page fault in the mmap-ed region to allocate a page of physical memory, read 4096 bytes of the associated file into that page, and map it into user address space. Read the file using readi, which takes an offset parameter to read the file (but you have to lock/unlock the inode passed to readi). Don’t forget to set permissions properly on the page. Run mmaptest; it should reach the first munmap.
  • Implements munmap: finds the VMA for the address range and unmaps the specified pages (hint: use uvmunmap). If munmap removed all pages from the previous mmap, then it should decrement the reference count of the corresponding structure file. If the unmapped page has been modified, and the file is mapped MAP_SHARED, write the page back to the file. Find inspiration from filewrite.
  • Ideally, your implementation will only write back MAP_SHARED pages that your program actually modifies. The dirty bit (D) in the RISC-V PTE indicates whether the page has already been written. However, mmaptest does not check that non-dirty pages were not written back; therefore, you can write back pages without looking at the D bit.
  • Modify exit to unmap the mapped regions of the process as if munmap had been called. Run mmaptest; mmap_test should pass, but fork_test may not.
  • Modify the fork to ensure that the child has the same mapped region as the parent. Don’t forget to increase the reference count of the VMA structure file. In the child’s page fault handler, a new physical page can be allocated instead of sharing the page with the parent. The latter would be cooler, but would require more work to implement. Run mmaptest; it should pass both mmaptest and forktest.

Run usertests -q to make sure everything is working.

Solution

We can build solutions based on Hints.

Preparation

A lot of experiments must have been completed before this experiment, here is mainly to add function prototypes so that mmap/munmap can be called

VMA

struct vma {<!-- -->
  struct spinlock lock;
  uint64 start;
  uint64 end;
  int length;
  int off;
  int perm;
  int flags;
  struct file *file;
  struct vma *next;
};
struct vma VMA[NVMA];

Note that each length in the VMA is changed to -1, indicating that this structure is not occupied, and the kernel can be allocated to any process.

struct vma *
vma_alloc(void) {<!-- -->
  for(int i = 0; i < NVMA; i ++ ){<!-- -->
    if (VMA[i].length == -1) {<!-- -->
      acquire( & amp; VMA[i].lock);
      if(VMA[i].length == -1) {<!-- -->
        VMA[i].length = 0;
        release( &VMA[i].lock);
        return &VMA[i];
      }
      release( &VMA[i].lock);
    }
  }
  panic("no enough vma");
}

First check whether there is a structure to be released. If so, grab the lock, and then check again to see if it has been allocated. If it is not allocated, it can be allocated to the current task.

mmap

uint64
sys_mmap(void) {<!-- -->
  uint64 addr;
  int length;
  int prot;
  int flags;
  int fd;
  int offset;
  argaddr(0, &addr);
  argint(1, & length);
  argint(2, &prot);
  argint(3, &flags);
  argint(4, &fd);
  argint(5, &offset);
  if (addr != 0 || offset != 0) {<!-- -->
    return -1;
  }
  struct proc *p = myproc();
  struct file* f = p->ofile[fd];

  int pte_flag = PTE_U;
  if (prot & amp; PROT_WRITE) {<!-- --> // want to write, then the file must be writable, or the flag is private
    if(!f->writable & amp; & amp; !(flags & amp; MAP_PRIVATE)) {<!-- -->
      return -1;
    }
    pte_flag |= PTE_W;
  }
  if (prot & amp; PROT_READ) {<!-- -->
    if(!f->readable) {<!-- -->
      return -1;
    }
    pte_flag |= PTE_R;
  }

  struct vma* v = vma_alloc();
  
  v->length = length;
  v->off = offset;
  v->perm = pte_flag;
  v->flags = flags;
  filedup(f);
  v->file = f;
  v->next = (struct vma*)0;
  struct vma *pv = p->vma;
  if (pv) {<!-- -->
    while (pv->next) {<!-- -->
      pv = pv->next;
    }
    v->start = PGROUNDUP(pv->end); // make the start position an integer of one page
    v->end = v->start + length;
    pv->next = v;
  } else {<!-- -->
    v->start = VMA_START;
    v->end = v->start + length;
    p->vma = v;
  }
  addr = v->start;
  printf("mmap: [%p, %p)\\
", addr, v->end); // for debugging
  return addr;
}

Page fault handling

In trap.c, if the error is caused by a page fault, enter the page fault handler, and check whether the address can be translated at this time.

Because in this experiment, we only do lazy allocation to mmap, so the address must be between the addresses of vma, otherwise it is a page fault caused by other reasons, which is not handled by the current experiment, and returns -1, and at the same time, pay attention to the This branch in trap.c sees that it returns -1 and needs to kill the process. Because this process may access, write, and execute addresses that he does not have permission, this is a fatal error.

int
mmap_handler(uint64 cause, uint64 va) {<!-- -->
  struct proc *p = myproc();
  struct vma *v = p->vma;

  while (v) {<!-- -->
    if (va >= v->start & amp; & amp; va < v->end) {<!-- -->
      break;
    }
    v = v->next;
  }
  
  if (v == 0 || cause == 12 || (scause == 13 & amp; & amp; !(v->perm & amp; PTE_R)) || (scause == 15 & amp; & amp; !(v->perm & PTE_W))) {<!-- -->
    // 12 Execute instructions, here it is impossible to map the dynamic link library to execute instructions, so no
    // 13 load causes, but if it is not readable, it will be an error
    // Caused by 15 store, not writable and error
    return -1;
  }
  va = PGROUNDDOWN(va); // look for page
  char *mem = kalloc();
  if (mem == 0) {<!-- -->
    // panic("mmap_handler: kalloc error");
    return -1;
  }

  memset(mem, 0, PGSIZE);

  uint doff = va + v->off - v->start; // actual start position in the file
  ilock(v->file->ip);
  readi(v->file->ip, 0, (uint64)mem, doff, PGSIZE);
  iunlock(v->file->ip);

  // At this point, the content of the file has been read
  if (mappages(p->pagetable, va, PGSIZE, (uint64)mem, v->perm) < 0) {<!-- -->
    kfree(mem);
    // panic("mmap_handler: mappages error");
    return -1;
  }
  return 0;
}

munmap

This is more difficult to deal with. I have read a lot of answers, but none of them are satisfactory to me. After reading the first few lines, I found that their code is not good, mainly because their handling is too ideal.

A lot of code makes the following assumptions:

  • The address addr released each time is an integer multiple of PGSIZE
  • The length of each release is an integer multiple of PGSIZE
  • At the time of release, the page that needs to be released must have been visited before, so there must be a corresponding PTE in the page, and it is valid, so directly use void uvmunmap(pagetable_t pagetable, uint64 va , uint64 npages, int do_free).

The above assumptions are all incorrect, especially, at the time of release, it is very likely that this page has not been accessed, so the release at this time will cause the release of an unmapped page, resulting in a panic.

In the code of mmaptest, the first two assumptions above are still satisfied, but the third assumption is not satisfied in mmaptest.

uint64
sys_munmap(void) {<!-- -->
  uint64 addr;
  int length;
  argaddr(0, &addr);
  argint(1, & length);
  if (addr == 0 || length == 0) {<!-- -->
    return 0; // don't unmap
  }

  struct proc *p = myproc();
  struct vma *v = p->vma;
  struct vma *pre = 0;
  while(v != 0){<!-- -->
    if(addr >= v->start & amp; & amp; addr < v->end) break; // found
    pre = v;
    v = v->next;
  }

  if(v == 0)
    return -1;
  printf("munmap: %p %d\\
", addr, length);
  if(addr != v->start & amp; & amp; addr + length != v->end)
    panic("munmap middle of vma");

  if (length > v->length) {<!-- -->
    length = v->length;
  }

  if (addr == v->start) {<!-- -->
    writeback(v, addr, length);
    // Since munmap may be called twice on the same mapping space, the addr here is not necessarily an integer multiple of PGSIZE
    // The actual pages to be released are (len + (addr - PGROUNDDOWN(addr))) / PGSIZE pages starting from PGROUNDDOWN(addr). And because there may not be an actual allocation at present, uvmunmap cannot be used
    uint64 end = (length + (addr - PGROUNDDOWN(addr))) / PGSIZE * PGSIZE + PGROUNDDOWN(addr);
    pte_t *pte;
    for (uint64 va = PGROUNDDOWN(addr); va < end; va + = PGSIZE) {<!-- -->
      if((pte = walk(p->pagetable, va, 0)) == 0) {<!-- -->
        continue;
      }
      if (*pte & PTE_V) {<!-- -->
        uint64 pa = PTE2PA(*pte);
        kfree((void*)pa);
        *pte = 0;
      } // else invalid, don't release
    }
    if(length == v->length){<!-- -->
      // All will be released, so some special operations are required for the last page
      uint64 va = PGROUNDDOWN(v->end - 1);
      if((pte = walk(p->pagetable, va, 0)) != 0) {<!-- -->
        if (*pte & PTE_V) {<!-- -->
          uint64 pa = PTE2PA(*pte);
          kfree((void*)pa); // free the last page
          *pte = 0;
        }
      }
      fileclose(v->file);
      if(pre == 0){<!-- -->
        p->vma = v->next; // head
      }else{<!-- -->
        pre->next = v->next;
      }
      v->next = 0;
      acquire( & amp;v->lock);
      v->length = -1;
      release( & amp;v->lock);
    } else {<!-- -->
      v->start + = length; // start position should add length
      v->off += length;
      v->length -= length;
    }
  } else {<!-- -->
    // Release from addr to end Then to release the page, from UP(addr) ~ up(end)
    pte_t *pte;
    for(uint64 va = PGROUNDUP(addr); va < PGROUNDUP(v->end); va + = PGSIZE){<!-- -->
      if((pte = walk(p->pagetable, va, 0)) == 0) {<!-- -->
        continue;
      }
      if (*pte & PTE_V) {<!-- -->
        uint64 pa = PTE2PA(*pte);
        kfree((void*)pa);
        *pte = 0;
      }
    }
    v->length -= length;
    v->end -= length;
  }
  return 0;
}

In my implementation, this code is more robust by taking into account the case where the entry is not in the page table at all.

Write back after unmapping the shared file

This also has the above problem. If it has not been accessed, it does not exist in the page table, so don’t put it back. Otherwise, the number of bytes written to the file will be 0 each time, resulting in an infinite loop.

void
writeback(struct vma* v, uint64 addr, int n)
{<!-- -->
  if(!(v->perm & amp; PTE_W) || (v->flags & amp; MAP_PRIVATE)) // not writable, or this is a private one, then it should not be written back
    return;
  struct proc *p = myproc();
  struct file* f = v->file;

  int max = ((MAXOPBLOCKS-1-1-2) / 2) * BSIZE;
  int i = 0;
  pte_t *pte;
  while(i < n){<!-- -->
    if ((pte = walk(p->pagetable, PGROUNDDOWN(addr), 0)) == 0 || !(*pte & amp; PTE_V)) {<!-- -->
      // pte does not exist or is invalid, indicating that this part has not changed
      i + = (PGROUNDDOWN(addr) + PGSIZE - addr);
      addr + = PGROUNDUP(addr + 1);
      continue;
    }
    // Found the physical location
    uint64 pa = PTE2PA(*pte);
    uint64 start = addr - PGROUNDDOWN(addr); // start position

    int n1 = n - i;
    if (n1 > PGSIZE - start) {<!-- -->
      // no more than this page
      n1 = PGSIZE - start;
    }
    if (n1 > max) {<!-- -->
      n1 = max;
    }
    begin_op();
    ilock(f->ip);
    int r = writei(f->ip, 0, pa + start, addr + v->off - v->start, n1);
    iunlock(f->ip);
    end_op();
    i + = r;
    addr + = r; // start writing from position r next time
  }
}

exit unmaps all mapped files

Same problem, only freeing pages that are actually loaded.

// Exit the current process. Does not return.
// An exited process remains in the zombie state
// until its parent calls wait().
void
exit(int status)
{<!-- -->
  struct proc *p = myproc();

  if(p == initproc)
    panic("init exiting");

  // munmap all mmap vma
  struct vma* v = p->vma;
  struct vma* pv;
  while(v){<!-- -->
    writeback(v, v->start, v->length);
    uint64 start = PGROUNDDOWN(v->start);
    uint64 end = PGROUNDUP(v->end);
    pte_t *pte;
    for (uint64 va = start; va < end; va + = PGSIZE) {<!-- -->
      if ((pte = walk(p->pagetable, va, 0)) == 0 || !(*pte & amp; PTE_V)) {<!-- -->
        continue; // not loaded
      }
      uint64 pa = PTE2PA(*pte);
      kfree((void*)pa);
      *pte = 0;
    }
    fileclose(v->file);
    pv = v->next;
    acquire( & amp;v->lock);
    v->next = 0;
    v->length = -1;
    release( & amp;v->lock);
    v = pv;
  }
  p->vma = 0;

  // Close all open files.
  ...
}

fork copies the VMA of the parent process

Since the mapcopy of fork has copied all the pages of the parent process for the child process, what is the distribution of the mapped pages in the parent process at this time, and what will be in the child process, just copy its vma structure directly. .

Of course, even if it is a lazy allocation, direct copying is fine, so at this time all pages corresponding to the child process vma are invalid. When accessing, a page fault will be triggered to make it valid.

Note that the vma of the child process cannot be directly pointed to the vma of the parent process. In this way, the parent and child processes cannot copy on write, destroying the pages they can point to, which may cause serious memory leaks.

// Create a new process, copying the parent.
// Sets up child kernel stack to return as if from fork() system call.
int
fork(void)
{<!-- -->
  int i, pid;
  struct proc *np;
  struct proc *p = myproc();
  
  ...

  acquire( &np->lock);
  np->state = RUNNABLE;
  np->vma = 0;
  struct vma *pv = p->vma;
  struct vma *pre = 0;
  while(pv){<!-- -->
    struct vma *vma = vma_alloc();
    vma->start = pv->start;
    vma->end = pv->end;
    vma->off = pv->off;
    vma->length = pv->length;
    vma->perm = pv->perm;
    vma->flags = pv->flags;
    vma->file = pv->file;
    filedup(vma->file);
    vma->next = 0;
    if(pre == 0){<!-- -->
      np->vma = vma;
    } else {<!-- -->
      pre->next = vma;
    }
    pre = vma;
    pv = pv->next;
  }
  release( &np->lock);

  return pid;
}