Lab 6: Copy-on-write fork

Lab 6: Copy-on-write fork

COW fork() creates just a pagetable for the child, with PTEs for user memory pointing to the parent’s physical pages. COW fork() marks all the user PTEs in both parent and child as not writable. When either process tries to write one of these COW pages, the CPU will force a page fault. The kernel page-fault handler detects this case, allocates a page of physical memory for the faulting process, copies the original page into the new page, and modifies the relevant PTE in the faulting process to refer to the new page, this time with the PTE marked writeable. When the page fault handler returns, the user process will be able to write its copy of the page.
COW fork() makes freeing of the physical pages that implement user memory a little trickier. A given physical page may be referred to by multiple processes’ page tables, and should be freed only when the last reference disappears.

Implement the fork lazy copy mechanism. After the process forks, the memory page is not copied immediately, but the virtual address points to the same physical address as the parent process. The memory page is copied only when either parent or son attempts to modify the memory page. Physical memory pages must be guaranteed to be released after all references are gone. A reference counting mechanism is required here.

  • It is necessary to increase the array that records the number of applications;
  • Only when the reference count is 0 can it be truly free;
  • Pay attention to setting flag;

uvmcopy() called when modifying fork

First modify uvmcopy(). When copying the memory of the parent process to the child process, the data is not copied immediately. Instead, a mapping to the original physical page is established, and the page table entries on both sides of the parent and child are set to non-writable.

// kernel/vm.c
int
uvmcopy(pagetable_t old, pagetable_t new, uint64 sz)
{<!-- -->
  pte_t *pte;
  uint64 pa, i;
  uint flags;

  for(i = 0; i < sz; i + = PGSIZE){<!-- -->
    if((pte = walk(old, i, 0)) == 0)
      panic("uvmcopy: pte should exist");
    if((*pte & amp; PTE_V) == 0)
      panic("uvmcopy: page not present");
    pa = PTE2PA(*pte);
    if(*pte & amp; PTE_W) {<!-- -->
      //Clear the PTE_W flag bit of the parent process and set the PTE_COW flag bit to indicate a lazy copy page (multiple processes refer to the same physical page)
      *pte = (*pte & amp; ~PTE_W) | PTE_COW;
    }
    flags = PTE_FLAGS(*pte);
    // Map the physical pages of the parent process directly to the child process (lazy copy)
    // Permission settings are consistent with the parent process
    // (Not writable + PTE_COW, or if the parent process page itself is simply read-only and not COW, the child process page is also read-only and has no COW flag)
    if(mappages(new, i, PGSIZE, (uint64)pa, flags) != 0){<!-- -->
      goto err;
    }
    // Increase the number of physical page references by 1
    krefpage((void*)pa);
  }
  return 0;

 err:
  uvmunmap(new, 0, i / PGSIZE, 1);
  return -1;
}

The PTE_COW flag is used above to indicate whether the physical page corresponding to a mapping is a lazy copy page. Here PTE_COW needs to be defined in riscv.h:

// kernel/riscv.h
#define PTE_V (1L << 0) // valid
#define PTE_R (1L << 1)
#define PTE_W (1L << 2)
#define PTE_X (1L << 3)
#define PTE_U (1L << 4) // 1 -> user can access
#define PTE_COW (1L << 8) // Whether it is a lazy copy page, expressed by the reserved 8th bit in the page table entry flags
// (In the page table entry flags, bits 8, 9, and 10 are reserved for the operating system and can be used for any custom purpose)

usertrap()

Similar to the lazy allocation lab, add page fault detection in usertrap(), and perform a real copy operation on the lazy copy page when the currently accessed address meets the lazy copy page conditions:

// kernel/trap.c
void
usertrap(void)
{<!-- -->

  //......

  } else if((which_dev = devintr()) != 0){<!-- -->
    //ok
  } else if((r_scause() == 13 || r_scause() == 15) & amp; & amp; uvmcheckcowpage(r_stval())) {<!-- --> // copy-on-write
    if(uvmcowcopy(r_stval()) == -1){<!-- --> // If there is insufficient memory, kill the process
      p->killed = 1;
    }
  } else {<!-- -->
    printf("usertrap(): unexpected scause %p pid=%d\
", r_scause(), p->pid);
    printf(" sepc=%p stval=%p\
", r_sepc(), r_stval());
    p->killed = 1;
  }

  //......

}

At the same time, since copyout() is a software access to the page table, it will not trigger a page fault exception, so you need to manually add the same monitoring code (same as lab5) to detect whether the received page is a lazy copy page. If so, perform the actual copy operation:

// kernel/vm.c
int
copyout(pagetable_t pagetable, uint64 dstva, char *src, uint64 len)
{<!-- -->
  uint64 n, va0, pa0;

  while(len > 0){<!-- -->
    if(uvmcheckcowpage(dstva)) // Check whether each written page is a COW page
      uvmcowcopy(dstva);
    va0 = PGROUNDDOWN(dstva);
    pa0 = walkaddr(pagetable, va0);
    
    //.......memmove from src to pa0

    len -= n;
    src + = n;
    dstva = va0 + PGSIZE;
  }

  //......
}

Implement lazy copy page detection (uvmcheckcowpage()) and real copy (uvmcowcopy()) operations:

// kernel/vm.c
// Check whether the page pointed to by an address is a lazy copy page
int uvmcheckcowpage(uint64 va) {<!-- -->
  pte_t *pte;
  struct proc *p = myproc();
  
  return va < p->sz // within the process memory range
     & amp; & amp; ((pte = walk(p->pagetable, va, 0))!=0)
     & amp; & amp; (*pte & amp; PTE_V) // Page table entry exists
     & amp; & amp; (*pte & amp; PTE_COW); // The page is a lazy copy page
}

//Copy a lazy copy page and remap it to be writable
int uvmcowcopy(uint64 va) {<!-- -->
  pte_t *pte;
  struct proc *p = myproc();

  if((pte = walk(p->pagetable, va, 0)) == 0)
    panic("uvmcowcopy: walk");
  
  //Call the kcopy_n_deref method in kalloc.c to copy the page
  // (If the reference of the lazy copy page is already 1, there is no need to reallocate and copy the memory page, just clear the PTE_COW mark and mark PTE_W)
  uint64 pa = PTE2PA(*pte);
  uint64 new = (uint64)kcopy_n_deref((void*)pa); // Turn a lazy copied page reference into a real copied page
  if(new == 0)
    return -1;
  
  //Remap to writable and clear PTE_COW flag
  uint64 flags = (PTE_FLAGS(*pte) | PTE_W) & amp; ~PTE_COW;
  uvmunmap(p->pagetable, PGROUNDDOWN(va), 1, 0);
  if(mappages(p->pagetable, va, 1, new, flags) == -1) {<!-- -->
    panic("uvmcowcopy: mappages");
  }
  return 0;
}

When forking, the data is not copied and only the mapping + mark is created. When the process tries to write, the real copy is made and remapped to be writable.

Next, you need to do page life cycle management to ensure that a page is released only when no process is using it.

Page life cycle management

Physical page life cycle and reference counting

In the original xv6 implementation, the following operations can be supported during the life cycle of a physical page:

  • kalloc(): allocate physical pages
  • kfree(): Release and reclaim physical pages

After lazy allocation

  • kalloc(): allocate a physical page and set its reference count to 1
  • krefpage(): Create a new reference to the physical page and increment the reference count by 1
  • kcopy_n_deref(): Copy a reference of the physical page to a new physical page (reference count is 1), return the obtained copy page; and decrement the reference count of this physical page
    1
  • kfree(): Releases a reference to the physical page, and the reference count is decremented by 1; if the count becomes 0, the physical page is released and recycled

Here we first define an array pageref[] and corresponding macros, which are used to record and obtain the reference count of a physical page.

// kernel/kalloc.c

// Used to access the physical page reference count array
#define PA2PGREF_ID(p) (((p)-KERNBASE)/PGSIZE)
#define PGREF_MAX_ENTRIES PA2PGREF_ID(PHYSTOP)

struct spinlock pgreflock; // Lock for pageref array to prevent memory leaks caused by race conditions
int pageref[PGREF_MAX_ENTRIES]; // Reference count for each physical page starting from KERNBASE to PHYSTOP
// note: reference counts are incremented on fork, not on mapping. this means that
// multiple mappings of the same physical page within a single process are only
// counted as one reference.
// this shouldn't be a problem, though. as there's no way for a user program to map
// a physical page twice within it's address space in xv6.

// Get the reference count through the physical address
#define PA2PGREF(p) pageref[PA2PGREF_ID((uint64)(p))]

void
kinit()
{<!-- -->
  initlock( & amp;kmem.lock, "kmem");
  initlock( & amp;pgreflock, "pgref"); // Initialize lock
  freerange(end, (void*)PHYSTOP);
}

void
kfree(void *pa)
{<!-- -->
  struct run *r;

  if(((uint64)pa % PGSIZE) != 0 || (char*)pa < end || (uint64)pa >= PHYSTOP)
    panic("kfree");

  acquire(&pgreflock);
  if(--PA2PGREF(pa) <= 0) {<!-- -->
    //When the reference count of the page is less than or equal to 0, release the page

    // Fill with junk to catch dangling refs.
    // pa will be memset multiple times if race-condition occurred.
    memset(pa, 1, PGSIZE);

    r = (struct run*)pa;

    acquire(&kmem.lock);
    r->next = kmem.freelist;
    kmem.freelist = r;
    release(&kmem.lock);
  }
  release(&pgreflock);
}

void *
kalloc(void)
{<!-- -->
  struct run *r;

  acquire(&kmem.lock);
  r = kmem.freelist;
  if(r)
    kmem.freelist = r->next;
  release(&kmem.lock);

  if(r){<!-- -->
    memset((char*)r, 5, PGSIZE); // fill with junk
    // The reference count of the newly allocated physical page is 1
    // (No need to lock here)
    PA2PGREF(r) = 1;
  }
  
  return (void*)r;
}

// Decrease reference to the page by one if it's more than one, then
// allocate a new physical page and copy the page into it.
// (Effectively turing one reference into one copy.)
//
// Do nothing and simply return pa when reference count is already
// less than or equal to 1.
//
// When the reference is already less than or equal to 1, it does not create and copy to a new physical page, but directly returns the page itself.
void *kcopy_n_deref(void *pa) {<!-- -->
  acquire(&pgreflock);

  if(PA2PGREF(pa) <= 1) {<!-- --> // Only 1 reference, no need to copy
    release(&pgreflock);
    return pa;
  }

  // Allocate a new memory page and copy the data in the old page to the new page
  uint64 newpa = (uint64)kalloc();
  if(newpa == 0) {<!-- -->
    release(&pgreflock);
    return 0; //out of memory
  }
  memmove((void*)newpa, (void*)pa, PGSIZE);

  // Decrement the reference of the old page by 1
  PA2PGREF(pa)--;

  release(&pgreflock);
  return (void*)newpa;
}

// Increment the reference count of pa by 1
void krefpage(void *pa) {<!-- -->
  acquire(&pgreflock);
  PA2PGREF(pa) + + ;
  release(&pgreflock);
}

The spin lock pgreflock is defined for the pageref[] array, and in other operations except kalloc, acquire( & amp;pgreflock); and release( & amp;pgreflock); are used to acquire and release locks to protect the operation code .

Relevant copy scenes

Parent process: allocate physical page p (p reference count = 1)
Parent process: fork() (p reference count = 2)
Parent process: Try to modify p, triggering page exception
Parent process: Since p reference count is greater than 1, start real copying p (p reference count = 2)
--- Scheduler switches to child process
Child process: exec() replaces the process image and releases all old pages
Child process: attempts to release p (reference count decremented by 1), child process drops reference to p (p reference count = 1)
--- Scheduler switches to parent process
Parent process: (Continue to copy p) Create new page q, copy p to q, mark q as writable and establish mapping, in the process the parent process discards the reference to the old p

Execute tests

./grade-lab-cow usertest
== Test running cowtest == (6.8s)

== Test usertests == Timeout! (304.4s)
    ...
         test bigfile: OK
         test dirfile: OK
         test iref: OK
         test forktest: OK
         test bigdir: qemu-system-riscv64: terminating on signal 15 from pid 120203 (make)
    MISSING '^ALL TESTS PASSED$'
    QEMU output saved to xv6.out.usertests
== Test usertests: copyin ==
  usertests: copyin: FAIL
    Parent failed: test_usertests
== Test usertests: copyout ==
  usertests: copyout: FAIL
    Parent failed: test_usertests
== Test usertests: all tests ==
  usertests: all tests: FAIL
    Parent failed: test_usertests

I read a lot of blogs, but I don’t know why the error is reported here. Let’s continue working on it. The total score is 81/110.