Writing a Simple Garbage Collector in C

Article directory

    • Make malloc
    • Mark and scan
      • Scan the heap
      • Scan a continuous area
        • Find data segment
        • Find the bottom of the call stack
      • Integrate


Writing a simple garbage collector in C (maplant.com)

Make malloc

Header describes memory block

typedef struct header {<!-- -->
    unsigned int size;
    struct header *next;
} header_t;

Dynamically allocated memory is located in the so-called heap, which is a section of memory between the stack and the BSS (uninitialized data segment-all global variables with a default value of zero). The heap starts at a low address adjacent to the BSS and ends at a program breakpoint, which is somewhere between the BSS and the stack.

The program break point is the end of the program data segment. (The program break point is the starting position of the initialized data segment)

Attempting to access any memory between the stack and the breakpoint will result in an access violation (unless the memory you access is within the range that the stack can extend, but that’s a completely different topic).

To get more memory from the kernel, we simply extend the breakpoint, allowing us to access more memory. To do this, we call Unix’s sbrk system call, which expands the break by arguments and returns the address of the previous break on success, giving the program more memory. On failure, sbrk returns -1 which is cast to a null pointer, which is a bad convention that no one likes.

  1. Stack area (stack): automatically allocated and released by the compiler, storing function parameter values, local variable values, etc. It operates like a stack in a data structure.
  2. Heap area (heap): Generally allocated and released by the programmer. If the programmer does not release it, it may be recycled by the OS when the program ends. Note that it is different from the heap in the data structure, and the allocation method is similar to a linked list.

When an executable program is stored (not loaded into memory and run), it has at least three parts, namely the code segment (text), the data segment (data), and the BSS segment. When the application is running (running state), two additional domains are needed: heap and stack. Running program: Code segment + data segment + BSS segment + heap + stack.

static header_t base; /* Start with zero size block. */
static header_t *freep = & amp;base; /* Points to the first free memory block. ? */
static header_t *usedp; /*Points to the first used memory block. */

/*
 * Scan the free list and look for places to place blocks. Basically, what we are looking for is any block that may have been partitioned by the block that is about to be freed.
 */
static void
add_to_free_list(header_t *bp)
{<!-- -->
    header_t *p;

    //Find whether bp is within the freep range
    //for loop judgment condition: p < bp < p->next in the middle
    //if condition: bp < p->next < p < bp on both sides (freep is circular?)
    for (p = freep; !(bp > p & amp; & amp; bp < p->next); p = p->next)
        if (p >= p->next & amp; & amp; (bp > p || bp < p->next))
            break;
//Meet one of the two conditions...
    //If bp and p->next are adjacent (bp < p->next):
    if (bp + bp->size == p->next) {<!-- -->
        //The header information of the two is merged (covered)
        bp->size + = p->next->size;
        bp->next = p->next->next;
    } else
        //If not adjacent, assign next
        bp->next = p->next;

    //If p and bp are adjacent (p < bp):
    if (p + p->size == bp) {<!-- -->
        //merge
        p->size + = bp->size;
        p->next = bp->next;
    } else
        p->next = bp;
//freep may be circular
    freep = p;
}

In the sbrk() parameter function: when increment is a positive value, the discontinuity point position is moved backward by increment bytes. At the same time, it returns to the position before the move, which is equivalent to allocating memory. When increment is a negative value, the position moves forward by increment bytes, which is equivalent to releasing memory, and its return value has no practical meaning. When increment is 0, the position is not moved and only the current position is returned. The sign of the increment parameter determines whether memory is allocated or reclaimed.

img

#define MIN_ALLOC_SIZE 4096 /* We allocate blocks in page sized chunks. */

/*
 * Request more memory from the kernel.
 */
static header_t *
morecore(size_t num_units)
{<!-- -->
    void *vp;
    header_t *up;

    if (num_units > MIN_ALLOC_SIZE)
        num_units = MIN_ALLOC_SIZE / sizeof(header_t);

    //First divide and then multiply sizeof(header_t) to convert num_units into an integer multiple of MIN_ALLOC_SIZE
    //vp is the head pointer of the newly applied memory
    if ((vp = sbrk(num_units * sizeof(header_t))) == (void *) -1)
        return NULL;

    //Force converted to header_t to store header information
    up = (header_t *) vp;
    up->size = num_units;
    //Add the newly applied memory to freep
    add_to_free_list (up);
    return freep;
}

Now that we have two helper functions, writing the malloc function is very simple. We simply scan the free list and use the first block that is at least as big as the block we are looking for. Because we use the first block we find instead of trying to find a “better” block, this algorithm is called First Fit.

A quick note: the size field in the header structure is measured in header-sized chunks, not bytes.

/*
 * Find a chunk from the free list and put it in the used list.
 */
void *
GC_malloc(size_t alloc_size)
{<!-- -->
    size_t num_units;
    header_t *p, *prevp;

    //Here alloc_size + sizeof(header_t) is the entire memory block structure (bytes)
    //The reason for -1 is to make the dividend not divisible by sizeof(header_t), so + 1, get the size of the rounded-up memory block in units of header_t size
    num_units = (alloc_size + sizeof(header_t) - 1) / sizeof(header_t) + 1;
    prevp = freep;

    for (p = prevp->next;; prevp = p, p = p->next) {<!-- -->
        if (p->size >= num_units) {<!-- --> /* Big enough. */
            if (p->size == num_units) /* Exact size. */
                prevp->next = p->next;
            else {<!-- -->
                //The free memory block size is too large
                p->size -= num_units;//Memory block: small | remaining | allocated | large
                p + = p->size;//p moves to the head of the allocated memory
                p->size = num_units;//Set size
            }

            //preve saves the header information of the remaining memory blocks
            freep = prevp;

            /* Add to p to the used list. */
            if (usedp == NULL)
                usedp = p->next = p;
            else {<!-- -->
                //Insert usedp chain
                p->next = usedp->next;
                usedp->next = p;
            }

            //p + 1 is to skip sizeof(header_t) bytes and directly obtain the address of the effective memory
            return (void *) (p + 1);
        }
        if (p == freep) {<!-- --> /* Not enough memory. */
            p = morecore(num_units);//Request more memory from the kernel
            if (p == NULL) /* Request for more memory failed. */
                return NULL;
        }
        //It has not been allocated yet, but morecore() has been called and the memory has been successfully obtained, so loop to see if the applied memory is enough to allocate.
    }
}

Mark and scan

Traverse the memory of a range, and then

//Unlock the mark. Here, the lower two bits are set to 0, but the mark bit only has the last bit?
#define UNTAG(p) (((unsigned int) (p)) & amp; 0xfffffffc)

/*
 * Scan memory areas and mark any items appropriately in the used list.
 * Both parameters should be aligned.
 */
static void
scan_region(unsigned int *sp, unsigned int *end)
{<!-- -->
    header_t *bp;

    for (; sp < end; sp ++ ) {<!-- -->
        unsigned int v = *sp;
        bp = usedp;
        do {<!-- -->
            //v is within the effective memory address of bp
            if (bp + 1 <= v & amp; & amp;
                bp + 1 + bp->size > v) {<!-- -->
                //Set the first position of bp->next to 1 to mark it
                bp->next = ((unsigned int) bp->next) | 1;
                break;
            }
            //Loop until usedp is encountered again, so maybe usedp is also looping
        } while ((bp = UNTAG(bp->next)) != usedp);
    }
}

Now we can scan memory areas, but which memory areas should we browse? There are several relevant areas:

  • BSS (Uninitialized Data) and Initialized Data Segment: They contain all global and static variables in the program. Therefore, they can reference something in the heap.
  • Used blocks: Of course, if the user allocates a pointer to another allocated block, we don’t want to free the pointed block.
  • Stack: Since the stack contains all local variables, it is arguably the most important place.

Scan heap

/*
 * Scan the marked blocks for references to other unmarked blocks.
 */
static void
scan_heap(void)
{<!-- -->
    unsigned int *vp;
    header_t *bp, *up;

    for (bp = UNTAG(usedp->next); bp != usedp; bp = UNTAG(bp->next)) {<!-- -->
        if (!((unsigned int)bp->next & amp; 1))
            //Whether there is a mark, if not, continue
            continue;
        for (vp = (unsigned int *)(bp + 1);
             vp < (bp + bp->size + 1);
             vp + + ) {<!-- -->
            unsigned int v = *vp;
            up = UNTAG(bp->next);
            do {<!-- -->
                if (up != bp & amp; & amp;
                    up + 1 <= v & amp; & amp;
                    up + 1 + up->size > v) {<!-- -->
                    //v is within the effective memory address of up
                    up->next = ((unsigned int) up->next) | 1;
                    break;
                }
            } while ((up = UNTAG(up->next)) != bp);
        }
    }
}

Scan continuous area

Unlike the heap, the BSS and initialized data segments as well as the stack are in memory and may contain addresses in the heap. Because each is contiguous, in order to scan them we need to know the minimum and maximum valid memory addresses of each.

Find data segment

The initialization data segment is adjacent to the BSS

Most modern Unix linkers export two symbols that are accessible to user programs. These symbols we are interested in are:

  • etext: The address of etext is the last address after the text segment. The initialized data segment follows the text segment, so the address of etext is the initialized data segment.
  • end: The address of end is the beginning of the heap, or the last address after the end of the BSS.

Since there is no segment between the BSS and the initialization segment, we don’t have to deal with them as separate entities, they can be scanned by iterating from &etext to &end.

You can get it directly by using extern char end, etext;

Find the bottom of the call stack

The top of the stack can be obtained by fetching the esp register value through inline assembly. Bottom of Stack We will take advantage of the fact that Linux places the bottom of the stack in a string in the file of the process entry in the proc directory

Integration

/*
 * Find the absolute bottom of the stack and set stuff up.
 */
void
GC_init(void)
{<!-- -->
    static int initted;
    FILE *statfp;

    if (initted) return;

    initted = 1;

    //Get the bottom of the stack through the file
    statfp = fopen("/proc/self/stat", "r");
    assert(statfp != NULL);
    fscanf(statfp,
           "%*d %*s %*c %*d %*d %*d %*d %*d %*u "
           "%*lu %*lu %*lu %*lu %*lu %*lu %*ld %*ld "
           "%*ld %*ld %*ld %*ld %*llu %*lu %*ld "
           "%*lu %*lu %*lu %lu", & amp;stack_bottom);//Get the 28th data, so the first 27 are ignored ("%*lu")
    fclose(statfp);

    usedp = NULL;
    base.next = freep = & amp;base;
    base.size = 0;
}
/*
 * Mark used memory blocks and release unused memory blocks
 */
void
GC_collect(void)
{<!-- -->
    header_t *p, *prevp, *tp;
    unsigned long stack_top;
    extern char end, etext; /* Provided by the linker. */

    if (usedp == NULL)
        return;

    /* Scan BSS and initialized data segments. */
    scan_region( & amp;etext, & amp;end);

    /* Scan the stack. */
    asm volatile ("movl %?p, %0" : "=r" (stack_top));
    scan_region(stack_top, stack_bottom);

    /* Mark heap. */
    scan_heap();

    /* And now we collect! */
    for (prevp = usedp, p = UNTAG(usedp->next);; prevp = p, p = UNTAG(p->next)) {<!-- -->
    next_chunk:
        if (!((unsigned int)p->next & amp; 1)) {<!-- -->
            /*
             * This part is not marked yet. Therefore, it must be released
             */
            tp = p;
            p = UNTAG(p->next);
            add_to_free_list(tp);

            if (usedp == tp) {<!-- -->
                usedp = NULL;
                break;
            }

            //First do ((unsigned int) prevp->next & amp; 1, take out the flag bit of prevp->next, and then add it to p
            prevp->next = (unsigned int)p | ((unsigned int) prevp->next & amp; 1);
            goto next_chunk;
        }
        //Clear the p->next flag bit
        p->next = ((unsigned int) p->next) & amp; ~1;
        if (p == usedp)
            break;
    }
}