Handwritten database wheel project MYDB Part 5 | DataManager (DM) Data page indexing and upward management DataItem

This chapter introduces a simple page index and implements the DM layer’s abstraction of the upper layer: DataItem.

1. Page index

Page index, which caches the free space of each page. It is used to quickly find a page with suitable space when the upper module performs an insertion operation without checking the information of each page from the disk or cache.

1. PageIndex

PageIndex divides the space of a page into 40 intervals. At startup, all page information will be traversed, the free space of the page will be obtained, and arranged into these 40 intervals. When insert requests a page, it will first round up the required space and map it to a certain range, and then take out any page in this range to meet the demand.

 // Divide a page into 40 intervals
    private static final int INTERVALS_NO = 40;
    private static final int THRESHOLD = PageCache.PAGE_SIZE / INTERVALS_NO; /// The size of each interval

    private Lock lock;

    /// lists is an array. Each element in the array is an ArrayList list. Each list stores multiple PageInfo
    /// The i-th element in lists represents the collection of pages with i intervals left in free capacity.
    private List<PageInfo>[] lists;

PageInfo actually saves the page number of a certain page and the size of the free interval of the page.

public class PageInfo {
    public int pgno;
    public int freeSpace;

    public PageInfo(int pgno, int freeSpace) {
        this.pgno = pgno;
        this.freeSpace = freeSpace;
    }
}

2. Get the page through PageIndex

 public PageInfo select(int spaceSize) {
        lock.lock();
        try {
            int number = spaceSize / THRESHOLD; /// How many intervals are needed to store spaceSize size data?
            if(number < INTERVALS_NO) number + + ; /// Round up
            while(number <= INTERVALS_NO) {
                if(lists[number].size() == 0) { /// If there are number of free intervals, the number of pages is 0
                    number + + ; /// can only be changed to a larger range
                    continue;
                }
                return lists[number].remove(0); /// Take out any page at will, pay attention to use remove
            }
            return null;
        } finally {
            lock.unlock();
        }
    }

The selected page will be directly removed from the PageIndex, which means that concurrent writing to the same page is not allowed. After the upper module has finished using this page, it needs to be reinserted into the PageIndex.

 public void add(int pgno, int freeSpace) {
        lock.lock();
        try {
            int number = freeSpace / THRESHOLD; /// How many free spaces are there on this page?
            lists[number].add(new PageInfo(pgno, freeSpace));
        } finally {
            lock.unlock();
        }
    }

3. Fill PageIndex

When the DataManager is created, all pages need to be fetched and the PageIndex populated.

 void fillPageIndex() {
        int pageNumber = pc.getPageNumber();
        for(int i = 2; i <= pageNumber; i + + ) { /// Starting from 2 because PageOne is a special page and does not require index
            Page pg = null;
            try {
                pg = pc.getPage(i);
            } catch (Exception e) {
                Panic.panic(e);
            }
            pIndex.add(pg.getPageNumber(), PageX.getFreeSpace(pg));
            pg.release();
        }
    }
 public static int getFreeSpace(Page pg) {
        return PageCache.PAGE_SIZE - (int)getFSO(pg.getData());
    }

2. DataItem

DataItem is the data abstraction provided by the DM layer to the upper layer. The upper module requests the corresponding DataItem from DM through uid, and then obtains the data in it.

The data saved in DataItem is as follows:

[ValidFlag] [DataSize] [Data]

[ValidFlag] occupies 1 byte, 0 is legal, 1 is illegal. To delete a DataItem, you only need to set the flag bit to 1 (logical deletion).

[DataSize] occupies 2 bytes and identifies the length of [Data].

After the upper module obtains the DataItem, it can obtain the [data] part through the data() method. The array returned by this method is data shared, not implemented by copying, so SubArray is used.

public SubArray data() {
    return new SubArray(raw.raw, raw.start + OF_DATA, raw.end);
}

1. Modify DataItem

The before() method needs to be called before modification. When you want to undo the modification, call the unBefore() method. After the modification is completed, call the after() method. The entire process is mainly to save the previous phase data and log it in time. DM will ensure that modifications to DataItem are atomic.

 @Override
    public void before() {
        wLock.lock();
        pg.setDirty(true);
        /// Starting from the raw.start position in the array raw.raw, paste oldRaw.length elements starting from position 0 of the oldRaw array.
        System.arraycopy(raw.raw, raw.start, oldRaw, 0, oldRaw.length);
    }

    @Override
    public void unBefore() {
        /// Put oldRaw into raw.raw
        System.arraycopy(oldRaw, 0, raw.raw, raw.start, oldRaw.length);
        wLock.unlock();
    }

    @Override
    public void after(long xid) {
        dm.logDataItem(xid, this);
        wLock.unlock();
    }
 public void logDataItem(long xid, DataItem di) {
        byte[] log = Recover.updateLog(xid, di); /// Generate [Data] in [Log]
        logger.log(log); /// Wrap [data] into [log] format and write it to disk
        /// Note that interface polymorphism is used here. Logger logger is an interface, but the implementation class object is passed during assignment. When the log method is called, the overridden method in the implementation class will be called.
    }

After using the DataItem, you should also call the release() method in time to release the cache of the DataItem (the DM caches the DataItem).

@Override
public void release() {
    dm.releaseDataItem(this);
}

3. DM layer implementation

DataManager is a class that the DM layer directly provides external methods. At the same time, it also implements the caching of DataItem objects. The key stored in the DataItem in the cache, i.e. uid, is an 8-byte unsigned integer consisting of the page number and the offset within the page. The page number and offset each occupy 4 bytes.

1. Get DataItem

To cache DataItem, getForCache() only needs to parse the page number from uid, obtain the page Page from pageCache, and then parse the DataItem according to the offset.

 @Override
    protected DataItem getForCache(long uid) throws Exception {
        /// Assign the lower 16 bits of uid to offset
        short offset = (short)(uid & amp; ((1L << 16) - 1));
        /// uid is shifted right by 32 bits, that is, only the high 32 bits are retained
        uid >>>= 32;
        /// Assign the high 32 bits of the original uid to pgno
        int pgno = (int)(uid & amp; ((1L << 32) - 1));
        /// Get the page from the page cache
        Page pg = pc.getPage(pgno);
        /// Wrap data into DataItemImpl objects
        return DataItem.parseDataItem(pg, offset, this);
    }

2. Create DM implementation class

Create DataManager from empty files and existing files.

 public static DataManager create(String path, long mem, TransactionManager tm) {
        PageCache pc = PageCache.create(path, mem);
        Logger lg = Logger.create(path);
 
        DataManagerImpl dm = new DataManagerImpl(pc, lg, tm);
        /// Initialize PageOne
        dm.initPageOne();
        return dm;
    }
    public static DataManager open(String path, long mem, TransactionManager tm) {
        PageCache pc = PageCache.open(path, mem);
        Logger lg = Logger.open(path);
        DataManagerImpl dm = new DataManagerImpl(pc, lg, tm);
        /// Verify PageOne
        /// Read PageOne when opening an existing file and verify the correctness
        if(!dm.loadCheckPageOne()) {
            Recover.recover(tm, lg, pc);
        }
        /// Fill in the page number of each page and the remaining space of the page
        dm.fillPageIndex();
        PageOne.setVcOpen(dm.pageOne);
        dm.pc.flushPage(dm.pageOne);
        return dm;
    }

Among them, initializing the first page and verifying the first page are basically implemented by calling methods in the PageOne class.

//Initialize PageOne when creating a file
void initPageOne() {
    int pgno = pc.newPage(PageOne.InitRaw());
    assert pgno == 1;
    try {
        pageOne = pc.getPage(pgno);
    } catch (Exception e) {
        Panic.panic(e);
    }
    pc.flushPage(pageOne);
}

// Read PageOne when opening an existing file and verify the correctness
boolean loadCheckPageOne() {
    try {
        pageOne = pc.getPage(1);
    } catch (Exception e) {
        Panic.panic(e);
    }
    return PageOne.checkVc(pageOne);
}

3. DataItem addition and modification query

The DM layer provides three functions for the upper layer to use, namely reading, inserting and modifying. The modification is implemented through the read DataItem, so the DataManager only needs to provide the read() and insert() methods.

 public DataItem read(long uid) throws Exception {
        /// Call the parent class method to get the dataitem from the cache
        DataItemImpl di = (DataItemImpl)super.get(uid);
        if(!di.isValid()) { /// If not valid
            di.release(); /// Release the di
            return null;
        }
        return di;
    }
 public long insert(long xid, byte[] data) throws Exception {
        /// The format of multiplying data package by dataitem [valid][size][data]
        byte[] raw = DataItem.wrapDataItemRaw(data);
        if(raw.length > PageX.MAX_FREE_SPACE) {
            throw Error.DataTooLargeException;
        }

        PageInfo pi = null;
        for(int i = 0; i < 5; i + + ) {
            /// Try to get a page with enough free space
            pi = pIndex.select(raw.length);
            if (pi != null) {
                break;
            } else { /// If there is no page with enough free space, create a new data page and put it in the cache
                int newPgno = pc.newPage(PageX.initRaw());
                pIndex.add(newPgno, PageX.MAX_FREE_SPACE);
            }
        }
        if(pi == null) {
            throw Error.DatabaseBusyException;
        }

        Page pg = null;
        int freeSpace = 0;
        try {
            pg = pc.getPage(pi.pgno);
            /// Generate [Data] in [Log]
            byte[] log = Recover.insertLog(xid, pg, raw);
            /// Wrap [Data] into [Log] format and write to disk
            logger.log(log);
            /// Insert [Data] into the data page
            short offset = PageX.insert(pg, raw);
            /// Release cache
            pg.release();
            /// Convert pngo and offset to uid
            return Types.addressToUid(pi.pgno, offset);

        } finally {
            // Re-insert the removed pg into pIndex
            if(pg != null) {
                pIndex.add(pi.pgno, PageX.getFreeSpace(pg));
            } else {
                pIndex.add(pi.pgno, freeSpace);
            }
        }
    }

4. Summary of DM module

1. AbstractCache

The AbstractCache abstract class is a reference counting caching framework.

Three HashMap are maintained: cache (actual cached data), references (number of references to cached data), getting (whether the data is being obtained by the thread).

Two methods are provided: get(key) (get the data from the cache, if the cache does not have it, call getForCache(key) to get it from other places), release(key) (release the reference, when the reference is 0, call releaseForCach(key ) writes the resource back to the hard drive).

Two methods are retained: getForCache(key) and releaseForCach(key), which are left to the specific users of the cache to implement.

2. Page

The data structure of the PageImpl data page, including the page number of this page, the specific content stored, whether it is a dirty page, and the PageCache cache where this page is located.

The first page of the PageOne data page is a verification page. It is used to determine whether the random bytes stored in 100-107 bytes when the database is started are copied to 108-115 bytes when it is closed. Whether it is closed normally.

PageX Manager for other common data pages. The specific content stored in a normal page is [FreeSpaceOffset] [Data]. FreeSpaceOffset stores the starting offset of the free position in the page. Functions include obtaining update FSO, inserting update data into ordinary pages, etc.

3. PageCacheImpl

The implementation class of data page cache inherits AbstractCache, rewrites the getForCache method, reads data from the database file according to pageNumber, wraps it into Page, and rewrites the releaseForCache method. Get the total number of data pages in the database, getPageNumber(). Create a new data page and write it to the database file, newPage(byte[] initData). Get the specified data page from the cache, getPage(int pgno). Delete the data page behind the specified location, truncateByBgno(int maxPgno) and other methods.

4. PageIndex

The PageInfo class saves the page number of a page and its remaining space.

The PageIndex class is used to quickly locate pages with enough free space when inserting data. It logically divides a data page into 40 intervals. It maintains an array, each element in the array is an ArrayList, and the ArryList stores PageInfo objects. The element i in the table below in the array represents the set of pages with i intervals left in free capacity.

5. DataItemImpl

[Data] after [FreeSpaceOffset] in a normal data page is actually a stored DataItem. The structure of DataItem in the data page is [ValidFlag] [DataSize] [Data]. Of course, to wrap the content in the data page as DataItemImpl, there are 5 member variables. The first one is of SubArray type, which records all [Data] in a data page and the starting and ending index of this DataItem. The second is oldRaw, which is the same length as the DataItem, the third is the data page, the fourth is the uid calculated from the pageNumber and the starting index of the DataItem, and the fifth is DataMangerImpl.

6. LoggerImpl

The format of the log file is [XChecksum] [Log1] [Log2] … [LogN] [BadTail], where the format of each [Log] is [Size] [Checksum] [Data], and the [Data] is based on Update log and insert log have two different structures. The log structure of the update operation is Bytes.concat(logType, xidRaw, uidRaw, oldRaw, newRaw), and the log structure of the insert operation is Bytes.concat(logTypeRaw, xidRaw, pgnoRaw, offsetRaw, raw).

7. Recover

The log recovery strategy mainly maintains two logs: updateLog and insertLog. Redo and undo are performed based on the log records to modify the data pages.

8. DataMangerImpl

Inherits AbstractCache, performs cache management on DataItemImpl, reads and writes DataItem data pages, etc.