The k project

KFS

The K File System is inspired by modern UNIX file system. Each file is represented by an inode, and the data is split in blocks. It’s a flat file system, which means that there is no directory.

The Block Unit

Each structure is referenced in the file system by a block offset. The superblock is the first block of the file system (offset 0). Blocks are 4096 bytes long. You will find the first i-node offset in the superblock.

The K File System superblock

As you understood, the superblock is the entry point of the filesystem. The structure of the superblock is defined as follows:

magic number              (4 unsigned bytes) = 0xd35f9caa
file system name          (32 bytes)
creation timestamp        (4 bytes)
blocks count              (4 unsigned bytes)
i-nodes count             (4 unsigned bytes)
first i-node block index  (4 unsigned bytes)
checksum                  (4 unsigned bytes) = Adler32 of the first six fields

The I-Node

The I-Node is the structure describing a file representation. Our structure borrows a few concepts of the traditional UNIX i-node structure and more particularly direct block pointers and single indirect block pointers. This is the structure:

i-number                (4 unsigned bytes)
filename                (32 bytes)
file size               (4 unsigned bytes)
i-node block index      (4 unsigned bytes)
blocks count            (4 unsigned bytes)
next i-node block index (4 unsigned bytes)
direct blocks count     (4 unsigned bytes)
indirect blocks count   (4 unsigned bytes)
direct blocks indexes   (10 * 4 unsigned bytes)
indirect blocks indexes (16 * 4 unsigned bytes)
checksum                (4 unsigned bytes) = Adler32 of the first ten fields

The Indirect Block

This structure references data blocks and is referenced in the i-node. It is designed as follows:

indirect block index    (4 unsigned bytes)
blocks count            (4 unsigned bytes)
data blocks indexes     (16 * 4 unsigned bytes)
checksum                (4 unsigned bytes) = Adler32 of the first three fields

The Data Block

This is the structure containing part of the file data. The checksum is not calculated as you saw previously. In order to do it, load the data block from the file system, keep a copy of the checksum field, set the structure’s checksum field to 0. Finally apply the checksum function to the whole structure (including checksum field you just setted to 0) and compare it with the checksum you loaded before.

Here is the data block structure:

data block index        (4 unsigned bytes)
block usage             (4 unsigned bytes)
checksum                (4 unsigned bytes) - See below
data                    (4096 - 3 * 4 bytes)

Checksumming

Every structure of KFS is checksummed. We chose the Adler32 algorithm which is a quite good compromise between efficiency and ease of implementation.

An Adler-32 checksum is obtained by calculating two 16-bit checksums A and B and concatenating their bits into a 32-bit integer. A is the sum of all bytes in the string plus one, and B is the sum of the individual values of A from each step.

At the beginning of an Adler-32 run, A is initialized to 1, B to 0. The sums are done modulo 65521 (the largest prime number smaller than 2^16). The bytes are stored in network order (big endian), B occupying the two most significant bytes.

The Adler-32 sum of the ASCII string “Wikipedia” would be calculated as follows:

ASCII code          A                   B

W: 87           1 +  87 =  88        0 +  88 =   88
i: 105         88 + 105 = 193       88 + 193 =  281
k: 107        193 + 107 = 300      281 + 300 =  581
i: 105        300 + 105 = 405      581 + 405 =  986
p: 112        405 + 112 = 517      986 + 517 = 1503
e: 101        517 + 101 = 618     1503 + 618 = 2121
d: 100        618 + 100 = 718     2121 + 718 = 2839
i: 105        718 + 105 = 823     2839 + 823 = 3662
a: 97         823 +  97 = 920     3662 + 920 = 4582

A      = 920  = 0x398
B      = 4582 = 0x11E6
Output = 0x11E60398