File System Drivers: Part 1 and Part 2

I am 时尚热狗, a blogger on 靠谱客. This article covers file system drivers (Part 1 and Part 2) and is shared here as a reference.

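Objectives

Acquire knowledge about the Virtual Filesystem (VFS) and understand the concepts of inode, dentry, file, superblock and data block.
Understand the process of mounting a file system inside VFS.
Learn about the different file system types and understand the differences between file systems with disk support and those without it.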

Part1

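Virtual Filesystem (VFS)

The Virtual Filesystem (also known as VFS) is the component of the Linux kernel that handles all system calls related to files and file systems. VFS is a generic interface between the user and a particular file system. This abstraction simplifies the implementation of file systems and makes it easier to integrate multiple file systems. In other words, implementing a file system means implementing the API provided by VFS, while the generic hardware and I/O subsystem communication is handled by VFS itself.

From a functional point of view, file systems can be grouped into:

disk file systems (ext3, ext4, xfs, fat, ntfs, etc.)
network file systems (nfs, smbfs/cifs, ncp, etc.)
virtual file systems (procfs, sysfs, sockfs, pipefs, etc.)

A Linux kernel instance uses VFS for the hierarchy (a tree) of directories and files. A new file system is added as a VFS subtree using the mount operation. A file system is usually mounted from the environment for which it was built (from a block device, from the network, etc.). VFS can, however, use a regular file as a virtual block device, so it is possible to mount a disk file system on top of a regular file. In this way, stacks of file systems can be created.

The basic idea of VFS is to provide a single file model that can represent files from any file system. The file system driver is responsible for reducing its file system to this common denominator. One file system acts as the root, and the others are mounted in its various directories.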

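The general file system model

The general file system model, to which any implemented file system must be reduced, consists of several well-defined entities: superblock, inode, file and dentry. These entities are file system metadata (they contain information about data or about other metadata).

The model entities interact through VFS or kernel subsystems: the dentry cache, the inode cache and the buffer cache. Each entity is treated as an object: it has an associated data structure and a pointer to a table of methods. Specific behavior is introduced for each component by replacing the associated methods.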
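
The "data structure plus pointer to a method table" idea can be illustrated in plain userspace C. The sketch below is not kernel code; demo_ops, demo_entity and the two describe functions are hypothetical names chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical method table, standing in for e.g. struct file_operations. */
struct demo_ops {
    const char *(*describe)(void);
};

/* Hypothetical entity: private data plus a pointer to its method table. */
struct demo_entity {
    int id;
    const struct demo_ops *ops;
};

static const char *describe_disk(void) { return "disk-backed"; }
static const char *describe_mem(void)  { return "memory-only"; }

static const struct demo_ops disk_ops = { .describe = describe_disk };
static const struct demo_ops mem_ops  = { .describe = describe_mem };

/* Generic code: it knows only the table layout, not the implementation. */
const char *entity_describe(const struct demo_entity *e)
{
    assert(e != NULL && e->ops != NULL && e->ops->describe != NULL);
    return e->ops->describe();
}
```

Pointing the ops field at a different table changes the behavior of entity_describe() without touching the generic code, which is the same mechanism VFS relies on when a driver supplies its own operation structures.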

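The superblock

The superblock stores the information needed for a mounted file system:

the locations of inodes and blocks
the file system block size
the maximum filename length
the maximum file size
the location of the root inode

Localization

In the case of disk file systems, the superblock has a counterpart in the first block of the disk (the Filesystem Control Block).
In VFS, all the superblocks of file systems are kept in a list of structures of type struct super_block, and their methods in structures of type struct super_operations.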

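The inode

The inode (index node) keeps information about a file in the general sense: a regular file, a directory, a special file (pipe, fifo), a block device, a character device, a link, or anything else that can be abstracted as a file.
An inode stores information such as:

the file type
the file size
the access rights
the access or modification time
the location of the data on the disk (pointers to the disk blocks that contain the data)

Localization

Like the superblock, the inode has a counterpart on disk. On-disk inodes are generally grouped into a dedicated area (the inode area), separate from the data block area. In some file systems, the equivalents of inodes are spread through the file system structure (for example, FAT). As a VFS entity, an inode is represented by the struct inode structure and by the operations defined in the struct inode_operations structure.

Each inode is generally identified by a number. On Linux, the -i argument of the ls command shows the inode number associated with each file: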

razvan@valhalla:~/school/so2/wiki$ ls -i
1277956 lab10.wiki  1277962 lab9.wikibak  1277964 replace_lxr.sh
1277954 lab9.wiki   1277958 link.txt      1277955 homework.wiki

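The file

The file is the component of the file system model that is closest to the user. The structure exists only in memory, as a VFS entity, and has no physical counterpart on disk.

While the inode abstracts a file on the disk, the file structure abstracts an open file. From the point of view of a process, the file entity abstracts the file. From the point of view of the file system implementation, the inode is the entity that abstracts the file.

The file structure maintains the following information:

the position of the file cursor
the rights with which the file was opened
a pointer to the associated inode

Localization

The struct file structure is the associated VFS entity, and struct file_operations represents the operations associated with it.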

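The dentry

The dentry (directory entry) associates an inode with a file name.
Generally, a dentry contains two fields:

an integer that identifies the inode
a string that represents its name

The dentry is a specific part of a path and can be either a file or a directory. For example, for the path /bin/vi, three dentry objects are created: /, bin and vi.
The dentry has a counterpart on disk, but the correspondence is not direct, because each file system keeps its dentries in its own way.
In VFS, the dentry entity is represented by the struct dentry structure, and the operations on it are defined in the struct dentry_operations structure.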
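
To make the /bin/vi example concrete, the sketch below splits an absolute path into the names that would each get a dentry. It is a userspace illustration only: path_components and DEMO_NAME_MAX are hypothetical, and the sketch does not handle "." or ".." the way a real path walk does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_MAX 255   /* assumption: typical maximum component length */

/*
 * Splits an absolute path into the component names a lookup would walk.
 * The root "/" counts as the first component. Returns the number of
 * components written to names, or -1 if the path is not absolute, a
 * component is too long, or there are more than max components.
 */
int path_components(const char *path, char names[][DEMO_NAME_MAX + 1], int max)
{
    int count = 0;
    const char *p = path;

    if (path == NULL || path[0] != '/' || max < 1)
        return -1;
    strcpy(names[count++], "/");

    while (*p != '\0') {
        size_t len;

        while (*p == '/')           /* skip one or more separators */
            p++;
        if (*p == '\0')
            break;
        len = strcspn(p, "/");      /* length of the next name */
        if (len > DEMO_NAME_MAX || count == max)
            return -1;
        memcpy(names[count], p, len);
        names[count][len] = '\0';
        count++;
        p += len;
    }
    return count;
}
```

For "/bin/vi" the function reports three components, matching the three dentry objects described above.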

Registering and unregistering file systems

In the current version, the Linux kernel supports about 50 file systems, including:

ext2/ ext4
reiserfs
xfs
fat
ntfs
iso9660
udf for CDs and DVDs
hpfs
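On a single system, however, it is unlikely that more than 5 or 6 file systems are in use. For this reason, file systems (or parts of a file system) are implemented as modules and can be loaded or unloaded at any time.

To load and unload a file system module dynamically, a file system registration/deregistration API is required. The structure that describes a particular file system is struct file_system_type: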

#include <linux/fs.h>

struct file_system_type {
         const char *name;
         int fs_flags;
         struct dentry *(*mount) (struct file_system_type *, int,
                                   const char *, void *);
         void (*kill_sb) (struct super_block *);
         struct module *owner;
         struct file_system_type * next;
         struct hlist_head fs_supers;
         struct lock_class_key s_lock_key;
         struct lock_class_key s_umount_key;
         //...
};

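name is a string that identifies the file system (the argument passed to mount -t, as in mount -t nfs 192.168.43.92:/home/acat/aaa /home/root/hhh).
owner is THIS_MODULE for file systems implemented as modules, and NULL if the file system is built directly into the kernel.
mount reads the superblock from the disk into memory when the file system is loaded. The function is unique to each file system.
kill_sb releases the superblock from memory.
fs_flags specifies the flags with which the file system must be mounted. For example, the FS_REQUIRES_DEV flag tells VFS that the file system needs a disk.
fs_supers is a list of all the superblocks associated with this file system. Because the same file system can be mounted multiple times, there is a separate superblock for each mount.

Registering a file system with the kernel is generally done in the module initialization function. To register, the programmer has to:
1. initialize a structure of type struct file_system_type with the name, the flags, the function that implements the superblock reading operation, and a reference to the structure that identifies the current module;
2. call the register_filesystem() function.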

When unloading the module, you must unregister the file system by calling the unregister_filesystem() function.

An example of registering a virtual file system is found in the code for ramfs:

static struct file_system_type ramfs_fs_type = {
        .name           = "ramfs",
        .mount          = ramfs_mount,
        .kill_sb        = ramfs_kill_sb,
        .fs_flags       = FS_USERNS_MOUNT,
};

static int __init init_ramfs_fs(void)
{
        /* Guard so that a second call does not register ramfs again. */
        static unsigned long once;

        if (test_and_set_bit(0, &once))
                return 0;
        return register_filesystem(&ramfs_fs_type);
}
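
The registration API can be pictured as a linked list of file system types keyed by name, chained through a next pointer like the one in struct file_system_type. The following userspace model is a simplified sketch under that assumption: the demo_ names are hypothetical, there is no locking or module reference counting, and the error codes are chosen for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct file_system_type: name plus list link. */
struct demo_fs_type {
    const char *name;
    struct demo_fs_type *next;
};

static struct demo_fs_type *demo_fs_list;   /* head of the registered list */

/* Returns the link that points at the entry named name, or at the tail. */
static struct demo_fs_type **demo_find(const char *name)
{
    struct demo_fs_type **p;

    for (p = &demo_fs_list; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Adds fs to the list; fails with -EBUSY if the name is already taken. */
int demo_register(struct demo_fs_type *fs)
{
    struct demo_fs_type **p = demo_find(fs->name);

    if (*p != NULL)
        return -EBUSY;
    fs->next = NULL;
    *p = fs;
    return 0;
}

/* Unlinks fs from the list; fails with -EINVAL if it was not registered. */
int demo_unregister(struct demo_fs_type *fs)
{
    struct demo_fs_type **p;

    for (p = &demo_fs_list; *p != NULL; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;
}
```

In this model, mount -t name corresponds to a lookup by name in the list, which is why a second registration under the same name has to be rejected.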

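The mount() and kill_sb() functions

When mounting a file system, the kernel calls the mount function defined in the file_system_type structure. The function performs a set of initializations and returns a dentry (a struct dentry) that represents the mount point directory. Usually mount() is a simple function that calls one of the following:

mount_bdev(), which mounts a file system stored on a block device
mount_single(), which mounts a file system that shares a single instance between all mount operations
mount_nodev(), which mounts a file system that is not on a physical device
mount_pseudo(), a helper function for pseudo file systems (sockfs, pipefs, generally file systems that cannot be mounted)

These functions take as a parameter a pointer to a fill_super() function, which is called after the superblock is initialized so that the driver can finish the initialization. An example of such a function can be found in the fill_super section.

When unmounting a file system, the kernel calls kill_sb(), which performs cleanup operations and invokes one of the following:

kill_block_super(), which unmounts a file system on a block device
kill_anon_super(), which unmounts a virtual file system (information is generated when requested)
kill_litter_super(), which unmounts a file system that is not on a physical device (the information is kept in memory)

An example for a file system without disk support is the ramfs_mount() function in the ramfs file system: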

struct dentry *ramfs_mount(struct file_system_type *fs_type,
        int flags, const char *dev_name, void *data)
{
        return mount_nodev(fs_type, flags, data, ramfs_fill_super);
}

An example for a file system stored on disk is the minix_mount() function in the minix file system:

struct dentry *minix_mount(struct file_system_type *fs_type,
        int flags, const char *dev_name, void *data)
{
         return mount_bdev(fs_type, flags, dev_name, data, minix_fill_super);
}

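The superblock in VFS

The superblock exists both as a physical entity (on disk) and as a VFS entity (the struct super_block structure). The superblock contains only meta-information and is used to read and write metadata (inodes, directory entries) from and to the disk. A superblock, and implicitly the struct super_block structure, contains information about the block device used, the list of inodes, a pointer to the inode of the file system root directory, and a pointer to the superblock operations.

The struct super_block structure

Part of the struct super_block structure definition is shown below: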

struct super_block {
        //...
        dev_t                   s_dev;              /* identifier */
        unsigned char           s_blocksize_bits;   /* block size in bits */
        unsigned long           s_blocksize;        /* block size in bytes */
        unsigned char           s_dirt;             /* dirty flag */
        loff_t                  s_maxbytes;         /* max file size */
        struct file_system_type *s_type;            /* filesystem type */
        struct super_operations *s_op;              /* superblock methods */
        //...
        unsigned long           s_flags;            /* mount flags */
        unsigned long           s_magic;            /* filesystem’s magic number */
        struct dentry           *s_root;            /* directory mount point */
        //...
        char                    s_id[32];           /* informational name */
        void                    *s_fs_info;         /* filesystem private info */
};

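The superblock stores global information for an instance of a file system:

the physical device on which it resides
the block size
the maximum size of a file
the file system type
the operations it supports
the magic number (identifies the file system)
the root directory dentry

Additionally, a generic pointer (void *) stores the private data of the file system. The superblock can be viewed as an abstract object to which a concrete implementation adds its own data.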
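
The s_blocksize and s_blocksize_bits fields carry the same information in two forms: the size in bytes and its base-2 logarithm, which allows offsets to be converted to block numbers with a shift. The sketch below shows the relationship in userspace C; the demo_ function names are hypothetical.

```c
#include <assert.h>

/*
 * Returns log2(size) for a power-of-two block size, or -1 if size is zero
 * or not a power of two.
 */
int demo_blocksize_bits(unsigned long size)
{
    int bits = 0;

    if (size == 0 || (size & (size - 1)) != 0)
        return -1;
    while ((1UL << bits) != size)
        bits++;
    return bits;
}

/* Index of the block that holds byte offset pos, given the shift. */
unsigned long demo_block_of(unsigned long long pos, int bits)
{
    assert(bits >= 0);
    return (unsigned long)(pos >> bits);
}
```

A 4096-byte block size corresponds to a shift of 12, which is the pairing used by the MYFS_BLOCKSIZE and MYFS_BLOCKSIZE_BITS definitions later in this article.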

Superblock operations

The superblock operations are described by the struct super_operations structure:

struct super_operations {
       //...
       int (*write_inode) (struct inode *, struct writeback_control *wbc);
       struct inode *(*alloc_inode)(struct super_block *sb);
       void (*destroy_inode)(struct inode *);

       void (*put_super) (struct super_block *);
       int (*statfs) (struct dentry *, struct kstatfs *);
       int (*remount_fs) (struct super_block *, int *, char *);
       //...
};

The fields of the structure are function pointers with the following meanings:

write_inode, alloc_inode, destroy_inode write, allocate and, respectively, release the resources associated with an inode.
put_super is called when the superblock is released; within this function, any resources (generally memory) held for the file system's private data must be released.
remount_fs is called when the kernel detects a remount attempt (mount flag MS_REMOUNT); most of the time, the function must detect whether a switch from read-only to read-write or vice versa is attempted; this is straightforward because both the old flags (in sb->s_flags) and the new flags (the flags argument) are accessible; data is a pointer to the data sent by mount() that represents file system specific options.
statfs is called when a statfs system call is made (try stat -f or df); this call must fill the fields of the struct kstatfs structure, as is done, for example, in the ext4_statfs() function.
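
From userspace, the information filled in by the statfs operation can be read with the POSIX statvfs() call, which is roughly what stat -f and df rely on. A minimal sketch, with demo_fs_geometry as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/*
 * Queries the file system that contains path. On success returns 0 and
 * stores the block size and total block count; returns -1 on failure.
 */
int demo_fs_geometry(const char *path, unsigned long *block_size,
                     unsigned long long *total_blocks)
{
    struct statvfs vfs;

    assert(path != NULL && block_size != NULL && total_blocks != NULL);
    if (statvfs(path, &vfs) != 0)
        return -1;
    *block_size = vfs.f_bsize;
    *total_blocks = vfs.f_blocks;
    return 0;
}
```

For a memory-only file system that uses simple_statfs(), the total block count may well be reported as 0, as in the stat -f output of the tests further below.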

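The fill_super() function

As mentioned, the fill_super() function is called to finish the superblock initialization. This initialization involves filling the fields of the struct super_block structure and initializing the root directory inode.

An example implementation is the ramfs_fill_super() function, which is called to initialize the remaining fields of the superblock: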

#include <linux/pagemap.h>

#define RAMFS_MAGIC     0x858458f6

static const struct super_operations ramfs_ops = {
        .statfs         = simple_statfs,
        .drop_inode     = generic_delete_inode,
        .show_options   = ramfs_show_options,
};

static int ramfs_fill_super(struct super_block *sb, void *data, int silent)
{
        struct ramfs_fs_info *fsi;
        struct inode *inode;
        int err;

        save_mount_options(sb, data);

        fsi = kzalloc(sizeof(struct ramfs_fs_info), GFP_KERNEL);
        sb->s_fs_info = fsi;
        if (!fsi)
                return -ENOMEM;

        err = ramfs_parse_options(data, &fsi->mount_opts);
        if (err)
                return err;

        sb->s_maxbytes          = MAX_LFS_FILESIZE;
        sb->s_blocksize         = PAGE_SIZE;
        sb->s_blocksize_bits    = PAGE_SHIFT;
        sb->s_magic             = RAMFS_MAGIC;
        sb->s_op                = &ramfs_ops;
        sb->s_time_gran         = 1;

        inode = ramfs_get_inode(sb, NULL, S_IFDIR | fsi->mount_opts.mode, 0);
        sb->s_root = d_make_root(inode);
        if (!sb->s_root)
                return -ENOMEM;

        return 0;
}

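The kernel provides generic functions for implementing operations on file system structures. The generic_delete_inode() and simple_statfs() functions used in the code above are such functions, and a driver can use them when their functionality is sufficient.

The ramfs_fill_super() function fills some fields of the superblock, then reads the root inode and allocates the root dentry. The root inode is read in the ramfs_get_inode() function, which allocates a new inode using new_inode() and initializes it. To free the inode, iput() is used; d_make_root() is used to allocate the root dentry.

An example implementation for a disk file system is the minix_fill_super() function in the minix file system. The functionality for a disk file system is similar to that of a virtual file system, except that it uses the buffer cache. In addition, the minix file system keeps its private data in a struct minix_sb_info structure. A large part of this function deals with initializing this private data, which is allocated with kzalloc() and stored in the s_fs_info field of the superblock structure.

VFS functions typically receive as arguments the superblock, an inode and/or a dentry that contain a pointer to the superblock, so that this private data can be accessed easily.

Buffer cache

Buffer cache is a kernel subsystem that handles caching (both read and write) blocks from block devices. The base entity used by the buffer cache is the struct buffer_head structure. The most important fields in this structure are: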

b_data, pointer to a memory area where the data was read from or where the data must be written to
b_size, buffer size
b_bdev, the block device
b_blocknr, the number of block on the device that has been loaded or needs to be saved on the disk
b_state, the status of the buffer

There are some important functions that work with these structures:

__bread(): reads a block with the given number and given size in a buffer_head structure; in case of success returns a pointer to the buffer_head structure, otherwise it returns NULL;
sb_bread(): does the same thing as the previous function, but the size of the read block is taken from the superblock, as well as the device from which the read is done;
mark_buffer_dirty(): marks the buffer as dirty (sets the BH_Dirty bit); the buffer will be written to the disk at a later time (from time to time the bdflush kernel thread wakes up and writes the buffers to disk);
brelse(): frees up the memory used by the buffer, after it has previously written the buffer on disk if needed;
map_bh(): associates the buffer-head with the corresponding sector.

Functions and useful macros

The superblock typically contains a map of occupied blocks (by inodes, dentries, data) in the form of a bitmap (vector of bits). To work with such maps, it is recommended to use the following functions:

find_first_zero_bit(), to find the first zero bit in a memory area. The size parameter means the number of bits in the search area;
test_and_set_bit(), to set a bit and get the old value;
test_and_clear_bit(), to clear a bit and get the old value;
test_and_change_bit(), to invert the value of a bit and get the old value.
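
How these helpers combine into a block allocator can be sketched in userspace. The functions below only imitate the kernel helpers listed above: they are not atomic, the demo_ names are hypothetical, and returning size when no zero bit exists is a convention chosen for this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Sets bit nr in map and returns its previous value (0 or 1). Not atomic. */
int demo_test_and_set_bit(size_t nr, unsigned long *map)
{
    unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
    unsigned long *word = &map[nr / DEMO_BITS_PER_LONG];
    int old = (*word & mask) != 0;

    *word |= mask;
    return old;
}

/* Clears bit nr in map and returns its previous value. Not atomic. */
int demo_test_and_clear_bit(size_t nr, unsigned long *map)
{
    unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
    unsigned long *word = &map[nr / DEMO_BITS_PER_LONG];
    int old = (*word & mask) != 0;

    *word &= ~mask;
    return old;
}

/* Index of the first zero bit among the first size bits, or size if none. */
size_t demo_find_first_zero_bit(const unsigned long *map, size_t size)
{
    size_t nr;

    for (nr = 0; nr < size; nr++) {
        unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);

        if ((map[nr / DEMO_BITS_PER_LONG] & mask) == 0)
            return nr;
    }
    return size;
}

/* Allocates the first free block: returns its index, or -1 if map is full. */
long demo_alloc_block(unsigned long *map, size_t size)
{
    size_t nr = demo_find_first_zero_bit(map, size);

    if (nr == size)
        return -1;
    demo_test_and_set_bit(nr, map);
    return (long)nr;
}
```

Freeing a block is the reverse step: clearing its bit with demo_test_and_clear_bit() makes the index available to the next allocation.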

The following macros can be used to check the type of an inode:

S_ISDIR (inode->i_mode) to check if the inode is a directory;
S_ISREG (inode->i_mode) to check if the inode is a regular file (not a link or device file).
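
The same S_IS* macros are available in userspace through sys/stat.h, so their behavior can be tried outside the kernel. The sketch below maps a mode value to the type character that ls -l prints; demo_mode_kind is a hypothetical helper and covers only the most common types.

```c
#define _XOPEN_SOURCE 700   /* request the S_IF* mode constants */
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Maps an i_mode style value to the type character that ls -l prints. */
char demo_mode_kind(mode_t mode)
{
    if (S_ISDIR(mode))
        return 'd';
    if (S_ISREG(mode))
        return '-';
    if (S_ISLNK(mode))
        return 'l';
    if (S_ISCHR(mode))
        return 'c';
    if (S_ISBLK(mode))
        return 'b';
    if (S_ISFIFO(mode))
        return 'p';
    return '?';     /* other types are not distinguished in this sketch */
}
```

The macros look only at the type bits of the mode, so the permission bits that are OR-ed in (for example S_IFDIR | 0755) do not affect the result.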

/*
 * SO2 Lab - Filesystem drivers
 * Exercise #1 (no-dev filesystem)
 */

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/pagemap.h>

MODULE_DESCRIPTION("Simple no-dev filesystem");
MODULE_AUTHOR("SO2");
MODULE_LICENSE("GPL");

#define MYFS_BLOCKSIZE		4096
#define MYFS_BLOCKSIZE_BITS	12
#define MYFS_MAGIC		0xbeefcafe
#define LOG_LEVEL		KERN_ALERT

/* declarations of functions that are part of operation structures */

static int myfs_mknod(struct inode *dir,
		struct dentry *dentry, umode_t mode, dev_t dev);
static int myfs_create(struct inode *dir, struct dentry *dentry,
		umode_t mode, bool excl);
static int myfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);

/* TODO 2: define super_operations structure */
static const struct super_operations myfs_ops = {
    .statfs = simple_statfs,
    .drop_inode = generic_drop_inode,
};

static const struct inode_operations myfs_dir_inode_operations = {
	/* TODO 5: Fill dir inode operations structure. */
    .create = myfs_create,
    .lookup = simple_lookup,
    .link   = simple_link,
    .unlink = simple_unlink,
    .mkdir  = myfs_mkdir,
    .rmdir  = simple_rmdir,
    .mknod  = myfs_mknod,
    .rename = simple_rename,
};

static const struct file_operations myfs_file_operations = {
	/* TODO 6: Fill file operations structure. */
    .read_iter      = generic_file_read_iter,
    .write_iter     = generic_file_write_iter,
    .mmap           = generic_file_mmap,
    .llseek         = generic_file_llseek,
};

static const struct inode_operations myfs_file_inode_operations = {
	/* TODO 6: Fill file inode operations structure. */
    .getattr        = simple_getattr,
};

static const struct address_space_operations myfs_aops = {
	/* TODO 6: Fill address space operations structure. */
    .readpage       = simple_readpage,
    .write_begin    = simple_write_begin,
    .write_end      = simple_write_end,
};

struct inode *myfs_get_inode(struct super_block *sb, const struct inode *dir,
		int mode)
{
	struct inode *inode = new_inode(sb);

	if (!inode)
		return NULL;

	/* TODO 3: fill inode structure
	 *     - mode
	 *     - uid
	 *     - gid
	 *     - atime,ctime,mtime
	 *     - ino
	 */
    inode_init_owner(inode,dir,mode);
    inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
    inode->i_ino = 1;

	/* TODO 5: Init i_ino using get_next_ino */
    inode->i_ino = get_next_ino();

	/* TODO 6: Initialize address space operations. */
    inode->i_mapping->a_ops = &myfs_aops;

	if (S_ISDIR(mode)) {
		/* TODO 3: set inode operations for dir inodes. */
        inode->i_op = &simple_dir_inode_operations;
        inode->i_fop = &simple_dir_operations;

		/* TODO 5: use myfs_dir_inode_operations for inode
		 * operations (i_op).
		 */
         inode->i_op = &myfs_dir_inode_operations;

		/* TODO 3: directory inodes start off with i_nlink == 2 (for "." entry).
		 * Directory link count should be incremented (use inc_nlink).
		 */
         inc_nlink(inode);
	}

	/* TODO 6: Set file inode and file operations for regular files
	 * (use the S_ISREG macro).
	 */
     if(S_ISREG(mode)){
        inode->i_op = &myfs_file_inode_operations;
        inode->i_fop = &myfs_file_operations;
     }

	return inode;
}

/* TODO 5: Implement myfs_mknod, myfs_create, myfs_mkdir. */
static int myfs_mknod(struct inode *dir,struct dentry *dentry,umode_t mode,dev_t dev){
    struct inode *inode = myfs_get_inode(dir->i_sb,dir,mode);
    if(inode == NULL)
        return -ENOSPC;

    d_instantiate(dentry,inode);
    dget(dentry);
    dir->i_mtime = dir->i_ctime = current_time(inode);

    return 0;
}

static int myfs_create(struct inode *dir,struct dentry *dentry,umode_t mode,bool excl){
    return myfs_mknod(dir,dentry,mode | S_IFREG, 0);
}

static int myfs_mkdir(struct inode *dir,struct dentry *dentry,umode_t mode){
    int ret;
    ret = myfs_mknod(dir,dentry,mode | S_IFDIR, 0);
    if(ret != 0)
        return ret;

    inc_nlink(dir);
    
    return 0;
}

static int myfs_fill_super(struct super_block *sb, void *data, int silent)
{
	struct inode *root_inode;
	struct dentry *root_dentry;

	/* TODO 2: fill super_block
	 *   - blocksize, blocksize_bits
	 *   - magic
	 *   - super operations
	 *   - maxbytes
	 */
     sb->s_maxbytes = MAX_LFS_FILESIZE;
     sb->s_blocksize = MYFS_BLOCKSIZE;
     sb->s_blocksize_bits = MYFS_BLOCKSIZE_BITS;
     sb->s_magic = MYFS_MAGIC;
     sb->s_op = &myfs_ops;

	/* mode = directory & access rights (755) */
	root_inode = myfs_get_inode(sb, NULL,
			S_IFDIR | S_IRWXU | S_IRGRP |
			S_IXGRP | S_IROTH | S_IXOTH);

	if (!root_inode)
		return -ENOMEM;

	printk(LOG_LEVEL "root inode has %u link(s)\n", root_inode->i_nlink);

	root_dentry = d_make_root(root_inode);
	if (!root_dentry)
		goto out_no_root;
	sb->s_root = root_dentry;

	return 0;

out_no_root:
	iput(root_inode);
	return -ENOMEM;
}

static struct dentry *myfs_mount(struct file_system_type *fs_type,
		int flags, const char *dev_name, void *data)
{
	/* TODO 1: call superblock mount function */
    return mount_nodev(fs_type,flags,data,myfs_fill_super);
}

/* TODO 1: define file_system_type structure */
static struct file_system_type myfs_fs_type = {
    .owner      = THIS_MODULE,
    .name       = "myfs",
    .mount      = myfs_mount,
    .kill_sb    = kill_litter_super,
};

static int __init myfs_init(void)
{
	int err;

	/* TODO 1: register */
    err = register_filesystem(&myfs_fs_type);
	if (err) {
		printk(LOG_LEVEL "register_filesystem failed\n");
		return err;
	}

	return 0;
}

static void __exit myfs_exit(void)
{
	/* TODO 1: unregister */
    unregister_filesystem(&myfs_fs_type);
}

module_init(myfs_init);
module_exit(myfs_exit);

Test 1

root@qemux86:~/skels/filesystems/myfs# ./test-myfs.sh 
+ insmod myfs.ko
+ mkdir -p /mnt/myfs
+ mount -t myfs none /mnt/myfs
root inode has 2 link(s)
+ grep myfs
+ cat /proc/filesystems
nodev   myfs
+ grep myfs
+ cat /proc/mounts
none /mnt/myfs myfs rw,relatime 0 0
+ stat -f /mnt/myfs
  File: "/mnt/myfs"
    ID: 0        Namelen: 255     Type: UNKNOWN
Block size: 4096      
Blocks: Total: 0          Free: 0          Available: 0
Inodes: Total: 0          Free: 0
+ cd /mnt/myfs
+ ls -la
drwxr-xr-x    2 root     root             0 Feb 29 00:55 .
drwxr-xr-x    3 root     root          1024 Feb 28 23:02 ..
+ cd ..
+ umount /mnt/myfs
+ rmmod myfs
root@qemux86:~/skels/filesystems/myfs#

Test 2

root@qemux86:~/skels/filesystems/myfs# ./test-myfs-1.sh 
+ insmod myfs.ko
+ mkdir -p /mnt/myfs
+ mount -t myfs none /mnt/myfs
root inode has 2 link(s)
+ ls -laid /mnt/myfs
   6460 drwxr-xr-x    2 root     root             0 Feb 29 00:57 /mnt/myfs
+ cd /mnt/myfs
+ mkdir mydir
+ ls -la
drwxr-xr-x    3 root     root             0 Feb 29 00:57 .
drwxr-xr-x    3 root     root          1024 Feb 28 23:02 ..
drwxr-xr-x    2 root     root             0 Feb 29 00:57 mydir
+ cd mydir
+ mkdir mysubdir
+ ls -lai
   6465 drwxr-xr-x    3 root     root             0 Feb 29 00:57 .
   6460 drwxr-xr-x    3 root     root             0 Feb 29 00:57 ..
   6470 drwxr-xr-x    2 root     root             0 Feb 29 00:57 mysubdir
+ mv mysubdir myrenamedsubdir
+ ls -lai
   6465 drwxr-xr-x    3 root     root             0 Feb 29 00:57 .
   6460 drwxr-xr-x    3 root     root             0 Feb 29 00:57 ..
   6470 drwxr-xr-x    2 root     root             0 Feb 29 00:57 myrenamedsubdir
+ rmdir myrenamedsubdir
+ ls -la
drwxr-xr-x    2 root     root             0 Feb 29 00:57 .
drwxr-xr-x    3 root     root             0 Feb 29 00:57 ..
+ touch myfile
+ ls -lai
   6465 drwxr-xr-x    2 root     root             0 Feb 29 00:57 .
   6460 drwxr-xr-x    3 root     root             0 Feb 29 00:57 ..
   6483 -rw-r--r--    1 root     root             0 Feb 29 00:57 myfile
+ mv myfile myrenamedfile
+ ls -lai
   6465 drwxr-xr-x    2 root     root             0 Feb 29 00:57 .
   6460 drwxr-xr-x    3 root     root             0 Feb 29 00:57 ..
   6483 -rw-r--r--    1 root     root             0 Feb 29 00:57 myrenamedfile
+ rm myrenamedfile
+ cd ..
+ rmdir mydir
+ ls -la
drwxr-xr-x    2 root     root             0 Feb 29 00:57 .
drwxr-xr-x    3 root     root          1024 Feb 28 23:02 ..
+ cd ..
+ umount /mnt/myfs
+ rmmod myfs
root@qemux86:~/skels/filesystems/myfs#

Test 3

root@qemux86:~/skels/filesystems/myfs# ./test-myfs-2.sh 
+ insmod myfs.ko
+ mkdir -p /mnt/myfs
+ mount -t myfs none /mnt/myfs
root inode has 2 link(s)
+ ls -laid /mnt/myfs
   6514 drwxr-xr-x    2 root     root             0 Feb 29 01:04 /mnt/myfs
+ cd /mnt/myfs
+ touch myfile
+ ls -lai
   6514 drwxr-xr-x    2 root     root             0 Feb 29 01:04 .
    718 drwxr-xr-x    3 root     root          1024 Feb 28 23:02 ..
   6519 -rw-r--r--    1 root     root             0 Feb 29 01:04 myfile
+ mv myfile myrenamedfile
+ ls -lai
   6514 drwxr-xr-x    2 root     root             0 Feb 29 01:04 .
    718 drwxr-xr-x    3 root     root          1024 Feb 28 23:02 ..
   6519 -rw-r--r--    1 root     root             0 Feb 29 01:04 myrenamedfile
+ ln myrenamedfile mylink
+ ls -lai
   6514 drwxr-xr-x    2 root     root             0 Feb 29 01:04 .
    718 drwxr-xr-x    3 root     root          1024 Feb 28 23:02 ..
   6519 -rw-r--r--    2 root     root             0 Feb 29 01:04 mylink
   6519 -rw-r--r--    2 root     root             0 Feb 29 01:04 myrenamedfile
+ echo message
+ cat myrenamedfile
message
+ rm mylink
+ ls -la
drwxr-xr-x    2 root     root             0 Feb 29 01:04 .
drwxr-xr-x    3 root     root          1024 Feb 28 23:02 ..
-rw-r--r--    1 root     root             8 Feb 29 01:04 myrenamedfile
+ rm -f myrenamedfile
+ ls -la
drwxr-xr-x    2 root     root             0 Feb 29 01:04 .
drwxr-xr-x    3 root     root          1024 Feb 28 23:02 ..
+ cd ..
+ umount /mnt/myfs
+ rmmod myfs
root@qemux86:~/skels/filesystems/myfs#

Part2

Inode

The inode is an essential component of a UNIX file system and, at the same time, an important component of VFS. An inode is metadata (information about information). An inode uniquely identifies a file on disk and holds information about it (uid, gid, access rights, access times, pointers to data blocks, etc.). An important aspect is that an inode does not hold the file name (the name is kept by the associated struct dentry structure).

The inode refers to a file on the disk. To refer to an open file (associated with a file descriptor within a process), the struct file structure is used. An inode can have any number of (zero or more) file structures associated with it (multiple processes can open the same file, or a process can open the same file several times).
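
The "one inode, several file structures" relationship can be observed from userspace: opening the same file twice gives two descriptors with independent positions that report the same inode. The sketch below assumes that /tmp is writable; demo_two_opens_one_inode is a hypothetical name.

```c
#define _XOPEN_SOURCE 700   /* request mkstemp() and the POSIX file API */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Opens one temporary file twice and reads through the first descriptor only.
 * Returns 1 if both descriptors refer to the same inode while keeping
 * independent file positions, 0 if not, -1 on any system call failure.
 */
int demo_two_opens_one_inode(void)
{
    char path[] = "/tmp/vfs-demo-XXXXXX";   /* assumption: /tmp is writable */
    struct stat s1, s2;
    char buf[4];
    off_t pos1, pos2;
    int fd1, fd2, result = -1;

    fd1 = mkstemp(path);
    if (fd1 < 0)
        return -1;
    fd2 = open(path, O_RDONLY);
    if (fd2 < 0)
        goto out_fd1;

    if (write(fd1, "abcdefgh", 8) != 8 || lseek(fd1, 0, SEEK_SET) != 0)
        goto out;
    if (read(fd1, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        goto out;
    if (fstat(fd1, &s1) != 0 || fstat(fd2, &s2) != 0)
        goto out;

    pos1 = lseek(fd1, 0, SEEK_CUR);    /* advanced by the read */
    pos2 = lseek(fd2, 0, SEEK_CUR);    /* never used for I/O */
    result = (s1.st_ino == s2.st_ino && s1.st_dev == s2.st_dev &&
              pos1 == 4 && pos2 == 0);
out:
    close(fd2);
out_fd1:
    close(fd1);
    unlink(path);
    return result;
}
```

The file position lives in the per-open file structure, which is why reading through one descriptor leaves the other at offset 0 even though both share the inode.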

The inode exists both as a VFS entity (in memory) and as a disk entity (for UNIX, HFS, NTFS, etc.). The inode in VFS is represented by the structure struct inode. Like the other structures in VFS, struct inode is a generic structure that covers the options for all supported file types, even those that do not have an associated disk entity (such as FAT).

The inode structure

The inode structure is the same for all file systems. In general, file systems also have private information. This is referenced through the i_private field of the structure. Conventionally, the structure that keeps that particular information is called <fsname>_inode_info, where fsname represents the file system name. For example, the minix and ext4 file systems store their particular information in the structures struct minix_inode_info and struct ext4_inode_info, respectively.

Some of the important fields of struct inode are:

i_sb : The superblock structure of the file system the inode belongs to.
i_rdev: the device on which this file system is mounted
i_ino : the number of the inode (uniquely identifies the inode within the file system)
i_blkbits: number of bits used for the block size == log2(block size)
i_mode, i_uid, i_gid: access rights, uid, gid
i_size: file/directory/etc. size in bytes
i_mtime, i_atime, i_ctime: modification, access, and inode change (status change) times
i_nlink: the number of name entries (dentries) that use this inode; for file systems without links (either hard or symbolic) this is always set to 1
i_blocks: the number of blocks used by the file (all blocks, not just data); this is only used by the quota subsystem
i_op, i_fop: pointers to operations structures: struct inode_operations and struct file_operations; i_mapping->a_ops contains a pointer to struct address_space_operations.
i_count: the inode counter indicating how many kernel components use it.

Some functions that can be used to work with inodes:

new_inode(): creates a new inode, sets the i_nlink field to 1 and initializes i_blkbits, i_sb and i_dev;
insert_inode_hash(): adds the inode to the hash table of inodes; an interesting effect of this call is that the inode will be written to the disk if it is marked as dirty;

Warning

An inode created with new_inode() is not in the hash table, and unless you have serious reasons not to, you must enter it in the hash table;

mark_inode_dirty(): marks the inode as dirty; at a later moment, it will be written to the disk;
iget_locked(): loads the inode with the given number from the disk, if it is not already loaded;
unlock_new_inode(): used in conjunction with iget_locked(), releases the lock on the inode;
iput(): tells the kernel that the work on the inode is finished; if no one else uses it, it will be destroyed (after being written to the disk if it is marked as dirty);
make_bad_inode(): tells the kernel that the inode cannot be used; it is generally called from the function that reads the inode when the inode could not be read from the disk or is invalid.

Inode operations

Getting an inode

One of the main inode operations is obtaining an inode (the struct inode in VFS). Until version 2.6.24 of the Linux kernel, the developer defined a read_inode function. Starting with version 2.6.25, the developer must define a <fsname>_iget function, where <fsname> is the name of the file system. This function is responsible for finding the VFS inode if it exists, or creating a new one and filling it with the information from the disk.

Generally, this function will call iget_locked() to get the inode structure from VFS. If the inode is newly created then it will need to read the inode from the disk (using sb_bread()) and fill in the useful information.

An example of such a function is minix_iget():

static struct inode *V1_minix_iget(struct inode *inode)
{
      struct buffer_head * bh;
      struct minix_inode * raw_inode;
      struct minix_inode_info *minix_inode = minix_i(inode);
      int i;

      raw_inode = minix_V1_raw_inode(inode->i_sb, inode->i_ino, &bh);
      if (!raw_inode) {
              iget_failed(inode);
              return ERR_PTR(-EIO);
      }
      //...
}

struct inode *minix_iget(struct super_block *sb, unsigned long ino)
{
      struct inode *inode;

      inode = iget_locked(sb, ino);
      if (!inode)
              return ERR_PTR(-ENOMEM);
      if (!(inode->i_state & I_NEW))
              return inode;

      if (INODE_VERSION(inode) == MINIX_V1)
              return V1_minix_iget(inode);
    ...
}

The minix_iget() function gets the VFS inode using iget_locked(). If the inode already exists (it is not new, i.e. the I_NEW flag is not set), the function returns it. Otherwise, the function calls V1_minix_iget(), which reads the inode from the disk using minix_V1_raw_inode() and then completes the VFS inode with the information read.
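
The control flow of such an iget function (look up the cache, and read from disk only when the inode is new) can be modeled in userspace. The sketch below is a simplified model, not kernel code: the demo_ structures and functions are hypothetical, the cache is a small fixed array, and the "disk read" is only counted.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHE_SLOTS 8
#define DEMO_I_NEW 0x1u     /* hypothetical flag mirroring I_NEW */

struct demo_inode {
    unsigned long ino;
    unsigned int state;
    int in_use;
    int disk_reads;     /* how often the inode was filled from "disk" */
};

static struct demo_inode demo_cache[DEMO_CACHE_SLOTS];

/* Returns the cached inode, or a fresh slot marked DEMO_I_NEW; NULL if full. */
struct demo_inode *demo_iget_locked(unsigned long ino)
{
    struct demo_inode *free_slot = NULL;
    size_t i;

    for (i = 0; i < DEMO_CACHE_SLOTS; i++) {
        if (demo_cache[i].in_use && demo_cache[i].ino == ino)
            return &demo_cache[i];
        if (!demo_cache[i].in_use && free_slot == NULL)
            free_slot = &demo_cache[i];
    }
    if (free_slot != NULL) {
        free_slot->ino = ino;
        free_slot->state = DEMO_I_NEW;
        free_slot->in_use = 1;
        free_slot->disk_reads = 0;
    }
    return free_slot;
}

/* Same shape as an iget function: only new inodes are read from disk. */
struct demo_inode *demo_iget(unsigned long ino)
{
    struct demo_inode *inode = demo_iget_locked(ino);

    if (inode == NULL)
        return NULL;
    if (!(inode->state & DEMO_I_NEW))
        return inode;               /* cache hit: nothing to read */

    inode->disk_reads++;            /* stands in for reading the raw inode */
    inode->state &= ~DEMO_I_NEW;    /* stands in for unlock_new_inode() */
    return inode;
}
```

Asking twice for the same inode number returns the same object and triggers a single read, which is the point of checking the new flag before touching the disk.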

Superoperations

Many of the superoperations (components of the struct super_operations structure used by the superblock) are used when working with inodes. These operations are described next:

alloc_inode: allocates an inode. Usually, this function allocates a struct <fsname>_inode_info structure and performs basic VFS inode initialization (using inode_init_once()); minix uses the kmem_cache_alloc() function for allocation, which interacts with the SLAB subsystem. For each allocation, the cache constructor is called, which in the case of minix is the init_once() function. Alternatively, kmalloc() can be used, in which case the inode_init_once() function should be called. The alloc_inode() function is called by the new_inode() and iget_locked() functions.
write_inode: saves/updates the inode received as a parameter on disk; to update the inode, though inefficient, beginners are advised to use the following sequence of operations:
load the inode from the disk using the sb_bread() function;
modify the buffer according to the saved inode;
mark the buffer as dirty using mark_buffer_dirty(); the kernel will then handle writing it to the disk;
an example is the minix_write_inode() function in the minix file system.
evict_inode: removes any information about the inode with the number received in the i_ino field from the disk and memory (both the inode on the disk and the associated data blocks). This involves performing the following operations:
delete the inode from the disk;
update the disk bitmaps (if any);
delete the inode from the page cache by calling truncate_inode_pages();
delete the inode from memory by calling clear_inode();
an example is the minix_evict_inode() function from the minix file system.
destroy_inode: releases the memory occupied by the inode.

inode_operations

The inode operations are described by the struct inode_operations structure.

Inodes are of several types: file, directory, special file (pipe, fifo), block device, character device, link etc. For this reason, the operations that an inode needs to implement are different for each type of inode. Below are detailed operations for a file type inode and a directory inode.

The operations of an inode are initialized and accessed using the i_op field of the structure struct inode.

The file structure

The file structure corresponds to a file opened by a process and exists only in memory, being associated with an inode. It is the closest VFS entity to user space; the structure fields contain information familiar from user-space files (access mode, file position, etc.) and the operations on it are performed by well-known system calls (read, write, etc.).

The file operations are described by the struct file_operations structure.

The file operations for a file system are initialized using the i_fop field of the struct inode structure. When opening a file, VFS initializes the f_op field of the struct file structure with the address stored in inode->i_fop, so that subsequent system calls use the value stored in file->f_op.

Regular file inodes

To work with the inode, the i_op and i_fop fields of the inode structure must be filled in. The type of the inode determines the operations that it needs to implement.

Regular file inode operations

In the minix file system, the minix_file_inode_operations structure is defined for the operations on an inode and for the file operations the minix_file_operations structure is defined:

const struct file_operations minix_file_operations = {
         .llseek         = generic_file_llseek,
         .read_iter      = generic_file_read_iter,
         //...
         .write_iter     = generic_file_write_iter,
         //...
         .mmap           = generic_file_mmap,
         //...
};

const struct inode_operations minix_file_inode_operations = {
        .setattr        = minix_setattr,
        .getattr        = minix_getattr,
};

        //...
        if (S_ISREG(inode->i_mode)) {
                inode->i_op = &minix_file_inode_operations;
                inode->i_fop = &minix_file_operations;
        }
        //...

The functions generic_file_llseek() , generic_file_mmap() , generic_file_read_iter() and generic_file_write_iter() are implemented in the kernel.

For simple file systems, only the truncation operation (truncate system call) must be implemented. Although initially there was a dedicated operation, starting with 3.14 the operation was embedded in setattr: if the passed size is different from the current size of the inode, then a truncate operation must be performed. An example of implementing this check is in the minix_setattr() function:

static int minix_setattr(struct dentry *dentry, struct iattr *attr)
{
        struct inode *inode = d_inode(dentry);
        int error;

        error = setattr_prepare(dentry, attr);
        if (error)
                return error;

        if ((attr->ia_valid & ATTR_SIZE) &&
            attr->ia_size != i_size_read(inode)) {
                error = inode_newsize_ok(inode, attr->ia_size);
                if (error)
                        return error;

                truncate_setsize(inode, attr->ia_size);
                minix_truncate(inode);
        }

        setattr_copy(inode, attr);
        mark_inode_dirty(inode);
        return 0;
}

The truncate operation involves:

freeing the data blocks on the disk that are no longer needed (if the new size is smaller than the old one) or allocating new blocks (if the new size is larger);
updating the disk bitmaps (if used);
updating the inode;
zero-filling the unused space left in the last block, using the block_truncate_page() function.

An example implementation of the truncate operation is the minix_truncate() function in the minix file system.
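
The arithmetic behind a shrinking truncate can be illustrated in user space. This is a minimal sketch under stated assumptions: TOY_BLOCK_SIZE, toy_blocks_for() and toy_truncate() are hypothetical names, the file is modelled as one contiguous buffer, and only the shrinking case is handled. It is not how minix_truncate() is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16u	/* hypothetical block size, kept small on purpose */

/* Number of blocks needed to hold `size` bytes. */
static size_t toy_blocks_for(size_t size)
{
	return (size + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;
}

/*
 * Shrink a file held in `data` from old_size to new_size bytes.
 * Zero-fills the unused tail of the last block that is kept and
 * returns how many whole blocks became free.
 */
static size_t toy_truncate(unsigned char *data, size_t old_size, size_t new_size)
{
	size_t old_blocks, new_blocks, tail;

	assert(data != NULL);
	if (new_size >= old_size)
		return 0;	/* growing is not modelled in this sketch */

	old_blocks = toy_blocks_for(old_size);
	new_blocks = toy_blocks_for(new_size);
	tail = new_blocks * TOY_BLOCK_SIZE - new_size;
	memset(data + new_size, 0, tail);

	return old_blocks - new_blocks;
}
```

A real file system would also clear the bits of the freed blocks in its bitmap and mark the inode dirty afterwards.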

Address space operations

There is a close link between the address space of a process and files: the execution of the programs is done almost exclusively by mapping the file into the process address space. Because this approach works very well and is quite general, it can also be used for regular system calls such as read and write.

The structure that describes the address space is struct address_space, and the operations with it are described by the structure struct address_space_operations. To initialize the address space operations, fill inode->i_mapping->a_ops of the file type inode.

An example is the minix_aops structure in the minix file system:

static const struct address_space_operations minix_aops = {
       .readpage = minix_readpage,
       .writepage = minix_writepage,
       .write_begin = minix_write_begin,
       .write_end = generic_write_end,
       .bmap = minix_bmap
};

//...
if (S_ISREG(inode->i_mode)) {
      inode->i_mapping->a_ops = &minix_aops;
}
//...

The generic_write_end() function is already implemented. Most of the specific functions are very easy to implement, as follows:

static int minix_writepage(struct page *page, struct writeback_control *wbc)
{
         return block_write_full_page(page, minix_get_block, wbc);
}

static int minix_readpage(struct file *file, struct page *page)
{
         return block_read_full_page(page, minix_get_block);
}

static void minix_write_failed(struct address_space *mapping, loff_t to)
{
        struct inode *inode = mapping->host;

        if (to > inode->i_size) {
                truncate_pagecache(inode, inode->i_size);
                minix_truncate(inode);
        }
}

static int minix_write_begin(struct file *file, struct address_space *mapping,
                        loff_t pos, unsigned len, unsigned flags,
                        struct page **pagep, void **fsdata)
{
        int ret;

        ret = block_write_begin(mapping, pos, len, flags, pagep,
                                minix_get_block);
        if (unlikely(ret))
                minix_write_failed(mapping, pos + len);

        return ret;
}

static sector_t minix_bmap(struct address_space *mapping, sector_t block)
{
         return generic_block_bmap(mapping, block, minix_get_block);
}

All that needs to be done is to implement minix_get_block, which has to translate a block of a file into a block on the device. If the flag create received as a parameter is set, a new block must be allocated. In case a new block is created, the bit map must be updated accordingly. To notify the kernel not to read the block from the disk, bh must be marked with set_buffer_new(). The buffer must be associated with the block through map_bh().
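
The contract of such a get_block function can be modelled in user space. The sketch below is an assumption-laden illustration, not the minix code: toy_inode, toy_bitmap and toy_get_block() are hypothetical, block 0 is treated as reserved so that 0 can mean "not mapped", and the is_new output plays the role that set_buffer_new() plays for the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_BLOCKS  8	/* hypothetical device size, in blocks */
#define TOY_FILE_BLOCKS 4	/* hypothetical per-file limit */

struct toy_inode {
	int block_map[TOY_FILE_BLOCKS];	/* file block -> device block, 0 = hole */
};

/* One byte per device block: 1 = in use. Block 0 is reserved. */
static unsigned char toy_bitmap[TOY_NUM_BLOCKS] = { 1 };

/*
 * Translate a file block into a device block.
 * Returns the device block, 0 if the block is a hole and create is
 * not set, or -1 on error (bad index or no free block left).
 * *is_new is set to 1 only when a block was allocated by this call.
 */
static int toy_get_block(struct toy_inode *inode, int file_block,
			 int create, int *is_new)
{
	int b;

	assert(inode != NULL && is_new != NULL);
	*is_new = 0;

	if (file_block < 0 || file_block >= TOY_FILE_BLOCKS)
		return -1;
	if (inode->block_map[file_block] != 0)
		return inode->block_map[file_block];
	if (!create)
		return 0;

	for (b = 1; b < TOY_NUM_BLOCKS; b++) {
		if (!toy_bitmap[b]) {
			toy_bitmap[b] = 1;		/* update the bitmap */
			inode->block_map[file_block] = b;
			*is_new = 1;			/* nothing to read from disk */
			return b;
		}
	}
	return -1;
}
```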

Dentry structure

Directory operations use the struct dentry structure. Its main task is to make links between inodes and filenames. The important fields of this structure are presented below:

struct dentry {
        //...
        struct inode             *d_inode;     /* associated inode */
        //...
        struct dentry            *d_parent;    /* dentry object of parent */
        struct qstr              d_name;       /* dentry name */
        //...

        struct dentry_operations *d_op;        /* dentry operations table */
        struct super_block       *d_sb;        /* superblock of file */
        void                     *d_fsdata;    /* filesystem-specific data */
        //...
};

Fields meaning:

d_inode: the inode referenced by this dentry;
d_parent: the dentry associated with the parent directory;
d_name: a struct qstr structure that contains the fields name and len (the name and the length of the name).
d_op: operations with dentries, represented by the struct dentry_operations structure. The kernel implements default operations so there is no need to (re)implement them. Some file systems can do optimizations based on the specific structure of the dentries.
d_fsdata: field reserved for the file system that implements dentry operations;

Dentry operations

The most commonly used operations applied to dentries are:

d_make_root: allocates the root dentry. It is generally used in the function that is called to read the superblock (fill_super), which must initialize the root directory. So the root inode is obtained from the superblock and is used as an argument to this function, to fill the s_root field from the struct super_block structure.
d_add: associates a dentry with an inode; the dentry received as a parameter in the calls discussed above signifies the entry (name, length) that needs to be created. This function will be used when creating/loading a new inode that does not have a dentry associated with it and has not yet been introduced to the hash table of inodes (at lookup);
d_instantiate: a lighter version of the previous call, used when the dentry was previously added to the hash table.

Directory inode operations

The operations for directory type inodes have a higher complexity level than the ones for files. The developer must define operations for inodes and operations for files. In minix, these operations are defined in minix_dir_inode_operations and minix_dir_operations:

struct inode_operations minix_dir_inode_operations = {
      .create = minix_create,
      .lookup = minix_lookup,
      .link = minix_link,
      .unlink = minix_unlink,
      .symlink = minix_symlink,
      .mkdir = minix_mkdir,
      .rmdir = minix_rmdir,
      .mknod = minix_mknod,
      //...
};

struct file_operations minix_dir_operations = {
      .llseek = generic_file_llseek,
      .read = generic_read_dir,
      .iterate = minix_readdir,
      //...
};

        //...
      if (S_ISDIR(inode->i_mode)) {
              inode->i_op = &minix_dir_inode_operations;
              inode->i_fop = &minix_dir_operations;
              inode->i_mapping->a_ops = &minix_aops;
      }
       //...

The only function already implemented is generic_read_dir().

The functions that implement the operations on directory inodes are the ones described below.

Creating an inode

The inode creation function is indicated by the field create in the inode_operations structure. In the minix case, the function is minix_create(). This function is called by the open and creat system calls. Such a function performs the following operations:

1. Introduces a new entry into the physical structure on the disk; the update of the bit maps on the disk must not be forgotten.
2. Configures access rights to those received as a parameter.
3. Marks the inode as dirty with the mark_inode_dirty() function.
4. Instantiates the directory entry (dentry) with the d_instantiate function.

Creating a directory

The directory creation function is indicated by the mkdir field in the inode_operations structure. In the minix case, the function is minix_mkdir(). This function is called by the mkdir system call. Such a function performs the following operations:

1. Calls minix_create().
2. Allocates a data block for the directory.
3. Creates the "." and ".." entries.

Creating a link

The link creation function (hard link) is indicated by the link field in the inode_operations structure. In the minix case, the function is minix_link(). This function is called by the link system call. Such a function performs the following operations:

Binds the new dentry to the inode.
Increments the i_nlink field of the inode.
Marks the inode as dirty using the mark_inode_dirty() function.

Creating a symbolic link

The symbolic link creation function is indicated by the symlink field in the inode_operations structure. In the minix case, the function is minix_symlink(). The operations to be performed are similar to minix_link with the differences being given by the fact that a symbolic link is created.

Deleting a link

The link delete function (hard link) is indicated by the unlink field in the inode_operations structure. In the minix case, the function is minix_unlink(). This function is called by the unlink system call. Such a function performs the following operations:

1. Deletes the directory entry given as a parameter from the physical disk structure.
2. Decrements the i_nlink counter of the inode to which the entry points (otherwise the inode will never be deleted).

Deleting a directory

The directory delete function is indicated by the rmdir field in the inode_operations structure. In the minix case, the function is minix_rmdir(). This function is called by the rmdir system call. Such a function performs the following operations:

1. Performs the operations done by minix_unlink.
2. Ensures that the directory is empty; otherwise, returns ENOTEMPTY.
3. Also deletes the data blocks.

Searching for an inode in a directory

The function that searches for an entry in a directory and extracts the inode is indicated by the lookup field in the inode_operations structure. In the minix case, the function is minix_lookup. This function is called indirectly when information about the inode associated with an entry in a directory is needed. Such a function performs the following operations:

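1. Searches the directory indicated by dir for the entry named dentry->d_name.name.
2. If the entry is found, associates the inode with the name using the d_add() function and returns NULL.
3. On failure, returns an ERR_PTR value. (The minfs_lookup function shown later treats a missing name differently: it calls d_add() with a NULL inode and still returns NULL.)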

Iterating through entries in a directory

The function which iterates through the entries in a directory (lists the directory contents) is indicated by the field iterate in the struct file_operations structure. In the minix case, the function is minix_readdir. This function is called by the readdir system call.

The function returns either all entries in the directory or just a part of them when the buffer allocated for it is not large enough. A call of this function can return:

a number equal to the existing number of entries if there is enough space in the corresponding user space buffer;
a number smaller than the actual number of entries, as much as there was space in the corresponding user space buffer;
0, where there are no more entries to read.

The function will be called consecutively until all available entries are read. The function is called at least twice.

It is only called twice if:
the first call reads all entries and returns their number;
the second call returns 0, having no other entries to read.
It is called more than twice if the first call does not return the total number of entries.

The function performs the following operations:

1. Iterates over the entries (the dentries) from the current directory.
2. For each dentry found, increments ctx->pos.
3. For each valid dentry (an inode other than 0, for example), calls the dir_emit() function.
4. If dir_emit() reports that the entry could not be stored (it returns false once the buffer in the user space is full), the function stops and returns.

The arguments of the dir_emit function are:

ctx is the directory iteration context, passed as an argument to the iterate function;
name is the name of the entry (a string of characters);
name_len is the length of the entry name;
ino is the inode number associated with the entry;
type identifies the entry type: DT_REG (file), DT_DIR (directory), DT_UNKNOWN etc. DT_UNKNOWN can be used when the entry type is unknown.
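
The repeated-call behaviour described above can be reproduced with a small user-space model. Everything here is a hypothetical stand-in: toy_dir_context, toy_dir_emit() and toy_readdir() are not kernel APIs, the directory is reduced to an array of inode numbers, and the sketch assumes the emit helper returns true when the entry fits and false when the caller's buffer is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_ENTRIES 4	/* hypothetical number of slots in a directory */

struct toy_dir_context {
	size_t pos;		/* next directory slot to examine */
	unsigned int *out;	/* caller-supplied buffer of inode numbers */
	size_t cap;		/* capacity of the buffer */
	size_t used;		/* entries stored during the current call */
};

/* Stand-in for dir_emit(): true if stored, false if the buffer is full. */
static bool toy_dir_emit(struct toy_dir_context *ctx, unsigned int ino)
{
	assert(ctx != NULL && ctx->out != NULL);
	if (ctx->used == ctx->cap)
		return false;
	ctx->out[ctx->used++] = ino;
	return true;
}

/* slots[i] == 0 marks an empty directory slot. */
static int toy_readdir(const unsigned int *slots, struct toy_dir_context *ctx)
{
	for (; ctx->pos < TOY_NUM_ENTRIES; ctx->pos++) {
		if (slots[ctx->pos] == 0)
			continue;	/* skip empty entries */
		if (!toy_dir_emit(ctx, slots[ctx->pos]))
			break;		/* pos unchanged: retried on the next call */
	}
	return 0;
}
```

With a two-entry buffer and three valid entries, the first call stores two entries, the second stores the remaining one, and a third call stores nothing, which tells the caller that the listing is complete.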

Bitmap operations

When working with the file systems, management information (what block is free or busy, what inode is free or busy) is stored using bitmaps. For this we often need to use bit operations. Such operations are:

searching the first 0 bit: representing a free block or inode
marking a bit as 1: marking a busy block or inode

The bitmap operations are found in headers from include/asm-generic/bitops, especially in find.h and atomic.h. Usual functions, with names indicating their role, are:

find_first_zero_bit()
find_first_bit()
set_bit()
clear_bit()
test_and_set_bit()
test_and_clear_bit()

These functions usually receive the address of the bitmap, possibly its size (in bits) and, if necessary, the index of the bit that needs to be activated (set) or deactivated (clear).

Some usage examples are listed below:

unsigned int map;
unsigned char array_map[NUM_BYTES];
size_t idx;
int changed;

/* Find first zero bit in 32 bit integer. */
idx = find_first_zero_bit(&map, 32);
printk (KERN_ALERT "The %zu-th bit is the first zero bit.\n", idx);

/* Find first one bit in NUM_BYTES bytes array. */
idx = find_first_bit(array_map, NUM_BYTES * 8);
printk (KERN_ALERT "The %zu-th bit is the first one bit.\n", idx);

/*
 * Clear the idx-th bit in integer.
 * It is assumed idx is less than the number of bits in integer.
 */
clear_bit(idx, &map);

/*
 * Test and set the idx-th bit in array.
 * It is assumed idx is less than the number of bits in array.
 */
changed = __test_and_set_bit(idx, &sbi->imap);
if (changed)
      printk(KERN_ALERT "%zu-th bit changed\n", idx);
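
To experiment with these semantics outside the kernel, the helpers can be approximated in plain C. The toy_* functions below are hypothetical, non-atomic re-implementations written for this illustration. They mirror two properties worth remembering: the search returns the bitmap size (in bits) when no zero bit exists, and test-and-set returns the previous value of the bit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Returns 1 if bit nr is set in the bitmap, 0 otherwise. */
static int toy_test_bit(size_t nr, const unsigned long *map)
{
	return (int)((map[nr / TOY_BITS_PER_LONG] >> (nr % TOY_BITS_PER_LONG)) & 1UL);
}

/*
 * Index of the first zero bit among the first nbits bits,
 * or nbits itself when every bit is set (never a negative value).
 */
static size_t toy_find_first_zero_bit(const unsigned long *map, size_t nbits)
{
	size_t i;

	assert(map != NULL);
	for (i = 0; i < nbits; i++)
		if (!toy_test_bit(i, map))
			return i;
	return nbits;
}

/* Sets bit nr and returns its previous value (non-atomic). */
static int toy_test_and_set_bit(size_t nr, unsigned long *map)
{
	int old = toy_test_bit(nr, map);

	map[nr / TOY_BITS_PER_LONG] |= 1UL << (nr % TOY_BITS_PER_LONG);
	return old;
}
```

For an inode map that starts at 0x03 (inodes 0 and 1 in use), the first free index is 2, and marking it used turns the map into 0x07.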

Formatting the disk /dev/vdb

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <sys/stat.h>
#include <linux/types.h>

#include "../kernel/minfs.h"

/*
 * mk_minfs file
 */

int main(int argc, char **argv)
{
	FILE *file;
	char buffer[MINFS_BLOCK_SIZE];
	struct minfs_super_block msb;
	struct minfs_inode root_inode;
	struct minfs_inode file_inode;
	struct minfs_dir_entry file_dentry;
	int i;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s block_device_name\n", argv[0]);
		exit(EXIT_FAILURE);
	}

	file = fopen(argv[1], "w+");
	if (file == NULL) {
		perror("fopen");
		exit(EXIT_FAILURE);
	}

	memset(&msb, 0, sizeof(struct minfs_super_block));

	msb.magic = MINFS_MAGIC;
	msb.version = 1;
	msb.imap = 0x03;

	/* zero disk  */
	memset(buffer, 0,  MINFS_BLOCK_SIZE);
	for (i = 0; i < 128; i++)
		fwrite(buffer, 1, MINFS_BLOCK_SIZE, file);

	fseek(file, 0, SEEK_SET);

	/* initialize super block */
	fwrite(&msb, sizeof(msb), 1, file);

	/* initialize root inode */
	memset(&root_inode, 0, sizeof(root_inode));
	root_inode.uid = 0;
	root_inode.gid = 0;
	root_inode.mode = S_IFDIR | 0755;
	root_inode.size = 0;
	root_inode.data_block = MINFS_FIRST_DATA_BLOCK;

	fseek(file, MINFS_INODE_BLOCK * MINFS_BLOCK_SIZE, SEEK_SET);
	fwrite(&root_inode, sizeof(root_inode), 1, file);

	/* initialize new inode */
	memset(&file_inode, 0, sizeof(file_inode));
	file_inode.uid = 0;
	file_inode.gid = 0;
	file_inode.mode = S_IFREG | 0644;
	file_inode.size = 0;
	file_inode.data_block = MINFS_FIRST_DATA_BLOCK + 1;
	fwrite(&file_inode, sizeof(file_inode), 1, file);

	/* add dentry information */
	memset(&file_dentry, 0, sizeof(file_dentry));
	file_dentry.ino = 1;
	memcpy(file_dentry.name, "a.txt", 5);
	fseek(file, MINFS_FIRST_DATA_BLOCK * MINFS_BLOCK_SIZE, SEEK_SET);
	fwrite(&file_dentry, sizeof(file_dentry), 1, file);

	fclose(file);

	return 0;
}

minfs code

/*
 * SO2 Lab - Filesystem drivers
 * Exercise #2 (dev filesystem)
 */

#include <linux/buffer_head.h>
#include <linux/cred.h>
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pagemap.h>
#include <linux/sched.h>
#include <linux/slab.h>

#include "minfs.h"

MODULE_DESCRIPTION("Simple filesystem");
MODULE_AUTHOR("SO2");
MODULE_LICENSE("GPL");

#define LOG_LEVEL	KERN_ALERT


struct minfs_sb_info {
	__u8 version;
	unsigned long imap;
	struct buffer_head *sbh;
};

struct minfs_inode_info {
	__u16 data_block;
	struct inode vfs_inode;
};

/* declarations of functions that are part of operation structures */

static int minfs_readdir(struct file *filp, struct dir_context *ctx);
static struct dentry *minfs_lookup(struct inode *dir,
		struct dentry *dentry, unsigned int flags);
static int minfs_create(struct inode *dir, struct dentry *dentry,
		umode_t mode, bool excl);

/* dir and inode operation structures */

static const struct file_operations minfs_dir_operations = {
	.read		= generic_read_dir,
	.iterate	= minfs_readdir,
};

static const struct inode_operations minfs_dir_inode_operations = {
	.lookup		= minfs_lookup,
	/* TODO 7: Use minfs_create as the create function. */
    .create     = minfs_create,
};

static const struct address_space_operations minfs_aops = {
	.readpage       = simple_readpage,
	.write_begin    = simple_write_begin,
	.write_end      = simple_write_end,
};

static const struct file_operations minfs_file_operations = {
	.read_iter	= generic_file_read_iter,
	.write_iter	= generic_file_write_iter,
	.mmap		= generic_file_mmap,
	.llseek		= generic_file_llseek,
};

static const struct inode_operations minfs_file_inode_operations = {
	.getattr	= simple_getattr,
};

static struct inode *minfs_iget(struct super_block *s, unsigned long ino)
{
	struct minfs_inode *mi;
	struct buffer_head *bh;
	struct inode *inode;
	struct minfs_inode_info *mii;

	/* Allocate VFS inode. */
	inode = iget_locked(s, ino);
	if (inode == NULL) {
		printk(LOG_LEVEL "error acquiring inode\n");
		return ERR_PTR(-ENOMEM);
	}

	/* Return inode from cache */
	if (!(inode->i_state & I_NEW))
		return inode;

	/* TODO 4: Read block with inodes. It's the second block on
	 * the device, i.e. the block with the index 1. This is the index
	 * to be passed to sb_bread().
	 */
    if(!(bh = sb_bread(s,MINFS_INODE_BLOCK)))
        goto out_bad_sb;
	/* TODO 4: Get inode with index ino from the block. */
    mi = ((struct minfs_inode *) bh->b_data) + ino;

	/* TODO 4: fill VFS inode */
    inode->i_mode = mi->mode;
    i_uid_write(inode,mi->uid);
    i_gid_write(inode,mi->gid);
    inode->i_size = mi->size;
    inode->i_blocks = 0;
    inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);

	/* TODO 7: Fill address space operations (inode->i_mapping->a_ops) */
    inode->i_mapping->a_ops = &minfs_aops;

	if (S_ISDIR(inode->i_mode)) {
		/* TODO 4: Fill dir inode operations. */
        inode->i_op = &simple_dir_inode_operations;
        inode->i_fop = &simple_dir_operations;

		/* TODO 5: Use minfs_dir_inode_operations for i_op
		 * and minfs_dir_operations for i_fop. */
         inode->i_op = &minfs_dir_inode_operations;
         inode->i_fop = &minfs_dir_operations;

		/* TODO 4: Directory inodes start off with i_nlink == 2.
		 * (use inc_link) */
         inc_nlink(inode);
	}

	/* TODO 7: Fill inode and file operations for regular files
	 * (i_op and i_fop). Use the S_ISREG macro.
	 */
     if(S_ISREG(inode->i_mode)){
        inode->i_op = &minfs_file_inode_operations;
        inode->i_fop = &minfs_file_operations;
     }

	/* fill data for mii */
	mii = container_of(inode, struct minfs_inode_info, vfs_inode);

	/* TODO 4: uncomment after the minfs_inode is initialized */
    mii->data_block = mi->data_block;
	//mii->data_block = mi->data_block;

	/* Free resources. */
	/* TODO 4: uncomment after the buffer_head is initialized */
    brelse(bh);
	//brelse(bh);
	unlock_new_inode(inode);

	return inode;

out_bad_sb:
	iget_failed(inode);
	return NULL;
}

static int minfs_readdir(struct file *filp, struct dir_context *ctx)
{
	struct buffer_head *bh;
	struct minfs_dir_entry *de;
	struct minfs_inode_info *mii;
	struct inode *inode;
	struct super_block *sb;
	int over;
	int err = 0;

	/* TODO 5: Get inode of directory and container inode. */
    inode = file_inode(filp);
    mii = container_of(inode, struct minfs_inode_info, vfs_inode);

	/* TODO 5: Get superblock from inode (i_sb). */
    sb = inode->i_sb;

	/* TODO 5: Read data block for directory inode. */
    bh = sb_bread(sb,mii->data_block);
    if(bh == NULL){
        printk(LOG_LEVEL "could not read block\n");
        err = -ENOMEM;
        goto out_bad_sb;
    }

	for (; ctx->pos < MINFS_NUM_ENTRIES; ctx->pos++) {
		/* TODO 5: Data block contains an array of
		 * "struct minfs_dir_entry". Use `de' for storing.
		 */
         de = (struct minfs_dir_entry *) bh->b_data + ctx->pos;

		/* TODO 5: Step over empty entries (de->ino == 0). */
        if(de->ino == 0){
            continue;
        }

		/*
		 * Use `over` to store return value of dir_emit and exit
		 * if required.
		 */
		over = dir_emit(ctx, de->name, MINFS_NAME_LEN, de->ino,
				DT_UNKNOWN);
		if (over) {
			printk(KERN_DEBUG "Read %s from folder %s, ctx->pos: %lld\n",
				de->name,
				filp->f_path.dentry->d_name.name,
				ctx->pos);
			ctx->pos++;
			goto done;
		}
	}

done:
	brelse(bh);
out_bad_sb:
	return err;
}

/*
 * Find dentry in parent folder. Return parent folder's data buffer_head.
 */

static struct minfs_dir_entry *minfs_find_entry(struct dentry *dentry,
		struct buffer_head **bhp)
{
	struct buffer_head *bh;
	struct inode *dir = dentry->d_parent->d_inode;
	struct minfs_inode_info *mii = container_of(dir,
			struct minfs_inode_info, vfs_inode);
	struct super_block *sb = dir->i_sb;
	const char *name = dentry->d_name.name;
	struct minfs_dir_entry *final_de = NULL;
	struct minfs_dir_entry *de;
	int i;

	/* TODO 6: Read parent folder data block (contains dentries).
	 * Fill bhp with return value.
	 */
     bh = sb_bread(sb,mii->data_block);
     if(bh == NULL){
        printk(LOG_LEVEL "could not read block\n");
        return NULL;
     }
     *bhp = bh;

	for (i = 0; i < MINFS_NUM_ENTRIES; i++) {
		/* TODO 6: Traverse all entries, find entry by name
		 * Use `de' to traverse. Use `final_de' to store dentry
		 * found, if existing.
		 */
         de = ((struct minfs_dir_entry *) bh->b_data) + i;
         if(de->ino != 0){
            if(strcmp(name, de->name) == 0){
                printk(KERN_DEBUG "Found entry %s on position: %d\n",name,i);
                final_de = de;
                break;
            }
         }
	}

	/* bh needs to be released by caller. */
	return final_de;
}

static struct dentry *minfs_lookup(struct inode *dir,
		struct dentry *dentry, unsigned int flags)
{
	/* TODO 6: Comment line. */
    //return simple_lookup(dir, dentry, flags);

	struct super_block *sb = dir->i_sb;
	struct minfs_dir_entry *de;
	struct buffer_head *bh = NULL;
	struct inode *inode = NULL;

	dentry->d_op = sb->s_root->d_op;

	de = minfs_find_entry(dentry, &bh);
	if (de != NULL) {
		printk(KERN_DEBUG "getting entry: name: %s, ino: %d\n",
			de->name, de->ino);
		inode = minfs_iget(sb, de->ino);
		if (IS_ERR(inode))
			return ERR_CAST(inode);
	}

	d_add(dentry, inode);
	brelse(bh);

	printk(KERN_DEBUG "looked up dentry %s\n", dentry->d_name.name);

	return NULL;
}

static struct inode *minfs_alloc_inode(struct super_block *s)
{
	struct minfs_inode_info *mii;

	/* TODO 3: Allocate minfs_inode_info. */
    mii = kzalloc(sizeof(struct minfs_inode_info), GFP_KERNEL);
    if(mii == NULL)
        return NULL;

	/* TODO 3: init VFS inode in minfs_inode_info */
    inode_init_once(&mii->vfs_inode);

	return &mii->vfs_inode;
}

static void minfs_destroy_inode(struct inode *inode)
{
	/* TODO 3: free minfs_inode_info */
    kfree(container_of(inode, struct minfs_inode_info, vfs_inode));
}

/*
 * Create a new VFS inode. Do basic initialization and fill imap.
 */

static struct inode *minfs_new_inode(struct inode *dir)
{
	struct super_block *sb = dir->i_sb;
	struct minfs_sb_info *sbi = sb->s_fs_info;
	struct inode *inode;
	int idx;

	/* TODO 7: Find first available inode. */
    idx = find_first_zero_bit(&sbi->imap, MINFS_NUM_INODES);
    if(idx >= MINFS_NUM_INODES){
        /* find_first_zero_bit() returns the bitmap size when no zero bit exists */
        printk(LOG_LEVEL "no space left in imap\n");
        return NULL;
    }

	/* TODO 7: Mark the inode as used in the bitmap and mark
	 * the superblock buffer head as dirty.
	 */
     __test_and_set_bit(idx, &sbi->imap);
     mark_buffer_dirty(sbi->sbh);

	/* TODO 7: Call new_inode(), fill inode fields
	 * and insert inode into inode hash table.
	 */
     inode = new_inode(sb);
     inode->i_uid = current_fsuid();
     inode->i_gid = current_fsgid();
     inode->i_ino = idx;
     inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
     inode->i_blocks = 0;

     insert_inode_hash(inode);

	/* Actual writing to the disk will be done in minfs_write_inode,
	 * which will be called at a later time.
	 */

	return inode;
}

/*
 * Add dentry link on parent inode disk structure.
 */

static int minfs_add_link(struct dentry *dentry, struct inode *inode)
{
	struct buffer_head *bh;
	struct inode *dir;
	struct super_block *sb;
	struct minfs_inode_info *mii;
	struct minfs_dir_entry *de;
	int i;
	int err = 0;

	/* TODO 7: Get: directory inode (in inode); containing inode (in mii); superblock (in sb). */
    dir = dentry->d_parent->d_inode;
    mii = container_of(dir, struct minfs_inode_info, vfs_inode);
    sb = dir->i_sb;

	/* TODO 7: Read dir data block (use sb_bread). */
    bh = sb_bread(sb, mii->data_block);

	/* TODO 7: Find first free dentry (de->ino == 0). */
    for(i = 0; i < MINFS_NUM_ENTRIES; i++){
        de = (struct minfs_dir_entry *) bh->b_data + i;
        if(de->ino == 0)
            break;
    }

    if(i == MINFS_NUM_ENTRIES){
        err = -ENOSPC;
        goto out;
    }

	/* TODO 7: Place new entry in the available slot. Mark buffer_head
	 * as dirty. */
    de->ino = inode->i_ino;
    memcpy(de->name, dentry->d_name.name, MINFS_NAME_LEN);
    dir->i_mtime = dir->i_ctime = current_time(inode);
    
    mark_buffer_dirty(bh);

out:
	brelse(bh);

	return err;
}

/*
 * Create a VFS file inode. Use minfs_file_... operations.
 */

static int minfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
		bool excl)
{
	struct inode *inode;
	struct minfs_inode_info *mii;
	int err;

	inode = minfs_new_inode(dir);
	if (inode == NULL) {
		printk(LOG_LEVEL "error allocating new inode\n");
		err = -ENOMEM;
		goto err_new_inode;
	}

	inode->i_mode = mode;
	inode->i_op = &minfs_file_inode_operations;
	inode->i_fop = &minfs_file_operations;
	mii = container_of(inode, struct minfs_inode_info, vfs_inode);
	mii->data_block = MINFS_FIRST_DATA_BLOCK + inode->i_ino;

	err = minfs_add_link(dentry, inode);
	if (err != 0)
		goto err_add_link;

	d_instantiate(dentry, inode);
	mark_inode_dirty(inode);

	printk(KERN_DEBUG "new file inode created (ino = %lu)\n",
		inode->i_ino);

	return 0;

err_add_link:
	inode_dec_link_count(inode);
	iput(inode);
err_new_inode:
	return err;
}

/*
 * Write VFS inode contents to disk inode.
 */

static int minfs_write_inode(struct inode *inode,
		struct writeback_control *wbc)
{
	struct super_block *sb = inode->i_sb;
	struct minfs_inode *mi;
	struct minfs_inode_info *mii = container_of(inode,
			struct minfs_inode_info, vfs_inode);
	struct buffer_head *bh;
	int err = 0;

	bh = sb_bread(sb, MINFS_INODE_BLOCK);
	if (bh == NULL) {
		printk(LOG_LEVEL "could not read block\n");
		err = -ENOMEM;
		goto out;
	}

	mi = (struct minfs_inode *) bh->b_data + inode->i_ino;

	/* fill disk inode from the VFS inode */
	mi->mode = inode->i_mode;
	mi->uid = i_uid_read(inode);
	mi->gid = i_gid_read(inode);
	mi->size = inode->i_size;
	mi->data_block = mii->data_block;

	printk(KERN_DEBUG "mode is %05o; data_block is %d\n", mi->mode,
		mii->data_block);

	mark_buffer_dirty(bh);
	brelse(bh);

	printk(KERN_DEBUG "wrote inode %lu\n", inode->i_ino);

out:
	return err;
}

static void minfs_put_super(struct super_block *sb)
{
	struct minfs_sb_info *sbi = sb->s_fs_info;

	/* Free superblock buffer head. */
	mark_buffer_dirty(sbi->sbh);
	brelse(sbi->sbh);

	printk(KERN_DEBUG "released superblock resources\n");
}

static const struct super_operations minfs_ops = {
	.statfs		= simple_statfs,
	.put_super	= minfs_put_super,
	/* TODO 4: add alloc and destroy inode functions */
    .alloc_inode = minfs_alloc_inode,
    .destroy_inode = minfs_destroy_inode,
	/* TODO 7:	= set write_inode function. */
    .write_inode = minfs_write_inode,
};

static int minfs_fill_super(struct super_block *s, void *data, int silent)
{
	struct minfs_sb_info *sbi;
	struct minfs_super_block *ms;
	struct inode *root_inode;
	struct dentry *root_dentry;
	struct buffer_head *bh;
	int ret = -EINVAL;

	sbi = kzalloc(sizeof(struct minfs_sb_info), GFP_KERNEL);
	if (!sbi)
		return -ENOMEM;
	s->s_fs_info = sbi;

	/* Set block size for superblock. */
	if (!sb_set_blocksize(s, MINFS_BLOCK_SIZE))
		goto out_bad_blocksize;

	/* TODO 2: Read block with superblock. It's the first block on
	 * the device, i.e. the block with the index 0. This is the index
	 * to be passed to sb_bread().
	 */
     bh = sb_bread(s,MINFS_SUPER_BLOCK);
     if(bh == NULL){
        goto out_bad_sb;
     }

	/* TODO 2: interpret read data as minfs_super_block */
    ms = (struct minfs_super_block *) bh->b_data;

	/* TODO 2: check magic number with value defined in minfs.h. jump to out_bad_magic if not suitable */
    if(ms->magic != MINFS_MAGIC)
        goto out_bad_magic;

	/* TODO 2: fill super_block with magic_number, super_operations */
    s->s_magic = MINFS_MAGIC;
    s->s_op = &minfs_ops;

	/* TODO 2: Fill sbi with rest of information from disk superblock
	 * (i.e. version).
	 */
     sbi->version = ms->version;
     sbi->imap = ms->imap;

	/* allocate root inode and root dentry */
	/* TODO 2: use myfs_get_inode instead of minfs_iget */
	root_inode = minfs_iget(s, MINFS_ROOT_INODE);
	if (!root_inode)
		goto out_bad_inode;

	root_dentry = d_make_root(root_inode);
	if (!root_dentry)
		goto out_iput;
	s->s_root = root_dentry;

	/* Store superblock buffer_head for further use. */
	sbi->sbh = bh;

	return 0;

out_iput:
	iput(root_inode);
out_bad_inode:
	printk(LOG_LEVEL "bad inode\n");
out_bad_magic:
	printk(LOG_LEVEL "bad magic number\n");
	brelse(bh);
out_bad_sb:
	printk(LOG_LEVEL "error reading buffer_head\n");
out_bad_blocksize:
	printk(LOG_LEVEL "bad block size\n");
	s->s_fs_info = NULL;
	kfree(sbi);
	return ret;
}

static struct dentry *minfs_mount(struct file_system_type *fs_type,
		int flags, const char *dev_name, void *data)
{
	/* TODO 1: call superblock mount function */
    return mount_bdev(fs_type, flags, dev_name, data, minfs_fill_super);
}

static struct file_system_type minfs_fs_type = {
	.owner		= THIS_MODULE,
	.name		= "minfs",
	/* TODO 1: add mount, kill_sb and fs_flags */
    .mount      = minfs_mount,
    .kill_sb    = kill_block_super,
    .fs_flags   = FS_REQUIRES_DEV,
};

static int __init minfs_init(void)
{
	int err;

	err = register_filesystem(&minfs_fs_type);
	if (err) {
		printk(LOG_LEVEL "register_filesystem failed\n");
		return err;
	}

	return 0;
}

static void __exit minfs_exit(void)
{
	unregister_filesystem(&minfs_fs_type);
}

module_init(minfs_init);
module_exit(minfs_exit);

Test scripts

root@qemux86:~/skels/filesystems/minfs/user# ./test-minfs.sh 
+ insmod ../kernel/minfs.ko
+ mkdir -p /mnt/minfs
+ ./mkfs.minfs /dev/vdb
+ mount -t minfs /dev/vdb /mnt/minfs
+ grep minfs
+ cat /proc/filesystems
        minfs
+ grep minfs
+ cat /proc/mounts
/dev/vdb /mnt/minfs minfs rw,relatime 0 0
+ stat -f /mnt/minfs
  File: "/mnt/minfs"
    ID: 0        Namelen: 255     Type: UNKNOWN
Block size: 4096      
Blocks: Total: 0          Free: 0          Available: 0
Inodes: Total: 0          Free: 0
+ cd /mnt/minfs
+ ls -la
Read a.txt from folder /, ctx->pos: 0
Found entry a.txt on position: 0
getting entry: name: a.txt, ino: 1
looked up dentry a.txt
-rw-r--r--    1 root     root             0 Feb 29 19:12 a.txt
+ cd ..
+ umount /mnt/minfs
released superblock resources
+ rmmod minfs
root@qemux86:~/skels/filesystems/minfs/user#

root@qemux86:~/skels/filesystems/minfs/user# ./test-minfs-0.sh 
+ insmod ../kernel/minfs.ko
+ mkdir -p /mnt/minfs
+ ./mkfs.minfs /dev/vdb
+ mount -t minfs /dev/vdb /mnt/minfs
+ cat /proc/filesystems
nodev   sysfs
nodev   rootfs
nodev   ramfs
nodev   bdev
nodev   proc
nodev   tmpfs
nodev   devtmpfs
nodev   binfmt_misc
nodev   configfs
nodev   debugfs
nodev   tracefs
nodev   sockfs
nodev   pipefs
nodev   rpc_pipefs
nodev   devpts
        ext3
        ext2
        ext4
nodev   nfs
nodev   nfs4
nodev   nfsd
        minfs
+ cat /proc/mounts
/dev/root / ext4 rw,relatime 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=56384k,nr_inodes=14096,mode=755 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /var/volatile tmpfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
192.168.43.92:/home/acat/softwares/linux-master/tools/labs/skels /home/root/skels nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,proto=tcp,port=2049,timeo=600,ret0
/dev/vdb /mnt/minfs minfs rw,relatime 0 0
+ umount /mnt/minfs
released superblock resources
+ rmmod minfs
root@qemux86:~/skels/filesystems/minfs/user#

root@qemux86:~/skels/filesystems/minfs/user# ./test-minfs-1.sh 
+ insmod ../kernel/minfs.ko
+ mkdir -p /mnt/minfs
+ ./mkfs.minfs /dev/vdb
+ mount -t minfs /dev/vdb /mnt/minfs
+ cd /mnt/minfs
+ ls -la
Read a.txt from folder /, ctx->pos: 0
Found entry a.txt on position: 0
getting entry: name: a.txt, ino: 1
looked up dentry a.txt
-rw-r--r--    1 root     root             0 Feb 29 19:16 a.txt
+ cd ..
+ umount /mnt/minfs
released superblock resources
+ rmmod minfs
root@qemux86:~/skels/filesystems/minfs/user#
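
The `ls -la` step above walks the directory with `readdir(3)`. In the kernel this ends up in the filesystem's directory iteration callback, which is presumably where the "Read a.txt from folder /" and "getting entry: name: a.txt, ino: 1" messages come from. A minimal user-space sketch of that walk follows. The helper name `list_dir` is hypothetical, and the output format only imitates the driver's log line.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Print the name and inode number of every entry in a directory,
 * skipping "." and "..". Returns the number of entries printed,
 * or -1 if the directory cannot be opened.
 */
int list_dir(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *ent;
	int count = 0;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		if (strcmp(ent->d_name, ".") == 0 ||
		    strcmp(ent->d_name, "..") == 0)
			continue;
		printf("name: %s, ino: %llu\n", ent->d_name,
		       (unsigned long long)ent->d_ino);
		count++;
	}

	closedir(dir);
	return count;
}
```

Running `list_dir("/mnt/minfs")` on the freshly formatted image should report a.txt with inode number 1, matching the transcript.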

root@qemux86:~/skels/filesystems/minfs/user# ./test-minfs-2.sh 
+ insmod ../kernel/minfs.ko
+ mkdir -p /mnt/minfs
+ ./mkfs.minfs /dev/vdb
+ mount -t minfs /dev/vdb /mnt/minfs
+ cd /mnt/minfs
+ touch b.txt
looked up dentry b.txt
new file inode created (ino = 2)
+ echo OK. File created.
OK. File created.
+ cd ..
+ umount /mnt/minfs
mode is 100644; data_block is 4
wrote inode 2
released superblock resources
+ mount -t minfs /dev/vdb /mnt/minfs
+ grep b.txt
+ ls /mnt/minfs
Read a.txt from folder /, ctx->pos: 0
Found entry a.txt on position: 0
getting entry: name: a.txt, ino: 1
looked up dentry a.txt
Read b.txt from folder /, ctx->pos: 1
Found entry b.txt on position: 1
getting entry: name: b.txt, ino: 2
looked up dentry b.txt
b.txt
+ echo OK. File b.txt exists 
OK. File b.txt exists 
+ umount /mnt/minfs
released superblock resources
+ rmmod minfs
root@qemux86:~/skels/filesystems/minfs/user#


Category: Linux
Published: 2023-06-11 05:20:02
Article link: https://www.kaopuke.com/article/k-p-k_14_ujokf2_13_j_6_3.html