There are several disk scheduling algorithms, including First Come, First Served (FCFS), Shortest Seek Time First (SSTF), the SCAN algorithm, and others.
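To make the difference between these three concrete, here is a minimal sketch that compares the total head movement each policy needs for the same request queue. The cylinder numbers and starting head position are made-up illustration values, and the SCAN variant shown reverses at the last pending request rather than at the edge of the disk (a simplification sometimes called LOOK).

```python
def fcfs(head, requests):
    """Serve requests strictly in arrival order."""
    return list(requests)


def sstf(head, requests):
    """Always serve the pending request closest to the current head position."""
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def scan_upward(head, requests):
    """Sweep toward higher cylinders first, then reverse for the rest."""
    up = sorted(c for c in requests if c >= head)
    down = sorted((c for c in requests if c < head), reverse=True)
    return up + down


def total_seek(head, order):
    """Sum of head movements needed to visit `order` starting at `head`."""
    distance = 0
    for cyl in order:
        distance += abs(cyl - head)
        head = cyl
    return distance


HEAD = 53                                    # hypothetical starting cylinder
QUEUE = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical pending requests

for name, policy in [("FCFS", fcfs), ("SSTF", sstf), ("SCAN", scan_upward)]:
    print(name, total_seek(HEAD, policy(HEAD, QUEUE)))
```

For this particular queue the totals come out as 640 cylinders for FCFS, 236 for SSTF and 299 for the upward sweep. The ranking depends on the workload, so treat the numbers as an illustration rather than a benchmark.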


The Schedulers

There are currently 4 available:

  • Noop Scheduler
  • Anticipatory IO Scheduler ("as scheduler")
  • Deadline Scheduler
  • Completely Fair Queueing Scheduler ("cfq scheduler")

Noop Scheduler

This scheduler only implements request merging.

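NOOP stands for No Operation. It implements the simplest possible FIFO queue: all IO requests are served roughly in arrival order. "Roughly", because on top of the FIFO NOOP also merges adjacent IO requests, so requests are not served strictly first in, first out. NOOP assumes that the driver or the device has already optimised or reordered the IO requests (as an intelligent controller would). In some SAN environments it may be the best choice. Noop does not worry much about IO: it handles every request through the FIFO queue and assumes by default that IO will not be a performance problem. This also means the CPU has little work to do. For more complex workloads, of course, using this scheduler leaves the worrying to the user.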
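The "FIFO plus merging" idea can be sketched in a few lines. This is a toy model and not the kernel's code: the class name NoopQueue and the (start sector, sector count) request format are assumptions made for illustration, and reads and writes are not distinguished.

```python
from collections import deque


class NoopQueue:
    """Toy FIFO queue that merges requests touching adjacent sectors."""

    def __init__(self):
        self.fifo = deque()  # each entry is [start_sector, sector_count]

    def submit(self, start, count):
        for req in self.fifo:
            if req[0] + req[1] == start:   # back merge: new request follows req
                req[1] += count
                return
            if start + count == req[0]:    # front merge: new request precedes req
                req[0] = start
                req[1] += count
                return
        self.fifo.append([start, count])   # nothing adjacent: plain FIFO append

    def dispatch(self):
        """Hand out the oldest queued request, or None if the queue is empty."""
        return tuple(self.fifo.popleft()) if self.fifo else None


queue = NoopQueue()
queue.submit(100, 8)
queue.submit(500, 8)
queue.submit(108, 8)       # adjacent to the first request, so it is merged into it
print(queue.dispatch())    # the merged request covering sectors 100..115
print(queue.dispatch())    # the request at sector 500
```

The third request arrives last but leaves together with the first one, which is exactly why the ordering is only "roughly" FIFO.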

Anticipatory IO Scheduler ("as scheduler")

The anticipatory scheduler is the default scheduler in older 2.6 kernels – if you’ve not specified one, this is the one that will be loaded. It implements request merging, a one-way elevator, read and write request batching, and attempts some anticipatory reads by holding off a bit after a read batch if it thinks a user is going to ask for more data. It tries to optimise for physical disks by avoiding head movements if possible – one downside to this is that it will probably give highly erratic performance on database or storage systems.

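CFQ and DEADLINE focus on serving scattered IO requests; they do nothing special for contiguous IO requests such as sequential reads. To handle workloads that mix random and sequential IO, Linux also supports the ANTICIPATORY scheduling algorithm. Building on DEADLINE, ANTICIPATORY gives every read IO a 6 ms wait window. If the OS receives a read request for an adjacent location within those 6 ms, that request can be served immediately.

The anticipatory scheduler (as) was for a time the default IO scheduler of the Linux 2.6 kernel. The word "anticipatory" does describe what the algorithm is about. Put simply, once an IO has happened, if a process then requests more IO, a default 6 ms guessing window opens in which the scheduler guesses what the next IO request will be. This adds considerable latency to random reads, which is bad for database applications, whereas Web servers and similar workloads do well with it. The algorithm can also be thought of as aimed at slow disks, because the real purpose of the "guess" is to reduce head movement time.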
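The wait window can be illustrated with a small simulation. This is a sketch under stated assumptions, not how the kernel implements it: the function name, the NEAR_SECTORS threshold for what counts as "adjacent", and the list of future arrivals (which a real scheduler cannot see in advance) are all hypothetical. Only the 6 ms window comes from the description above.

```python
ANTIC_WINDOW_MS = 6.0   # wait window quoted in the text
NEAR_SECTORS = 128      # hypothetical threshold for "adjacent"


def next_after_read(last_sector, done_at_ms, pending, arrivals):
    """Pick what to serve after a read at `last_sector` finished at `done_at_ms`.

    pending  -- sectors of requests that are already queued
    arrivals -- (time_ms, sector) reads that will arrive later (simulation only)
    Returns (sector, start_time_ms), or None if there is nothing to do.
    """
    for time_ms, sector in sorted(arrivals):
        if time_ms < done_at_ms:
            continue                      # ignore anything before the read completed
        if time_ms - done_at_ms > ANTIC_WINDOW_MS:
            break                         # the window has closed
        if abs(sector - last_sector) <= NEAR_SECTORS:
            return sector, time_ms        # the guess paid off: no long seek needed
    if pending:
        # Window expired: serve the closest queued request, having wasted the wait.
        sector = min(pending, key=lambda s: abs(s - last_sector))
        return sector, done_at_ms + ANTIC_WINDOW_MS
    return None


# A nearby read shows up 2 ms later, so the far-away queued request has to wait.
print(next_after_read(1000, 10.0, [90000], [(12.0, 1008)]))
# Nothing nearby arrives in time, so the queued request starts 6 ms late.
print(next_after_read(1000, 10.0, [90000], [(20.0, 1008)]))
```

The second call shows the cost for random workloads: every read that is not followed by a neighbour pays the full window before the next request starts.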

Deadline Scheduler

The deadline scheduler implements request merging, a one-way elevator, and imposes a deadline on all operations to prevent resource starvation. Because writes return instantly within Linux, with the actual data being held in cache, the deadline scheduler will also prefer readers – as long as the deadline for a write request hasn’t passed. The kernel docs suggest this is the preferred scheduler for database systems, especially if you have TCQ aware disks, or any system with high disk performance.


FIFO(Read) > FIFO(Write) > CFQ 

The deadline algorithm guarantees a minimal delay for any given IO request. Seen from that angle, it should be a good fit for DSS applications.
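The dispatch order described above (expired reads first, then expired writes, then the elevator) can be sketched as follows. This is a toy model, not the kernel's implementation: the class and field names are invented for illustration, request merging and batching are left out, and the expiry values are assumptions that mirror the defaults I believe the kernel documentation lists for read_expire and write_expire.

```python
READ_EXPIRE_MS = 500     # assumed read deadline
WRITE_EXPIRE_MS = 5000   # assumed write deadline


class DeadlineQueue:
    """Toy deadline scheduler: elevator order unless a deadline has passed."""

    def __init__(self):
        self.requests = []   # dicts kept in arrival order

    def submit(self, sector, op, now_ms):
        expire = READ_EXPIRE_MS if op == "read" else WRITE_EXPIRE_MS
        self.requests.append({"sector": sector, "op": op, "deadline": now_ms + expire})

    def dispatch(self, now_ms, head):
        if not self.requests:
            return None
        # 1) the oldest expired read, 2) the oldest expired write
        for op in ("read", "write"):
            for req in self.requests:
                if req["op"] == op and req["deadline"] <= now_ms:
                    self.requests.remove(req)
                    return req
        # 3) one-way elevator: lowest sector at or beyond the head, else wrap around
        ahead = [r for r in self.requests if r["sector"] >= head]
        req = min(ahead or self.requests, key=lambda r: r["sector"])
        self.requests.remove(req)
        return req


queue = DeadlineQueue()
queue.submit(900, "write", now_ms=0)
queue.submit(100, "read", now_ms=0)
queue.submit(500, "read", now_ms=100)
print(queue.dispatch(now_ms=200, head=400)["sector"])  # elevator picks sector 500
print(queue.dispatch(now_ms=550, head=500)["sector"])  # read at 100 has expired
```

Without step 1 the read at sector 100 would keep waiting behind anything submitted ahead of the head, which is the starvation the deadlines are there to prevent.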

Completely Fair Queueing Scheduler ("cfq scheduler")

The completely fair queueing scheduler implements both request merging and the elevator, and attempts to give all users of a particular device the same number of IO requests over a particular time interval. This should make it more efficient for multiuser systems. It seems that Novell SLES sets cfq as the scheduler by default, as does the latest Ubuntu release. As of the 2.6.18 kernel, this is the default scheduler in kernel.org releases.

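CFQ stands for Completely Fair Queuing. Its characteristic is that it orders IO requests by address rather than serving them in arrival order.
Completely Fair Queuing (cfq) replaced the anticipatory scheduler as the Linux kernel's default IO scheduler in 2.6.18. cfq maintains one IO queue per process and serves the requests coming from the different processes in round-robin fashion, so that every IO request is treated fairly. This makes cfq well suited to workloads with scattered reads (e.g. OLTP databases). Among the enterprise Linux distributions I know of, SuSE Linux seems to have been the first to use cfq by default.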
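The per-process round robin can be sketched like this. It is a toy model only: the FairQueue class is hypothetical, it hands out one request per turn, and it ignores the time slices, priorities and address sorting that the real cfq adds on top.

```python
from collections import OrderedDict, deque


class FairQueue:
    """Toy round-robin dispatcher with one request queue per process."""

    def __init__(self):
        self.per_process = OrderedDict()   # pid -> deque of pending requests

    def submit(self, pid, request):
        self.per_process.setdefault(pid, deque()).append(request)

    def dispatch(self):
        """Serve one request from the process at the front of the rotation."""
        if not self.per_process:
            return None
        pid, queue = next(iter(self.per_process.items()))
        request = queue.popleft()
        del self.per_process[pid]
        if queue:
            self.per_process[pid] = queue  # rejoin at the back of the rotation
        return pid, request


fair = FairQueue()
for name in ("a1", "a2", "a3"):
    fair.submit(1, name)       # process 1 floods the queue
fair.submit(2, "b1")           # process 2 has a single request
print(fair.dispatch())         # process 1 goes first
print(fair.dispatch())         # process 2 is served before a2 and a3
```

Process 2 is served second even though its request arrived last, which is the fairness property the paragraph describes.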

Changing Schedulers

The most reliable way to change schedulers is to set the kernel option ‘elevator’ at boot time. You can set it to one of "as", "cfq", "deadline" or "noop", to set the appropriate scheduler.

It seems under more recent 2.6 kernels (2.6.11, possibly earlier), you can change the scheduler at runtime by echoing the name of the scheduler into /sys/block/<devicename>/queue/scheduler, where devicename is the base name of the block device, eg sda for /dev/sda.
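Here is a small sketch of doing the same thing from a script. The helper names are my own. As far as I know, reading the scheduler file returns the available schedulers on one line with the active one in square brackets, for example "noop anticipatory deadline [cfq]"; the parser below assumes that format. Writing to the file needs root privileges, so set_scheduler is shown but not called.

```python
from pathlib import Path


def scheduler_path(device):
    """Path of the sysfs scheduler file for a device base name such as 'sda'."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def parse_schedulers(text):
    """Return (active, available) from text like 'noop anticipatory deadline [cfq]'."""
    active, available = None, []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token                 # the bracketed entry is the one in use
        available.append(token)
    return active, available


def set_scheduler(device, name):
    """Equivalent of echoing `name` into the scheduler file; requires root."""
    scheduler_path(device).write_text(name)


print(parse_schedulers("noop anticipatory deadline [cfq]"))

path = scheduler_path("sda")               # 'sda' is only an example device
if path.exists():
    print(parse_schedulers(path.read_text()))
```

A change made this way does not survive a reboot, which is why the boot option remains the more reliable route.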

Which one should I use?

I’ve not personally done any testing on this, so I can’t speak from experience yet. The anticipatory scheduler will be the default one for a reason however – it is optimised for the common case. If you’ve only got single disk systems (ie, no RAID – hardware or software) then this scheduler is probably the right one for you. If it’s a multiuser system, you will probably find cfq or deadline providing better performance, and the numbers seem to back deadline giving the best performance for database systems.

Tuning the IO schedulers

The schedulers may have parameters that can be tuned at runtime. Read the Linux documentation on the schedulers listed in the References section below.
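To see which parameters the active scheduler exposes, the following sketch reads every file in a tunables directory into a dictionary. It assumes the tunables live under /sys/block/sda/queue/iosched/, which is where I understand the kernel places them; the function name and the example device are my own choices, and the set of files differs per scheduler.

```python
from pathlib import Path


def read_tunables(iosched_dir):
    """Return {parameter_name: value} for every file in the given directory."""
    tunables = {}
    directory = Path(iosched_dir)
    if not directory.is_dir():
        return tunables                    # scheduler without tunables, or no such device
    for entry in sorted(directory.iterdir()):
        if entry.is_file():
            tunables[entry.name] = entry.read_text().strip()
    return tunables


# 'sda' is only an example; an empty dict means nothing was found.
for name, value in read_tunables("/sys/block/sda/queue/iosched").items():
    print(f"{name} = {value}")
```

What each value means is specific to the scheduler, so check the kernel documentation before changing anything.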

More information

Read the documents mentioned in the References section below, especially the linux kernel documentation on the anticipatory and deadline schedulers.

