Hard Disk Drives

Prev: io-devices Next: redundant-arrays-of-inexpensive-disks-raids

The Interface

A disk can be viewed of as a set of 512-byte sectors.

Each read/write to a section is considered atomic – if a large section is written to, but there’s a failure, it may only succeed partially.

Basic Geometry

A simple disk might look like a platter, a circular hard surface where data is stored by inducing magnetic changes to it.

The drive is coded with a thin magnetic layer which enables the drive to persistently store bits even with the drive is off.

A spindle binds the platters together, which has a certain RPM (rotations per minute).

Data is encoded in concentric circles of sectors, called a track.

A Simple Disk Drive

A disk head, which reads and writes to and from the disk is attached to the spindle.

Single-track Latency: The Rotational Delay

Imagine a circle with the sectors numbering the disk like a clock (0-11). Any write, at worst case, requires N/2 or rotational delay. If we’re on 0, the worst case read/write location would be at 6, or vice-versa.

Multiple Tracks: Seek Time

A real life hard disk would have multiple tracks, instead of just one. The tracks would be layered on top of each other. When the arm has to move from one track to the next, it does a seek, where it moves from tracks. This can take a lot of time (0.5 to 2 ms), as the drive must find the correct track to then rotate to.

Some Other Details

Many drives use track skew to make sure that sequential reads can be properly serviced when crossing track boundaries.

As well, some disks are multi-zoned, where zones are consecutive to service faster reads and writes.

As well, disks have a cache, which take care of temporal and space locality (if you read one sector, you’ll probably read the adjacent ones), so disks can populate cache with neighbors.

On writes, the drive can either populate the cache first, or the cache afterwards. Writing to the cache first, then the disk makes writes look faster, but can lead to problems (if the power goes out, an acknowledged write can be lost).

I/O Time: Doing The Math

Disk I/O can be calculated with the following:

I/O Time = Seek Time + Rotation Time + Transfer Time

The rate of I/O can be calculated by:

I/O Rate = Size / I/O Time

Let’s do the math on two Hard Disk Drives from Seagate:

	Cheetah 15K.5	Barracuda
Capacity	300 GB	1 TB
RPM	15,000	7,200
Average Seek	4 ms	9 ms
Max Transfer	125 MB/s	105 MB/s
Platters	4	4
Cache	16 MB	16/32 MB
Connects via	SCSI	SATA

Assume that a 4KB read occurs at a random location on disk, for the cheetah:

Seek Time = 4 ms, Rotation Time = 2 ms, Transfer Time = 30 microseconds

This is about 6 ms. A random I/O should thus take about 6 ms for the Cheetah, and 13.2 ms for the Barracuda.

A sequential workload might look like this:

For a 100 MB transfer, with an 800 ms write speed, the throughput is 125 MB/s.

Computing the “Average” Seek

The average seek is about 1/3 the seek distance.

Disk Scheduling

Because I/O has a high cost, the OS will try to schedule it like processes. But since it also knows the size of the I/O request, it can try to greedily pick the shortest jobs for maximal efficiency.

SSTF: Shortest Seek Time First

One early disk scheduling approach is Shortest Seek Time First, which prioritizes writes to the same/nearest tracks. However, it has two problems:

The OS doesn’t understand tracks (it can use nearest-block-first, NBF).
Some requests may starve (if there are lots of requests to the same track, then further away tasks would not get scheduled.

Elevator (SCAN or C-SCAN)

Another algorithm that avoids starvation is SCAN, which queues up I/O requests, and then does a single pass across the disk (a sweep). It then repeats.

One variant is C-SCAN, or Circular SCAN. The algorithm sweeps from outer-to-inner, and then restarts.

This is known as the elevator algorithm, as it goes from the bottom to the top, then repeats.

SPTF: Shortest Positioning Time First

Instead of SSTF, which assumes that seeking is slower than rotational delay, we can do SPTF, which uses more detailed information to figure out the distance of each rotation + seek, and maximizes the output of I/O.

Other Scheduling Issues

Some other scheduling issues involve when to merge I/O requests for increased efficiency, like if there are these requests:

read block 33
read block 8
read block 34

The scheduler would be better served merging 1 and 3 to get this result:

read block 33, 34
read block 8

As well, there’s an issue of work-conserving vs anticipatory disk scheduling, where a disk might wait for newer I/O requests so as to get more optimal requests to batch up.