This paper was written by Ritchie and Thompson, presented for the ACM, detailing v3 unix.
The Unix Time-Sharing system is a general-purpose, multi-user, interactive operating system for the DEC PDP-11.
The kicker? It runs on a PDP-11, costing as little as $40,000.
Major programs include: assembler, ed, a linker, debugger, a C compiler, a Basic interpreter, a formatter, a Fortran compiler, a Snobol interpreter, Top-down compiler compiler (TMG), a bottom-up compiler-compiler (YACC), form letter generator, macro processor (M6) and permuted index program.
The PDP-11 has a 16-bit word computer, with 144kb of core memory, which Unix occupies 42kb, with device drivers, I/O buffers, and system tables occupying another few 5kb.
The hard disk is a 1MB fixed-head disk, used for file system storage and swapping, with four moving head 2.5MB removable disks, and a single 40MB disk pack.
The most important job of Unix is to provide a file system. It provides:
Ordinary files are just characters demarcated by the newline character.
Directories introduce a structure on the file system as a whole. Each user in the OS has a home directory of their own files, and they can make subdirectories, which are all access controlled.
The system mains several directories for its own use, like system administration.
Search starting with ‘/’ starts at the root, while search without it starts in the user’s current directory.
A file may be linked to another location, where it is simply a pointer to its location (this is a symlink).
Each I/O device supported by unix is associated with one file – they are read and written to like ordinary disk files, but requests to read or write result in activation of the associated device.
For example, special devices reside in /dev
. For
example, punch paper tape (the printer) would be at
/dev/ppt
. You could then write to it “echo hi >
/dev/ppt” and that would write the bytes corresponding to ‘hi’ to that
device.
This allows three advantages:
Using the mount
system call, a user can turn a leaf (a
file) into a subtree that uses that device as storage.
The user can then choose to cd
into that subtree, and
use it as normal, except that the leaf file turns into the root
directory of the filesystem, and that it cannot reference files on other
mounted filesystems.
In the root directory of all file systems, ‘..’ refers to the directory itself, instead of its parent.
Each user of the system is assigned a unique user identification number. When a file is created, it is marked with that user id. Also, files are given seven protection bits – six of these specify read, write, and execute permission for the owner for the file and other users.
There is a concept of a “super user” which isn’t bound to the permissions setup of the normal users.
I/O calls entail open
, create
,
write
to modify files, which are represented by file
handles. Files have no limit on size nor any distinction on random or
sequential I/O. The file system also has no user-visible locks on the
file system, due to their overhead and the fact that users don’t
normally update other users files.
The read
system call returns the bytes transmitted,
which returns 0 when it hits the end of file marker.
To do random access I/O, you can take a file handle, a pointer, and
the seek
call to seek to a location in the file based on
the pointer, either positive or negative.
A directory entry only contains a name and a pointer to the file. The
pointer is an integer, called an i-number
, short for
index number
of the file. The i-node contains a description
of the file as follows:
Calls to open or create return a file pointer. When a file is created, an i-node is allocated for it, and a directory entry is made, which contains the name of the file and the i-node number. Making a link involves creating a directory entry with the new name, copying the i-number, and incrementing the link-count field of the i-node.
Removing a file involves decrementing the link-count of the i-node, and erasing the directory entry. If the link-count drops to 0, it is freed.
The filesystem is divided into 512-bit blocks, which are logically addressable.
The filesystem is fairly efficient – running an assembler program, the time was 35.9 seconds, with 63.5% overhead from assembler execution, 16.5% system overhead, and 20% disk wait time.
Each process gets three logical segments: the text segment begins at location 0 in the virtual address space. This is for putting compile-time data. This is write-protected, and a single copy of it is shared among all processes executing the same data.
The next 8kb are allocated for the stack.
The next segment is a non-shared, writable data segment, which can be expanded with a system call (this is the heap).
Processes are created with the fork
system call. This
splits the process into two paths. Control returns from the fork to the
parent, while in the child, control is passed to the location label.
Processes may communicate through each other with the
pipe
system call, which allows the output of one program to
be send as input to another process.
the execute
system call (execute(file, arg1,
arg2,…argn)) can be used to execute a file in the current process.
Process synchronization is done with the wait
system
call. Wait returns the processid of the terminated process. An error
return is generated if the calling process has no descendants.
exit
terminates a process, destroys its image, and
closes its open files.
The shell opens up two file descriptors, 0
for reading
user input, and 1
, for writing to. These two files can be
read and written to by programs in the shells lifecycle.
The shell can change the assignments of these file descriptors. For
example, >
changes the file descriptor 1
to
the file specified after >
. Likewise, <
redirects input to a process from the file specified after
<
. So ed < script
means that
ed
will read from the text of script
.
The |
character is a filter, which is used to direct
output from one command to the input of another.
Commands may be separated by semicolons. The &
character returns the process id to the shell, but executes the job in
the background (is non- blocking).
The shell mainly waits for a user to type a command.
When the new-line character ending the line is typed, the shell’s read call returns.
The shell analyzes the command line, putting the arguments in a form appropriate for execute.
Then fork is called, which generates a child process. That child
process is given the appropriate arguments. The parent process then
waits
for the child process to die. Then, it returns back
to the user.
If the &
character is provided, then it simply
doesn’t wait.
Since the child inherits from the parent process, this all works as expected.
Finally, when an argument with <
or >
is given, it is only necessary to close file 0
or
1
, and then open the output file. This is because the file
descriptor opened, by convention, is the lowest available number.
Filters are straightforward extensions of standard I/O redirection with pipes used, instead of files.
The shell itself is a child process of another process, called
init
. init
is used to set up devices with a
process. Init then waits for logins, where it reads a password file and
then changes to the user’s home directory, and executes the shell,
allowing the user to access commands.
This then loops.
The Unix System can also set up init to drop a specific user into a
different program than the shell, like ed
for text editing.
There are also games like chess, blackjack, and 3D tic-tac-toe.
The PDP-11 hardware detects a number of hardware faults, like
reference to nonexistent memory. Such faults cause the processor to trap
to a system routine. Normally, this results in the process being
terminated and the memory of the process written to a core
file in the current directory for debugging.
You can also use the quit signal to exit in the case that something goes wrong.
The UNIX OS is successful because of its low hardware requirements, and simple abstraction over devices as files. As well, since UNIX was dog fooded, most of its design mistakes were fixed as soon as they appeared.
The shell, the file system, and access system have been extremely useful for programmers to use.