Scale and Performance in a Distributed File System (Andrew)
J. H. Howard et al., ACM Transactions on Computer Systems, Vol. 6, No. 1 (Feb 1988), pp. 51-81.
They describe Andrew and the design decisions that went into it, and measure how performance scales with the number of servers and users.
The design of Andrew:
- On each client, a "Venus" process handles remote file requests by talking to "Vice" servers. Venus also caches files on the local disk and in memory.
- Whole files are always shipped over. Bad for large files; ones bigger than the local disk can't be accessed at all. However, it means much less network traffic and server load.
- Cache consistency is done by server callbacks; local changes are not visible to other clients until the file is closed (see the first sketch after this list). Directories are cached, but directory changes and certain metadata changes (e.g., permissions) are sent directly to the server.
- They moved from plain path names to inode-like identifiers (fids) so that clients, not servers, do pathname resolution (second sketch below).
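Roughly, the caching/callback scheme looks like this. This is a Python sketch, not the real Venus code; FakeVice, fetch_file, store_file, and break_callback are made-up stand-ins for the Vice RPC interface:

    class FakeVice:
        """Stand-in for a Vice server holding whole files keyed by fid."""
        def __init__(self):
            self.files = {}

        def fetch_file(self, fid):
            return self.files.get(fid, b"")

        def store_file(self, fid, data):
            self.files[fid] = data


    class VenusCache:
        def __init__(self, server):
            self.server = server
            self.data = {}      # fid -> whole-file contents cached locally
            self.valid = {}     # fid -> True while the server's callback promise holds

        def open(self, fid):
            # Reuse the cached copy as long as the callback is intact;
            # otherwise fetch the whole file again and re-register a callback.
            if fid not in self.data or not self.valid.get(fid, False):
                self.data[fid] = self.server.fetch_file(fid)   # whole-file transfer
                self.valid[fid] = True
            return bytearray(self.data[fid])

        def close(self, fid, contents):
            # Local changes become visible to other clients only at close,
            # when the whole file is shipped back to the server.
            self.data[fid] = bytes(contents)
            self.server.store_file(fid, bytes(contents))

        def break_callback(self, fid):
            # Called by the server when another client stores the file;
            # the next open must refetch it.
            self.valid[fid] = False

The point of callbacks is that Venus doesn't revalidate the cache on every open; the server promises to notify it of changes, which avoids the validation traffic a check-on-open scheme would cost.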
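And the fid-based lookup: the client walks the path itself against cached directories, so the server only ever sees fids. A sketch, where lookup_in_dir is a hypothetical helper that reads a cached directory:

    def resolve(path, root_fid, lookup_in_dir):
        """Map a pathname to a fid entirely on the client."""
        fid = root_fid
        for component in path.strip("/").split("/"):
            fid = lookup_in_dir(fid, component)   # reads the cached directory, no server round-trip
        return fid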
They introduce the Andrew Benchmark, which is used countless times in subsequent papers. It's basically copying a bunch of source files, stat'ing and reading them all, and compiling them.
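A rough outline of the benchmark's five phases (MakeDir, Copy, ScanDir, ReadAll, Make) in Python; SRC and DST are hypothetical paths here, and the real thing is driven by shell scripts and make:

    import os, shutil, subprocess

    SRC = "/local/andrew-src"     # source tree to copy (assumption)
    DST = "/afs/test/andrew"      # target on the file system under test (assumption)

    # MakeDir + Copy: recreate the source tree on the target file system
    # (the benchmark times directory creation and file copying as separate phases).
    shutil.copytree(SRC, DST)

    # ScanDir: stat every file in the copied tree.
    for root, _, files in os.walk(DST):
        for name in files:
            os.stat(os.path.join(root, name))

    # ReadAll: read every byte of every file.
    for root, _, files in os.walk(DST):
        for name in files:
            with open(os.path.join(root, name), "rb") as fh:
                fh.read()

    # Make: compile the copied sources.
    subprocess.run(["make", "-C", DST], check=True)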
It scales better than NFS: lower server CPU load (not to mention actual consistency semantics). But the large-file problem remains: what about database-type applications?
Other nice features: volumes/mount points (like NFS/vnodes) allow migration. Snapshots (with copy-on-write) are supported for backups and migration; a toy sketch follows.
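A toy illustration of the copy-on-write idea behind volume clones; Volume, snapshot, and write are made-up names, not the Vice interface:

    class Volume:
        def __init__(self, blocks=None):
            # block number -> data; the data entries are shared with clones
            self.blocks = dict(blocks or {})

        def snapshot(self):
            # A clone copies only the block map, not the data, so it's cheap.
            return Volume(self.blocks)

        def write(self, blockno, data):
            # Writing installs a new block in the live volume's map only;
            # earlier snapshots keep pointing at the old block.
            self.blocks[blockno] = data

The read-only clone can then be dumped for backup or shipped to another server while the live volume keeps changing.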
Umesh Shankar
Last modified: Wed Jul 11 11:52:09 PDT 2001