Architecture of Open Source Applications, Vol. 1

Dated Jan 19, 2026; last modified on Mon, 19 Jan 2026

The Architecture of Open Source Applications. Volume 1. aosabook.org. Accessed Jan 19, 2026.

Asterisk. A server application for making, receiving, and performing custom processing of phone calls.

Audacity. A popular sound recorder and audio editor. One goal is that its user interface should be discoverable: people should be able to sit down without a manual and start using it right away.

The Bourne-Again Shell. Input processing, parsing, the various word expansions and other command processing, and command execution, from the pipeline perspective.

Berkeley DB. A software library that provides fast, flexible, reliable and scalable data management. A collection of modules, each of which embodies the Unix “do one thing well” philosophy.

CMake. A replacement for the aging autoconf/libtool approach to building software. A build system that had to be easy to use, and allow for the most productive use of the researchers' programming time.

Eclipse. The evolution of the architecture of the Eclipse SDK within Eclipse and Runtime Equinox projects. Implementing software modularity and interoperability with a large code base written by a diverse community.

Graphite. Storing numbers that change over time and graphing them. Providing the aforementioned functionality as a network service that is both easy to use and highly scalable. A specialized database library and its storage format. A caching mechanism for optimizing I/O operations. A simple yet effective method of clustering Graphite servers.

The Hadoop Distributed File System. Store very large data sets reliably, and stream them at high bandwidth to user applications. Report on an experience using HDFS to manage 40 petabytes of enterprise data at Yahoo!

Continuous Integration. Common sets of features implemented in CI systems, available architectural options and their impact on feature feasibility. Buildbot, a master/slave system. CDash, a reporting server model. Jenkins, a hybrid model. Pony-Build, a Python-based decentralized reporting server.

Jitsi. An application that allows people to make video and voice calls, share their desktops, and exchange files and messages. How to support this over a number of different protocols, e.g., the standardized XMPP (Extensible Messaging and Presence Protocol) and SIP (Session Initiation Protocol), and proprietary protocols like Yahoo! and Windows Live Messenger (MSN).

LLVM. Design decisions that shaped the LLVM umbrella project, which hosts and develops a set of close-knit low-level toolchain components (e.g., assemblers, compilers, debuggers) designed to be compatible with existing tools typically used on Unix systems.

Mercurial. A modern distributed version control system, written mostly in Python with bits and pieces in C for performance. Decisions involved in designing Mercurial’s algorithms and data structures.

The NoSQL Ecosystem. Understand the space of available NoSQL tools and how the design of each one explores the space of data storage possibilities.

Python Packaging. Original intent was to make the “multiple dependencies for each install” philosophy as developer-, admin-, packager-, and user-friendly as possible. Reinventing it to solve problems like unintuitive version schemes, mishandled data files, difficulty repackaging, etc.

Riak and Erlang/OTP. Riak is a distributed, fault-tolerant, open source database that illustrates how to build large-scale systems using Erlang/OTP.

Selenium WebDriver. A browser automation tool, commonly used for writing end-to-end tests of web applications. Provides APIs in a variety of languages to allow for more control and the application of standard software development practices.

Sendmail. A Mail Transfer Agent, i.e., the software that actually transfers the mail from the sender to the recipient. The first MTA on the Internet, and still the most prevalent.

SnowFlock. With SnowFlock’s VM Cloning, resource allocation, cluster management, and application logic can be interwoven programmatically and dealt with as a single logical operation. How can VM Cloning be effectively interwoven in several programming models and frameworks? How can it be implemented to keep application runtime and provider overhead to a minimum? How can it be used to create dozens of new VMs in five seconds or less?

SocialCalc. Combining the authoring ease and multi-person editing of wikis with the familiar visual formatting and calculating metaphor of spreadsheets.

Telepathy. A modular framework for real-time communications that handles voice, video, text, file transfer, and so on, as a service. Communications as a service allows breaking communications out of a single application. Telepathy makes extensive use of the D-Bus messaging bus and a modular design.

Thousand Parsec. A set of standard specifications for a game protocol and other related functionality, allowing diverse implementations of client, server, and AI software, as well as a vast array of possible games.

Violet. A simple UML editor that is (a) useful to students and (b) an example of an extensible framework that students can understand and modify.

VisTrails. An open-source system that supports data exploration and visualization. Its provenance infrastructure maintains a detailed history of the steps followed and data derived in the course of an exploratory task. Also provides annotation capabilities for users to enrich the automatically-captured provenance.

The Visualization Toolkit. A system for data processing and visualization used in scientific computing, medical image analysis, computational geometry, rendering, image processing, and informatics. Choice of copyright license should be made after reviewing the goals of the project.

Battle for Wesnoth. A development model for an open-source project aimed at enhancing accessibility so that volunteers with widely different skills can interact in a productive way.