So I've just returned from a week-long vacation on the sunny Greek island of Kos. As I lay beside the swimming pools of the resort where I was staying, I found that I had loaded two particular books onto my Kindle: “The C Programming Language” by Kernighan and Ritchie, and “Advanced Programming in the UNIX Environment” by Stevens and Rago.

As a long-time Linux user and programmer, I have always been intrigued by the C programming language. I'm no novice to programming, and I am quite comfortable with Python, Java and PHP. But even though I know at least one person who uses C to great effect and can do (to me, anyway) complicated things with it, my own use of C has never amounted to more than a simple Arduino program. I did create a patch set for the pmacct project, but that was just that: patching existing code.

The C Programming Language

So, as I spent my week at the pool, I had plenty of time to read the two books I mentioned earlier. First up was “The C Programming Language”, second edition, by Brian W. Kernighan and Dennis Ritchie (which I will just call K&R from now on). Published in 1988, this edition describes C as defined by the ANSI C standard that was being finalised at the time, and it has not been updated since (apart from the ebook edition I have). Even though that might make the book seem dated, I was amazed at how easy it was to read. The style is compact and succinct, and it gradually eases the reader into the concepts of C. The book does not explain in detail what functions and variables are; it assumes that the reader already knows the basic concepts of programming. It starts out with the now-famous “hello, world” example:

#include <stdio.h>

main()
{
    printf("hello, world\n");
}

From there it moves on to function parameters and call by value; character arrays, which behave like strings but are not quite the same thing; and data types, of which there are only a few: integers and floating-point numbers of different sizes, structures and unions, and pointers to them. Then the more difficult concepts are introduced: program structure, preprocessor statements and header files, and pointers and arrays. The book closes with input and output, using standard input/output and files, and finishes with a brief discussion of UNIX system calls and their use.

Never while reading K&R did I feel overloaded with information. In fact, after reading the book I felt I could finally grasp the concept of pointers and their use, as well as memory management in C. What the book did not cover in any depth was how to interface with the operating system. The only library of functions the book covers is the ANSI C Standard Library, which is surprisingly small by design.

A very abridged history of UNIX

The first edition of K&R was released in 1978, around the time Version 7 UNIX came out, along with a handful of derivatives. This and earlier versions of UNIX were created primarily by Ken Thompson and Dennis Ritchie. The very first version of UNIX was written in assembly language, first for the PDP-7 and then for the PDP-11; the kernel was rewritten in C in 1973. Only after they had converted UNIX to C were they able to create a version portable enough to run on other hardware.

This eventually led to System III and System V UNIX on one side, and BSD on the other. System III and System V led to a series of proprietary UNIXes, one of which is Solaris. BSD eventually gave rise to FreeBSD, NetBSD and OpenBSD. Linux, a relative newcomer, came much later and contains no code descended from the original UNIX, although it was designed to behave like it. Darwin, as used by Mac OS X, traces its roots to BSD, in particular FreeBSD and NetBSD.

Standardisation and POSIX

This wild growth of UNIX derivatives and versions created a need for a common standard that would make programs portable between different vendors. Once ANSI C was ratified by ISO as ISO C, developers had a common language that was described in detail, first in the K&R book and later in the ANSI/ISO standard itself. Around the same time, work started on standardising the features that every UNIX operating system should support. This standard is called POSIX. It covers low-level subjects like the C programming language (the ISO C standard is incorporated into POSIX), processes, floating-point operations, pipes and threads. Some higher-level features are also described, like a command-line shell and utilities.

Because POSIX is so closely tied to C, much of it is expressed directly as C constants, type definitions and function declarations. POSIX describes the interfaces, and a C library provides the actual implementation. Examples of such libraries are GNU libc (for Linux) and BSD libc (for the BSDs). Some libc implementations, like the one in Solaris, can trace their roots all the way back to AT&T UNIX. Although Microsoft Windows is not POSIX compliant, it is possible to write ANSI C programs for it: the compiler provides flags to check for ANSI compatibility, and the C runtime documentation mentions which parts are ANSI C compliant and which are Microsoft proprietary.

How do I POSIX?

Enough about UNIX and POSIX history. The second book I read on my vacation was “Advanced Programming in the UNIX Environment”, third edition, by W. Richard Stevens and Stephen A. Rago (which I will call APUE from here on). After the last two sections, you can probably guess what this book is about.

The book describes the POSIX.1 standard, and essentially continues where K&R left off. APUE covers all the common POSIX subsystems, like networking using sockets, file handling, signals and users. Throughout, it refers to the implementations on four systems: Linux 3.2, FreeBSD 8.0, Mac OS X 10.6.8 and Solaris 10. It doesn't just describe the interfaces themselves, but also dives into the quirks of the individual UNIXes and what to avoid when writing truly portable code. At times it also comments on the performance of a particular implementation. For instance, the sections about file handling describe several approaches, such as when to use buffering or not and what buffer size to choose, and the impact of those choices. The chapters about Inter-Process Communication (IPC) describe the differences between pipes, FIFOs, message queues and sockets.

What now

The APUE book is a treasure trove of information about the depths of UNIX, and is hugely informative. It does differ from K&R in one respect: it is a bit dry. Whereas I read K&R in practically one sitting, I have yet to finish APUE. This is partly because APUE is a lot longer than K&R, about four times as long in fact. I'm currently in chapter 17 of 21, so I'll probably finish it soon though.

So what now? Now I can C. At least, I think I can. In the next few weeks I'll be trying my hand at writing some simple C programs to get the hang of it. One goal is to port my imagecat program to C, so I can compare the performance of Python and C for this particular use case. At the very least, it will provide a good test case, and I won't have to invent an entirely new program.