User:Ajgenius/GNUSystems

As part of the knowledge base of systems design, I would eventually like to see a series of booklets covering: the history and rationale leading to the current GNU & BSD systems; the places where those systems fall apart given modern considerations; where and why the specifics of the original system no longer apply, in light of the original philosophy; and how to potentially move forward without losing the lessons of the past, yet without holding on to things merely because they have always been done a certain way. This is the initial working toward a part one - where we have come from, and why we are the way we are, i.e. understanding the system as it is.

=On understanding a GNU (especially Linux) system=

Personal note: Currently the following is incomplete, kinda sucks, doesn't always make sense, and isn't always accurate. -aj-

In order to understand GNU on a system level, it is necessary to have at least a basic understanding of UNIX philosophy, and to some extent UNIX history, to help make sense of the traditions, standards, and primary design intents and rationale of the system. The following will therefore attempt to walk through the overarching concepts, in brief, to build a cohesive view of the system.

==The C Programming Language==
To fully appreciate the nature of UNIX systems and their derivatives, it is important to understand some basic concepts about their underpinnings, including the languages upon which components have been built. As far as core components are concerned, the C programming language is the definitive foundation of the majority of the system, followed to varying degrees by shell scripts and, in the case of GNU systems, Perl scripts and C++.

Unlike more modern languages, C is a comparatively thin wrapper around assembly, with various syntactic features added over the years to help make it more manageable. On its own it has almost no features beyond the barest minimum of math operators, functions, pointers, and the ability to inline assembly code. While it has a type system, all core types are integers of one size or another - a void is, in effect, a zero-length integer, a char is a one-byte integer, a short is (typically) a two-byte integer, and so on - with float and double for floating point numbers. What this means is that, on its own, without other supplements, C is more a container for short assembly routines, and functions to perform basic math, than anything else.
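As a minimal sketch of this (a hypothetical example, not from any particular system), consider how freely C treats characters and numbers as interchangeable integers:

<pre>
#include <stdio.h>

int main(void)
{
    char c = 'A';       /* a char is just a small integer (65 in ASCII) */
    short s = c + 1;    /* ordinary integer arithmetic on a "character" */

    printf("%d %d\n", c, s);    /* prints: 65 66 */

    /* type sizes are implementation-defined; these are common values */
    printf("char=%zu short=%zu int=%zu\n",
           sizeof(char), sizeof(short), sizeof(int));
    return 0;
}
</pre>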

Therefore, in order to make C useful, a few things had to be developed over the years. First, and most importantly, is the preprocessor. The C preprocessor was developed side by side with the original C language in order to make it easier to use. It is, in fact, nothing more than a simple tool for substituting one thing for another: it reads through a source file and, upon certain character patterns, replaces those patterns with something else. The most important of these is the #include pattern, which instructs the preprocessor to include another file. This allows shared functions to be declared in one file, which is then included in multiple other files, removing the limitation of everything being defined in a single source file.

This ability to build code upon shared include files, which declare functions, is what allows for the second most important addition to the C language - the C library. The C library provides a suite of common operations required to use C without constantly falling back on direct inline assembly. It does this by providing functions which directly use assembly, along with a shared suite of include files declaring them, so that other source code can merely include the headers in order to call them. These headers provide the fundamental building blocks required for memory operations, basic input/output, file management, and so forth. Between the preprocessor and the C library, C can be used for programs which require raw system calls, pointer operations, and i/o, without resorting to direct, non-portable assembly language. For this reason, C allows core components to be built without assembly, while permitting basic code reuse through shared functions.

(It is of note that C has also had added to it over the years, mostly quite early on, structs, which allow for property bags of multiple variables, and enums, which allow for choosing between a predefined set of named integer values, along with a few other minor extensions that have allowed for yet further simplification of otherwise complex functionality.)
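To make the #include mechanism concrete, here is a hedged sketch of the pattern described above, using hypothetical file names (greet.h, greet.c, main.c): a function is defined exactly once, declared in a shared header, and reused elsewhere purely by inclusion.

<pre>
/* greet.h - declares the shared function, so other files can call it */
void greet(const char *name);

/* greet.c - defines it exactly once, using the C library for output */
#include <stdio.h>
#include "greet.h"
void greet(const char *name) { printf("Hello, %s\n", name); }

/* main.c - reuses the function via inclusion, without duplicating it */
#include "greet.h"
int main(void) { greet("world"); return 0; }
</pre>

Compiled together (for example, cc main.c greet.c), the preprocessor textually splices greet.h into both source files before compilation, which is all that #include really does.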

What is most important about the use and nature of the C language is the concept of shared functions through inclusion. Virtually every program in a standard system depends on the C library, even if it does not use it directly. Even programs not written in C usually do, through implicit dependencies: all important core components are written in C and depend on the C library for shared functionality, therefore anything which depends on any core component also depends on the C library.
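Even the smallest possible C program illustrates the point: it calls no library function at all, yet on a typical GNU system (an assumption about the default toolchain behavior) it is still linked against the C library, whose startup code runs before main and delivers its return value to the system.

<pre>
/* Depends on the C library even though it never calls it directly:
 * the library's startup code invokes main and handles the exit.   */
int main(void)
{
    return 0;
}
</pre>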

==The Shell==
Over time, by necessity, a modular approach to system components developed, driven by the nature and capabilities of the languages they were written in. Early on, programs were written directly in assembly, and later rewritten in C, but in both cases every program "contained" the same set of functions, because there was no way to reuse functionality other than to literally include it, effectively duplicating it by bundling it inside every program that used it. What this meant, especially on older systems where space was valuable, was that every system component had to have a single, specific functionality, so that there was no excessive waste of space through further duplication of purpose or design. For similar reasons, because of the limited availability of storage at the time, commands were given short names, because every character added up quickly.

The natural outcome of this design was the simple shell interface, called sh, which was effectively the user interface, and which for all intents and purposes did nothing more than run other commands that the user typed in. For example, there was one command to list files, one to copy files, one to move them, and another to remove them (called ls, cp, mv, and rm, respectively). Each component had no code related to anything beyond its specified purpose.

The need to overcome the seeming limitation of only having single-purpose programs resulted, after a few years, in the design of the piping paradigm: using "pipes", one could feed the output of one program into the input of another, allowing a series of special-purpose commands to be used together without duplicating functionality across them. Instead, one merely ran the commands in order of usage, connected by an intermediate special file called, naturally, a pipe, so that each program could use the basic i/o functionality it included from the C library to process information in the order defined. Such a series of commands, called a chain (or pipeline), was handled by the shell by means of a special pipe operator, which told it to feed the output of the first command into the second command, rather than simply running them. Similarly, by use of different redirection operators, one could send output directly to a file instead of to another program or to the screen.
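Since the shell itself is a C program, the mechanism is easiest to see at the C level. The following is a rough sketch, under POSIX assumptions, of what a shell does internally for a two-command chain such as ls | wc -l; it is illustrative, not the actual source of any real shell.

<pre>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* Roughly what a shell does for: ls | wc -l */
int main(void)
{
    int fd[2];                  /* fd[0] = read end, fd[1] = write end */
    if (pipe(fd) == -1) { perror("pipe"); exit(1); }

    pid_t pid = fork();
    if (pid == 0) {             /* first child: ls, stdout -> pipe */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]); close(fd[1]);
        execlp("ls", "ls", (char *)NULL);
        perror("execlp"); exit(1);
    }

    pid = fork();
    if (pid == 0) {             /* second child: wc, stdin <- pipe */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]); close(fd[1]);
        execlp("wc", "wc", "-l", (char *)NULL);
        perror("execlp"); exit(1);
    }

    close(fd[0]); close(fd[1]); /* parent keeps no pipe ends open */
    wait(NULL); wait(NULL);
    return 0;
}
</pre>

Note that each child simply reads standard input and writes standard output using ordinary C library i/o; neither ls nor wc contains any pipe-specific code, which is the entire point of the design.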

Out of this piping paradigm, the UNIX core system developed along very specific lines, so that most important commands could become part of a chain, allowing complex operations with a single line of input, all while maintaining the modularity and highly specialized purposes of individual commands. For example, most systems would include a stream editor (sed), allowing text to be piped through, modified by applied operations, and piped back out. Another example is a pager such as more or less, which displays large amounts of content in manageable chunks, one screenful at a time, waiting for a key press before continuing to the next chunk of text.

Perhaps the most important command to be made chain-able, however, is the shell program itself. By allowing input to be piped into a shell, one could save common complex command chains, or series of commands, by placing them into a file and then feeding the file into a new instance of the shell at a later time as needed, thereby allowing common operations to be automated without wasting precious system resources on a custom program. Over time, this ability to script shell commands grew to allow for more than mere playback of a series of commands, by enabling parameters to be passed to the script itself, and by the shell gaining simple variables, if statements, and loops built in.
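As a hedged toy sketch of the playback idea (emphatically not the real sh, and using a hypothetical program name), a "shell" is at heart just a program that executes whatever commands arrive on its standard input, so redirecting a file into it replays the recorded commands:

<pre>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy command playback: run each line read from stdin as a command.
 * Usage (hypothetical): ./playback < saved-commands.txt            */
int main(void)
{
    char line[1024];
    while (fgets(line, sizeof line, stdin) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip trailing newline   */
        if (line[0] == '\0' || line[0] == '#')
            continue;                       /* skip blanks and comments */
        system(line);                       /* hand the line to /bin/sh */
    }
    return 0;
}
</pre>

Everything beyond this - parameters, variables, if statements, loops - is the shell growing from a player of recorded lines into a small programming language in its own right.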

The sum result of this simple system, developed progressively by many people over the course of many years, is a collection of small, clearly defined components, each capable of only the most basic of tasks, yet which in combination allow for highly complex operations that can be run manually, or recorded and run in batches, with only minimal space requirements.

==The Philosophy==
Out of these limitations, and the modular design that followed from them, developed a philosophy of system design that reflected the nature of the system and defined future development for a long time to come. This philosophy was possibly summed up best by Doug McIlroy, the inventor of the piping paradigm, as follows -


 * "This is the Unix philosophy:
 * Write programs that do one thing and do it well.
 * Write programs to work together.
 * Write programs to handle text streams, because that is a universal interface."