(Originally written in 1984)

Languages and Software Development

Computer languages are generally organised to be a sub-set of conventional English (although I once encountered an IBM 5100 which spoke BASIC with a French accent) so that there are no new words to be learnt and the coding is not too mind-boggling.

Although English is a general-purpose language, most computer languages are designed to solve specific classes of problems. For example, COBOL is designed specifically for commercial accounting applications, while LISP is primarily a list processing language with strong ties to artificial intelligence applications.

Take a look at the languages family tree and you'll get some idea of the development of computer languages and how later developments derive from the earlier ones. At the root of the tree is assembly language, or assembler, which is a symbolic representation of the instructions actually executed by the computer CPU. There are, therefore, as many assembly languages as there are CPU designs.

Each line of assembly language is translated into one machine instruction. It follows from this that writing assembly language is a tedious and unproductive chore, since even the simplest tasks may require several hundred lines of assembler.

During the fifties, programmers at IBM realised this and set about designing a better way of writing computer programs. They realised that operations like addition were always performed the same way; surely it would be possible to get the machine itself to translate a line like

X = X + 1

into the appropriate assembly language code? Indeed, it was possible, and the result was Autocode, which was thought of at the time less as a language than as a way of pre-coding instructions. Autocode is now of historical interest only; its main legacy was the technology and experience it contributed to FORTRAN, which is generally regarded as the first real computer language.

FORTRAN (which stands for FORmula TRANslator) was developed at IBM by John Backus. At the time, computers were mainly used for number-crunching in scientific calculations or for very large commercial accounting tasks, and FORTRAN was designed to meet the needs of the scientific and engineering community.

FORTRAN still survives today; it's probably more popular than ever. Of course, it has been overhauled a few times since 1960, and well-written programs in the latest FORTRAN-77 bear only a passing resemblance to their earliest counterparts.

At the same time, the US Department of Defense was increasingly using computers for tasks like payroll accounting and quartermaster's inventories, and the specialists at DOD were concerned that every computer they bought came with its own language. As programmers went from project to project and machine to machine, they had to learn new languages. The search was on to design a new language.

The result was COBOL (COmmon Business Oriented Language), which has become the standard for business data processing. The key designer behind it was Grace Murray Hopper, who as Commodore Hopper is the US Navy's most senior woman officer and its oldest serving officer, despite trying to retire several times. COBOL is a wordy language, but even novice programmers picking up a COBOL program can read it and quickly understand it, because of its self-explanatory nature.

For example, where FORTRAN would say

O = G - T

COBOL would have

SUBTRACT TAX FROM GROSS-PAY GIVING NETT-PAY

which is less cryptic.

At the same time, on the other side of the Atlantic the Europeans were also hard at work, and a committee of leading academics and representatives of leading computer manufacturers like Elliott and Ferranti was meeting to design a new language. This language was intended to be used for the expression of any algorithm - that is, any set of steps which lead to the solution of a problem. Its genesis was therefore more mathematical than pragmatic, as witnessed by the fact that the language has no input/output statements, while most real-world programs actually spend most of their time doing input/output.

The language was Algol (ALGOrithmic Language) and while it became popular in Europe, it never really caught on in the US. That's not to say that US programmers did not see its advantages; within a few years they were happily incorporating its structured programming techniques into new languages of their own.

By 1961, it became obvious that programmers were having to learn one language for commercial programming, another for scientific programming and yet another for general work like writing compilers. There had to be some way of writing a language which would incorporate the best features of all languages, allowing programmers to use just that language for all projects they worked on.

IBM, together with the two IBM user groups, SHARE/GUIDE, set out to tackle this problem, and came up with the answer: PL/I (Programming Language / One). With features borrowed from FORTRAN, COBOL and Algol, plus more than a few of its own, PL/I proved to be too big a language for any mortal to possibly remember. Nonetheless, it is still popular inside IBM, and stripped-down versions are catching on with minicomputer and microcomputer companies.

Another language emerged from IBM in the mid-sixties, to capture a specialised following. APL (A Programming Language) was designed by Ken Iverson for timeshared number-crunching tasks, particularly those involving manipulation of arrays and lists of data. Rather than using English words, APL has its own character set, which makes it look rather intimidating, and its unusual syntax can be mind-boggling to those raised on more conventional languages. It makes firm (indeed, proselytising) converts, though.

Also in the mid-sixties, and designed for time-shared use, came another language which has swept the world - well, the microcomputer part of it, at least. BASIC (Beginner's All-purpose Symbolic Instruction Code) was invented by Kemeny and Kurtz at Dartmouth College for the use of engineering and other students who needed an introduction to programming for the solution of small problems which did not justify mastering the complexities of FORTRAN FORMAT statements.

BASIC inherits a lot of its design from FORTRAN, without some of that language's complexities - at its simplest level it is really just FORTRAN with the addition of line numbers. Over the years, however, BASIC has gradually been extended and given more power, with the addition of new statements, new functions and features such as graphics. There is, in fact, some merit to the argument that BASIC has gone the way of PL/I and the later Algol 68: it is too big for anyone to remember all of it, certainly in casual use.

In the late sixties, the Algol Committee met again to update Algol. Wishing to correct for the over-simplistic (nonetheless powerful) nature of the earlier version, this time they threw in not only I/O statements but everything else they could think of, rather like PL/I. The result was Algol 68, a massive language with special statements for every circumstance.

While Algol 68 never really caught on, it is of vital importance to the micro community in one sense. For one member of the Algol Committee consistently argued against the construction of a huge language and in favour of a small but well-structured and consistent language that could be used to reproduce the special features of Algol 68. Incensed at the elephantine creation of the committee, this member determined to prove his point by writing his language.

The man was Niklaus Wirth, the language was Pascal. Firmly based in Algol, with only a few carefully-chosen features of its own, Pascal has proved to be a remarkably expressive language, suffering only from deficiencies in the I/O and system manipulation areas, most of them inherited from Algol.

Pascal added a number of important concepts to the computer linguist's arsenal, but the most important of these was the idea that the compiler should catch as many errors as possible before the program is actually run. The result of this philosophy is a language that many find over-restricting: Marvin Minsky of MIT refers to Pascal as 'a voluntarily worn strait-jacket'.

While Pascal has now fallen out of favour, it has made a contribution to a number of other languages. Wirth himself has dropped Pascal and developed a further language called Modula-2, which extends his concepts.

In the early seventies, researchers at Bell Laboratories developed a couple of languages which have now come to a position of great influence. Based on the BCPL language developed at Cambridge University, B was a small language which could be used at a low level as a replacement for assembly language. Its successor, C, is a general-purpose language with the potential to replace FORTRAN and Pascal.

Just as Pascal was a reaction against another language, so is C. PL/I was used to write a large operating system called MULTICS, but at Bell Labs work was under way on a stripped-down operating system called UNIX. C was developed in the same spirit, as a stripped-down language that could be used in the implementation of the operating system, its compilers and utilities. The leanness of both UNIX and C reflects the small group of developers: Thompson, Kernighan and Ritchie.

Algol 68 is important in another way: as a precursor of a new language called Ada. Ada, Countess of Lovelace (Byron's daughter), was an associate of Charles Babbage, inventor of the Analytical Engine, and is generally credited with writing the first computer program, in the 1840s. She is celebrated in the name of the US Department of Defense's new language. Ada follows in the footsteps of Algol 68 and PL/I as a large language but it has a rather more modern style and a number of new twists of its own, particularly in the areas of modular programming and inter-process communication (multiple programs communicating with each other).

Two other outgrowths of PL/I are important in the microcomputer world: PL/I Subset G and PL/M. PL/M was in fact the first high level language to run on a microprocessor; it was developed by Dr. Gary Kildall for Intel Corp. While it is a cut-down derivative of PL/I, specifically designed for microprocessor applications such as traffic light controllers, it is sufficiently powerful to allow construction of software tools such as the CP/M operating system and the CBASIC compiler which was used to write some of the first commercial software for micros. Without PL/M, micros might never have got off the ground.

PL/I Subset G is the minicomputer subset (ANSI X3J1) of the mainframe PL/I. A lot of dead wood has been cleared away, and what is left is a language remarkably like Pascal, but with none of the silly restrictions and with enough special facilities for real world programming - like formatted I/O for commercial programming and double-precision hyperbolic functions for the scientists.

Also shown as derived from PL/I is a box of database languages generally. This is not perhaps a direct derivation, but more of a philosophical contribution. Most special-purpose database languages require COBOL-like (hence PL/I-like) report formatting and string handling, coupled with modern control structures derived from Algol, via PL/I. Examples range from IBM's DL/I to dBASE II.

While these languages have a well-defined place in the family tree, there are a number of other machine tongues which have no obvious ancestry.

FORTH, for example, was developed by astronomer Dr. Charles Moore for use in radio-telescope research. It bears little relation to earlier languages in either its internal operation or its external appearance. A FORTH system starts with a dictionary of approximately 140 elementary words, and the user proceeds to define new words in terms of the old ones. At the top level, words are entire programs. Like APL, FORTH can be mind-boggling to the initiate, but breeds converts stronger than any other religion.
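
The idea of building new words out of old ones can be sketched in a few lines. What follows is a toy written in Python rather than FORTH itself; the names define and run, and the three primitive words, are my own inventions for the sake of illustration, and a real FORTH system differs in many details.

```python
# A toy FORTH-like interpreter: a data stack plus a dictionary of words.
# Illustrative only; the helper names here are hypothetical.

stack = []

# Primitive words operate directly on the stack.
primitives = {
    "+": lambda: stack.append(stack.pop() + stack.pop()),
    "*": lambda: stack.append(stack.pop() * stack.pop()),
    "DUP": lambda: stack.append(stack[-1]),
}

# User-defined words are stored as lists of existing words.
definitions = {}


def define(name, body):
    """Define a new word in terms of words already in the dictionary."""
    definitions[name] = body.split()


def run(source):
    """Execute a line of whitespace-separated words and integers."""
    for token in source.split():
        if token in definitions:
            run(" ".join(definitions[token]))
        elif token in primitives:
            primitives[token]()
        else:
            stack.append(int(token))


define("SQUARE", "DUP *")        # SQUARE is built from DUP and *
define("CUBE", "DUP SQUARE *")   # CUBE is built from SQUARE

run("3 CUBE")
print(stack)   # [27]
```

Each new word immediately becomes available for use in later definitions, which is how a complete program ends up as a single word at the top level.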

Smalltalk has been developed over many years at Xerox's Palo Alto Research Center (XPARC) as part of the long-term Dynabook project. In Smalltalk, objects are collections of apparently concurrently executing subroutines, and are controlled by sending messages to them; for example

BOX GROW 5

sends a message to BOX, instructing it to run its internal subroutine GROW (called a method) with the parameter 5. The likely effect of this is that a box on the screen will grow in size by 5 units.
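
For readers more at home in conventional languages, message sending can be approximated as looking up a routine by name on the receiving object and then running it. The Python sketch below is only an analogy: the Box class and the send helper are hypothetical, and it makes no attempt to model the apparent concurrency of Smalltalk objects.

```python
class Box:
    """A hypothetical on-screen box that responds to messages."""

    def __init__(self, size):
        self.size = size

    def grow(self, amount):
        # The method that runs in response to the GROW message.
        self.size += amount


def send(receiver, message, *args):
    """Look up the method named by the message and run it on the receiver."""
    method = getattr(receiver, message.lower())
    return method(*args)


box = Box(size=10)
send(box, "GROW", 5)
print(box.size)   # 15
```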

Related to Smalltalk is Logo, which was developed at MIT by Seymour Papert. Logo is best known for its turtle graphics, which use an imaginary turtle on the screen to draw shapes.

PILOT is another language which was developed for educational applications by Dr. John Starkweather. It is a simple language primarily intended for drill and testing.

LISP (LISt Processing language) is an unusual language which views all data as lists of items. It is very simple, yet very powerful and has attracted attention in the artificial intelligence community because LISP programs are simply lists of items and can therefore modify themselves. LISP is also known as Lots of Insane Stupid Parentheses because of its extensive use of brackets.
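
The point about programs being lists is easier to see with a small example. This is a sketch in Python, not LISP: the evaluate function and the two-operator table are assumptions made for illustration, and nested Python lists stand in for LISP's parenthesised lists.

```python
import operator

# Hypothetical operator table for this sketch.
OPS = {"+": operator.add, "*": operator.mul}


def evaluate(expr):
    """Evaluate a nested list of the form [operator, left, right]."""
    if not isinstance(expr, list):
        return expr                    # a plain number evaluates to itself
    op, left, right = expr
    return OPS[op](evaluate(left), evaluate(right))


program = ["+", 1, ["*", 2, 3]]        # the program is just a list
print(evaluate(program))               # 7

program[0] = "*"                       # edit the program as ordinary data
print(evaluate(program))               # 6
```

Because the program is held in the same structure as any other data, one part of a program can rewrite another part before it runs.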

Prolog is a logic manipulation language which is distantly related to LISP and which can maintain a database of data items and rules relating them, and can then answer questions about its database. Currently Prolog programmers are in great demand in Japan, where it is a cornerstone of the Fifth Generation artificial intelligence project.

Prolog (PROgramming in LOGic) is based on first-order predicate logic. The program starts with a statement which is to be proved, and continues with propositions which will be used to assert the truth or falsehood of the goal statement. It looks very strange to newcomers, but I guess we'd all better get used to it.
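
To give a flavour of a database of facts plus rules, here is a rough imitation in Python. It is not Prolog and does none of Prolog's general-purpose searching; the family facts, the grandparent rule and the function names are all made up for this sketch.

```python
# Facts: each pair means parent(Parent, Child). Hypothetical data.
parent = {("tom", "bob"), ("bob", "ann"), ("bob", "pat")}


def grandparent(x, z):
    """Rule: X is a grandparent of Z if X is a parent of some Y
    and that Y is a parent of Z."""
    middles = {child for (p, child) in parent if p == x}
    return any((y, z) in parent for y in middles)


def grandchildren_of(x):
    """Query with an unknown: find every Z for which grandparent(x, Z) holds."""
    people = {child for (_, child) in parent}
    return sorted(z for z in people if grandparent(x, z))


print(grandparent("tom", "ann"))    # True
print(grandchildren_of("tom"))      # ['ann', 'pat']
```

In Prolog proper, the programmer would state only the facts and the rule, and the system itself would work out how to search for answers.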

Where to Use Which Language

With so many languages to choose from, how do you decide in which circumstances to use each? The answer is not simple, and beyond a certain level it becomes a matter of personal taste.

For general messing about, trying out ideas and knocking together 'quick and dirty' programs the best language is generally BASIC. The major reason for this is that it is so prevalent: every small micro offers BASIC as standard. A micro programmer who does not know BASIC is not much use, really.

You can do just about everything you are ever likely to need in BASIC, with the assistance of a few assembly language patches for things like getting at special I/O ports. There are only two problems: there is a diminishing rate of return in using BASIC for large programs, and the resulting program is generally slow.

The second problem can be fixed, to some extent, by using a compiler on the finished program. But the first is more fundamental and is related to the lack of support for good structured programming practices within the language. Some versions, such as CBASIC Compiler and Waterloo BASIC, provide support for local variables, named subroutines and the like, but this begs the question: are such languages still BASIC or are they something new and different? Well-written CBASIC code, for example, looks more like Pascal than BASIC.

If we use cars as an analogy for languages, then BASIC is the family sedan. It's inexpensive, there are lots of models in the range, and the most popular models have all kinds of (previously optional) extras fitted as standard.

Ah yes, Pascal. I only ever wrote one major package in Pascal, and would never use it again, if I could avoid it. Pascal is OK for academia, where all the people who will use a program are computer literate, familiar with the operating system, and so on. But in the real world, programs get used by office juniors, managing directors and other people who are completely fazed by error messages and who are liable to type the most improbable input string imaginable.

Real-world programs, therefore, have to have extensive input validation, and this turns out to be quite difficult in Pascal, as it is tediously uncooperative in type conversions. The Pascal programmer has thus to resort to all kinds of devious practices to get Pascal to work for him, which defeats the purpose of using the language in the first place. If you have to use sleight of hand so that the compiler can't understand what you're doing, there's a fair chance you can't understand it yourself.

The right place for Pascal, then, is where you have to write programs of moderate sophistication which you will only ever use yourself, and where you are sure that a straight solution will do. I know this is cruel and a bit over the top, but the best analogy I can find for Pascal is a trainer bicycle. You can't go far with it, you can't go fast with it, you can't carry much with it, it's not to everybody's taste, but at least you can't fall over while riding it.

FORTRAN lives on in the scientific world. First of all, most engineers and scientists cut their teeth on it, so that it is widely known. Secondly, it is one of the first languages every mainframe and mini manufacturer provides on their machines, so that programs originally written on larger computers can be transported to micros and vice versa. The fact that it is well standardised assists this. A third factor is the vast amount of FORTRAN software which has been published or placed in the public domain for all kinds of purposes.

The sheer inertia of the FORTRAN movement ensures that it will be around for a long time, and a FORTRAN compiler is a must for every micro in a scientific or engineering lab. However, that's not to say that FORTRAN is always the best tool for the job. Pascal can be appropriate here for simple jobs, or PL/I (with its superb double-precision arithmetic and hyperbolic functions) for the more complex ones. C is also an appropriate tool on occasions. However, those who own FORTRAN compilers should bear in mind that Real Programmers use FORTRAN.

FORTRAN is a bit like a Volkswagen Beetle; the design hasn't changed that much over the years, and people wouldn't like it so much if it had. It's a very rational and appropriate design, with few concessions to style or fashion.

COBOL finds its metier in the commercial world where programs have long lives and may be worked on by as many as ten programmers during that time. Consequently the requirement is not for flash tricky code, but for code that your average commercial programmer can pick up, understand and modify or fix. While the previous generation of COBOL compilers acquired a reputation for being as slow as a wet weekend, the latest releases are much improved, making COBOL a viable alternative for the commercial micro user or software house.

COBOL is like a half-ton truck: it serves the needs of commerce with little style. It can carry quite heavy loads, but will never turn heads as it passes.

PL/I - ah, now here's a language for Real Programmers. I turned to PL/I after my nasty experience with Pascal and have never regretted it. Block structures and structured programming statements like Pascal's, combined with sophisticated file handling and I/O and the ability to do binary arithmetic for scientific applications and decimal arithmetic for dollars and cents mean that here is a language that one can stick with. While the rest of the world follows fashions like Pascal and C, we PL/I programmers will be quietly getting on with the job.

If you're a COBOL programmer who wants to have some good structured programming support and less verbose code, then PL/I may be for you. If you're a FORTRAN programmer who wants higher precision arithmetic, better string handling and structured programming support, PL/I is it. If you're a Pascal programmer who feels the need for the occasional goto - used with discretion, of course - as well as sensible file handling, check out PL/I. If you're a C programmer, nothing I say is going to change your mind anyway; but PL/I has pointers, structures, unions, functions, storage classes and all the other things that make C such fun.

PL/I therefore emerges as a good all-round language with particular strengths for commercial and scientific software, though utilities compiled in it tend to be a bit large. It's a bit like these new family vans with seats that fold, turn and twist: you can use it as a sedan, as a minibus, as a delivery van or as a camper. Bear in mind that unlike those vans, it has a V12 under the bonnet.

C is a sports car in comparison. It's small, zippy and manoeuvrable, very light in weight and when it crashes, you're a goner (due to the lack of run-time debugging facilities, you see). C is best used to write systems utilities such as archivers, macro processors, editors, compilers and the like. It can be used to write commercial or scientific programs, though its arithmetic and file handling let it down in the former area. If you build on its lower-level functions to construct nice string-handling and I/O functions, what you wind up with is very like PL/I, which would have made a more appropriate starting point.

The best feature of C is its portability. C programs can generally be moved from system to system with the minimum of effort, and this means that software authors can be assured of achieving the maximum return for their effort.

Prolog is quite different from most languages. At the moment, it's still barely out of the research labs, and most users in Australia are in Universities. I dare say that Prolog could be put to commercial use, particularly in bibliographic databases, small 'expert' systems and the like.

In the car metaphor, Prolog is rather like those experimental designs that the manufacturers roll out every now and again, informing us that in ten years' time, we'll all be driving cars like these, with four-wheel steering and the like.

Ada is an armoured personnel carrier. It's designed to be reliable, carry all kinds of loads, be fast and it has a radio built in so it can communicate with other carriers.

Ada supports a number of recent structured programming concepts directly. For example, the idea of separate compilation of modules has been around for a long time; many Pascal, C and PL/I compilers offer this facility. But Ada is the only language to date to support the separate compilation of modules (Ada calls them packages) as part of the language itself.

Ada actually addresses a number of problems which have emerged as recent issues in software engineering. For example, we now know that the cost of software is significantly higher than that of hardware. We know that the cost of fixing bugs increases rapidly the later they are discovered, and that the cost of maintaining a program is high in comparison with the original cost of writing it.

One key feature of software developed by or for the DoD is its complexity and size; we are talking here about projects at least in the tens, if not hundreds, of man-years in development. In this environment, it is essential that code once developed be reused, and that developers are able to create a library of routines with a well-defined interface.

Ada addresses this requirement with the concept of the package. A package comprises two parts: a specification part and an implementation part (users of UCSD and Apple Pascal will be familiar with this).

The specification part, which can be a separate file, consists purely of declarations, usually of the parameters which are passed to or from the procedures in the package.

The implementation part, or package body, contains the procedures or functions which work on the parameters named in the specification. It may also contain declarations, but only of local variables. Incidentally, the specification can also declare variables and types which are not accessible from outside the package, in what is called the private part of the specification.

Packages may be separately compiled and then linked together. The private part of the package specification, and the package body itself, hide information from the user. Thus, given a package called quicksort, the user has to take it on faith that this is a quicksort routine, and cannot tinker with the innards of the routine to convert it or otherwise abuse it. More to the point, it forces the original author of the quicksort package to make it as general as possible and to think hard about its interface with other routines.

The programmer can refer to variables in other packages by their names prefixed by the name of the package. Take a look at this code segment:

This is the package specification for some file handling code which can read and write sequential files. To actually use these routines in an Ada program, a programmer would write the following (by the way, Ada comments are written as '-- comment'):

or some such. Notice that because the file access routines open, get, put and close are defined in the package fileio, they are prefixed by that name in this package. An alternative shorthand can be used:

A common problem with languages which permit independent compilation of modules is that it is possible to modify one module without updating another, related, module. Subsequent recompilation will lead to disaster. We know, we've done it with Pascal, PL/I and other languages.

In a full Ada system - what is called the Ada Programming Support Environment - the compiler maintains a database of the packages and their relationships or logical dependencies. So if one module - B - depends on another - A - and A is modified, the compiler will refuse to recompile B until A has been recompiled. Any other packages which depend upon B will also fail to recompile.

Not only this, but if one package is modified, the compiler will recompile and link only those packages which are dependent on the modified one, thus cutting down on compile time. This automates the functions that are performed by the UNIX source code control system, or on a smaller scale, my Compiler Manager program.
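
The bookkeeping involved is simple enough to sketch. The following Python fragment is not how any Ada environment is actually implemented; it is a minimal illustration, using a made-up dependency table and a hypothetical function name, of working out which packages must be recompiled after a change and in what order. It assumes there are no circular dependencies.

```python
# Hypothetical dependency table: each package maps to the packages it depends on.
depends_on = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": [],
}


def needs_recompiling(modified, depends_on):
    """Return the modified package and all its dependents, ordered so that
    each package comes after the packages it depends on.
    Assumes the dependency table contains no cycles."""
    # Step 1: collect the modified package and everything that depends on it,
    # directly or indirectly.
    affected = {modified}
    changed = True
    while changed:
        changed = False
        for package, deps in depends_on.items():
            if package not in affected and affected.intersection(deps):
                affected.add(package)
                changed = True

    # Step 2: order the affected packages so dependencies come first.
    order = []
    while len(order) < len(affected):
        for package in sorted(affected):
            ready = all(d in order for d in depends_on[package] if d in affected)
            if package not in order and ready:
                order.append(package)
    return order


print(needs_recompiling("A", depends_on))   # ['A', 'B', 'C']
print(needs_recompiling("D", depends_on))   # ['D']
```

Package D is left alone when A changes, which is where the saving in compile time comes from.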

All of these features are designed to make programs written in Ada easy to write and to modify or maintain - especially the latter. They also help to ensure that Ada programs are bug-free - an important consideration when writing software that will, for example, guide cruise missiles.

Ada supports a facility known as overloading, which can be very useful when writing libraries of generalised routines. For example, suppose you have been given the task of writing a suite of mathematical functions for your company, such as square root and others.

Now on occasions you may want to take the square root of a floating point number, and on other occasions you may want to take the square root of an integer. In most languages, you would resolve this by writing two square root routines, called fsqrt and isqrt, and warning all programmers to use the right one.

Ada is able to distinguish between two identically named routines on the basis of the types, number and names of the parameters and the result type. Here's an example:

The compiler automatically sorts out just which function has to be called in a particular case.
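
The general idea of one name with several type-specific versions can be imitated in other languages, though seldom as neatly. The Python sketch below uses the standard functools.singledispatch decorator; note that it chooses the routine at run time from the type of the first argument only, whereas Ada resolves the call at compile time and can also take the result type into account. The function name sqrt here is simply chosen to match the example above.

```python
import math
from functools import singledispatch


@singledispatch
def sqrt(x):
    # Fallback for types with no registered version.
    raise TypeError(f"no sqrt defined for {type(x).__name__}")


@sqrt.register
def _(x: int) -> int:
    # Integer version: largest integer whose square does not exceed x.
    return math.isqrt(x)


@sqrt.register
def _(x: float) -> float:
    # Floating-point version.
    return math.sqrt(x)


print(sqrt(17))     # 4
print(sqrt(2.25))   # 1.5
```

The caller writes sqrt in both cases and never needs to know that two routines exist.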

Ada boasts a number of other unusual features. For example, parameters can be passed to a procedure by means of 'keyword notation', which looks like this:

open (filename => "FRED.DAT", filemode => READ, filestat => mystat);

Here it is obvious which parameters are being passed, and the order is not important.
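
Several other languages offer something similar. As a small illustration in Python, with a hypothetical open_file function standing in for the open procedure, the same call can be written with its parameters in any order:

```python
def open_file(filename, filemode, filestat):
    """Hypothetical stand-in for an open procedure; it just returns its arguments."""
    return (filename, filemode, filestat)


# The same call written twice, with the named parameters in a different order.
first = open_file(filename="FRED.DAT", filemode="READ", filestat="mystat")
second = open_file(filestat="mystat", filename="FRED.DAT", filemode="READ")

print(first == second)   # True
```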

Ada also supports, through the task declaration, multi-tasking programs which consist of multiple threads of execution that have to be synchronised. It is even possible for the programmer to specify in what format variables are stored (allowing communication with programs written in other languages) and even force storage in specific processor registers if required.

All this adds up to a language that is complex, but which is powerful enough to tackle a range of problems, making it well worth the learning.

Ada is about the only thing that could tempt me away from PL/I. . .

Survey of Commercial Products

BASIC

While we talk about BASIC as though it was one language, in actual fact BASIC is a range of languages. The trouble with the standard BASIC - known as ANSI Minimal Standard BASIC - is that it is just that: minimal. Every implementation has additional non-standard features, and we have to rely on the market to define de facto standards.

Microsoft BASIC

The best known BASIC, and the de facto standard, is Microsoft BASIC. It originally appeared in 1975 for the Altair computer, and was known as Altair BASIC. Although the original version was supplied on paper tape, an extended version and a disk-based version soon followed, and the language was then made available for other computers.

What really made it the standard was its adoption by Apple and Tandy as the standard BASIC built into their computers in read-only memory. Since then it has been the standard BASIC built into virtually all new personal computers, including the IBM PC.

Microsoft BASIC extends the old standard with the addition of integer and double precision data types, extends the string handling, and in its disk version provides both sequential and random-access file handling. Versions implemented for specific hardware, such as the IBM PC, augment this with graphics instructions (for line drawing, plotting and painting), communications and I/O port support and music playing commands. The result is a versatile general-purpose language.

The major drawback to BASIC is the fact that it is, in its original implementation, an interpreted language. This means that while programs can be written to do almost anything, they will often run too slowly for practical use. The solution to this is the Microsoft BASIC compiler (often known as BASCOM), which compiles the BASIC program down to machine code, with a consequent improvement in speed. The exact extent of the benefit depends on the nature of the code; programs that do a lot of floating-point arithmetic, for example, will not benefit that much, because the interpreter and compiler use the same floating point routines.

Microsoft's BASIC interpreter is supplied as standard with many machines, and is available in both eight and sixteen-bit versions. The compiler is not quite so general, and does not generally support many of the graphics and other extensions built into the interpreter. Despite this it is still one of the most popular development tools for machines such as the IBM PC.

Guessing Game in Microsoft BASIC (plain version).

CBASIC

Another popular BASIC for microcomputers is CBASIC. This is available for most processors, in two major forms. The original CBASIC was a development of BASIC-E, a public domain compiler written by Gordon Eubanks at the Naval Postgraduate School in Monterey. This compiler processes a modified form of BASIC (no line numbers are required, for example) and outputs an intermediate code file, which must then be interpreted by the run-time module, CRUN.

CBASIC achieved popularity for three major reasons: it was the only BASIC compiler available for some time; it uses decimal arithmetic, which eliminates rounding errors in financial calculations; and software developers could distribute runnable programs without releasing their source code, reducing the risks of piracy.
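
The rounding problem that decimal arithmetic avoids is easy to demonstrate. The sketch below uses Python and its standard decimal module purely as an illustration; it says nothing about how CBASIC stores its numbers internally.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 exactly, so errors creep in.
binary_total = sum([0.10] * 3)
print(binary_total == 0.30)               # False

# Decimal arithmetic keeps the cents exact.
decimal_total = sum([Decimal("0.10")] * 3)
print(decimal_total == Decimal("0.30"))   # True
```

Three ten-cent items that do not add up to thirty cents are exactly the kind of thing that upsets accountants.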

REM   Guessing Game Version 2.1
REM   Programmed in CBASIC-2
REM   3/3/83
REM   Execute an INPUT before RANDOMIZE
INPUT "Hi, there, what's your name?";NAME$
PRINT "OK ";NAME$;", do you want to play a guessing game";
REM PLAY$ controls the outer loop using WHILE - no GOTO
INPUT PLAY$
REM The RANDOMIZE statement uses your delay in answering \
the INPUT PLAY$ statement to seed the random number \
generator.
RANDOMIZE
REM We're only interested in the first char. of PLAY$ being \
"Y" or "y"
WHILE UCASE$(LEFT$(PLAY$,1)) = "Y"
    NUMBER = INT(100*RND+1)
    PRINT "I'm thinking of a number between 1 and 100"
    PRINT "You've got to try to guess it."
    INPUT "What's your guess?";GUESS
    WHILE GUESS NE NUMBER
        IF GUESS > NUMBER THEN PRINT "Too high"
        IF GUESS < NUMBER THEN PRINT "Too low"
        INPUT "What's your guess?";GUESS
    WEND
    PRINT "You've got it, ";NAME$;"!!!"
    INPUT "Play again?";PLAY$
WEND
END

Guessing game in CBASIC - fancier version.

CBASIC Compiler

Despite being a compiler, CBASIC 2 is generally slower than Microsoft's BASIC interpreter, and so Digital Research (who purchased Eubanks' company) has released a true native code compiler called the CBASIC Compiler. This accepts CBASIC code directly, although it adds a number of enhancements such as named functions, super-long strings, local variables and alpha labels, and compiles to a relocatable file which is then linked to produce a runnable machine code program. The CBASIC Compiler generates some of the quickest code around.

CBASIC is available for all the CP/M family of operating systems, including CP/M-68K, and the CBASIC Compiler is available for CP/M-80, CP/M-86, CP/M-68K and PC-DOS, making it a good choice for software developers (particularly those who have long written accounting systems in CBASIC). A version will be available by year-end for UNIX.

Microsoft now produces a compiler called the Business BASIC Compiler, which is rather like a cross between CBASIC and their own BASIC, and this may be a good compromise for those who see virtues in both schools of thought.

The latest Microsoft BASIC Compiler is QuickBASIC 4.0, which incorporates many extensions in the spirit of CBASIC.

Pascal

There is one thing the world is not short of, and that is Pascal compilers. Good Pascal compilers are a slightly different proposition, although most are adequate.

UCSD Pascal

The best-known is UCSD Pascal, which is hosted by the UCSD p-system operating system. This is best known as the Apple Pascal system, which is actually version 2 of the UCSD system. However, Softech Microsystems have now released Version IV of the p-system, and this is the current version found on the Sage, IBM PC and a number of other machines.

Although an ISO standard exists for Pascal, and the Pascal User Manual and Report lays down another 'standard', UCSD is really the de facto standard in the micro world. Although the UCSD compiler produces p-code (with the performance limitation that implies), the code is linkable with assembly language produced by a matching assembler. That UCSD Pascal can be used for system programming tasks is clearly demonstrated by the fact that almost the entire p-system is written in it.

MS-Pascal

For the PC-DOS/MS-DOS and OS/2 worlds, Microsoft produce MS-Pascal. This is a superset (as are virtually all Pascals) of the ISO Standard. This compiler can be operated (with inconvenience) on a single-drive PC with 100K of user memory. It supports the optional 8087 numeric data processor and generates very efficient code.

Pascal/MT+

For CP/M-80, CP/M-86 and MS-DOS, Pascal/MT+ is another high-quality, robust product. It has a number of syntactical differences from MS-Pascal, and the generated code does not seem to be as well optimised, but it is a good choice for operation with the Digital Research companion products Display Manager, Access Manager and GSX (the GKS graphics driver). Again, 8087 support is provided.

Turbo Pascal Version 4.0

Turbo Pascal has some hidden gotchas - slow floating point, for example - but is in almost every respect a superb product. (From my perspective, the only thing wrong with it is that it is a Pascal compiler). It compiles with amazing speed, produces fast, compact code and appears to be quite robust. Included in the package is a demonstration spreadsheet program, and separate packages of IBM PC utility routines - windows, clock access, database access, editing and the like - are also available.

Pascal Version of the Guessing Game.

Modula-2

Niklaus Wirth has himself abandoned Pascal as an incomplete language, and has developed a new language called Modula-2. This has a number of enhancements over Pascal.

Volition Systems Modula-2

The best-known company in the Modula-2 field is Volition Systems, who have versions of the language for the Apple II, Sage and IBM PC. Their latest version, for the XT, costs $US395, and includes both Modula-2 and Pascal compilers, module library, Advanced System Editor, p-shell (a UNIX-like programming environment) and utility programs.

To give you an idea of the difference between Pascal and Modula-2, here are a few features of this implementation: RAM disk support, 8087 support, subprogram overlays, random and sequential file access, directory operations, concurrency and interrupt handling, strings and BCD arithmetic.

PL/I

DRI PL/I-80

I find it rather uncomfortable, as a progressive 'Think micro/small is better' type of person, to be endorsing a 21-year-old language that is famed for its hugeness. However, I find that PL/I gives all the structured programming tools available in the best languages without restricting me to them when I know what I want. In addition, it has features such as choice of fixed decimal or float binary arithmetic for commercial and scientific programming, together with a full range of functions such as SINH and others. Floating point is up to 53 bit precision in IEEE format.

The DRI compiler is a particularly high quality compiler for the ANSI minicomputer Subset G implementation of PL/I. It generates fast code which is linkable to assembler and other languages, and also is linkable to the DRI Display Manager and Access Manager packages. It is one of only a few languages to support features on newer CP/M implementations such as passwords, file sharing/locking, and date and time functions.

The package includes the RMAC assembler, linker, cross reference utility and librarian.

The documentation is particularly good, recognising the fact that most micro programmers will not have been previously exposed to PL/I, and includes comprehensive examples on disk, such as a network analysis program, chess game, text indexing program and others. This was the InfoWorld software product of the year a few years ago, and one can see why.

DRI PL/I-86

PL/I-86 is available for both CP/M-86 and PC-DOS, and is completely compatible with PL/I-80.

PL/I Version of the Guessing Game.

C

C compilers are available for just about every machine now, as C has achieved almost universal recognition as a system programming language. A computer that can't run a C compiler is destined to suffer a software drought, as much of the software now appearing on the market is written in C in order to achieve portability. dBASE III, for example, is written in C, and this has provided an opportunity to finally correct the bugs that plagued dBASE II.

Before starting to examine the compilers we need to establish just what we are looking for. In other words, just what makes a good C compiler?

We can identify a number of characteristics which distinguish the various C compilers on the market. These are:

1 Reliability

Professional programmers, in particular, need to be sure that when a bug emerges in their code, they put it there and not the compiler. To a professional programmer, time is money, and they cannot afford to waste time debugging the compiler when they should be debugging their own code. All other characteristics are irrelevant if this cannot be satisfied. Reliability is also important to the beginning programmer, who is discouraged by the appearance of bugs, while the hacker may be willing to trade off reliability for cost (hackers are notoriously short of money for software).

2 Support from third-party vendors (libraries, debuggers, etc.)

The emergence of suppliers of pre-compiled function libraries for PC compilers has encouraged programmers to sweeten up their products with the inclusion of windows, mouse-driven menus, communications functions and the like. If you want your programs to look good, you'll have to use some of these libraries. Other libraries support B+-Tree file access, screen forms and other functions essential to business software.

Compilers which are not supported by these third party products are obviously of less utility. Similarly, programmers may be dependent on sophisticated debugging tools. While programmers of the old school (like me) may view such new-fangled gizmos with scorn (we paid our dues, debugging the hard way, and don't see why these youngsters should have it easy!) there's no doubt that when you're in a hole, some of these split-screen, source and object simultaneously displayed, debuggers can really pay for themselves in minutes. The same applies to profilers, which tell you which parts of your code are executed most frequently and should therefore have the most optimisation attention paid to them.

3 Run-time performance

Speaking of which, a slow program is undoubtedly less appealing than a fast, responsive one. Compilers which generate slow code are, all other factors being equal, of less interest than those that generate fast code. Hence our interest in benchmark performance.

4 Small code size

Likewise, code size is important, not so much to the beginner, but definitely to the programmer working on large projects. It doesn't really matter whether a program comes out at 16 Kbytes or 12 Kbytes, but it certainly does make a difference whether it comes out at 640 Kbytes or 512 Kbytes! Incidentally, this is why Ashton-Tate had to switch C compilers between dBASE III and III Plus: on the older compiler, III Plus was coming out at over 640 Kbytes, which just won't fit into the memory of the PC.

5 Compile speed

This is more important to beginners and hackers than to professional programmers. Beginners tend to have to recompile programs frequently as they discover bugs; the professional is more inclined to the view that the program is ultimately compiled once but run millions of times, and that therefore run-time performance is more important.

6 Memory models supported (for 8086/88)

The 8086/8088/80286 processor family uses what is termed a segmented architecture. This breaks memory up into 64 Kbyte chunks called segments, and there are various ways of combining segments into a program. The simplest, called 8080 model, simply places all code and data into a single segment; this limits the size of the program and its data.

More commonly used is small model, in which the code and data are separated into two segments. This allows larger programs at virtually no cost in terms of speed or complexity.

The next refinement is to allow multiple segments for either code or data or both. Now, to access widely separated data items, the processor's segment registers must be reloaded, which slows the machine, and of course pointers must now be 32-bit values rather than 16 bits in size, using up more memory.

Different compilers have different ways of coping with this situation, but of course, it is important for large programs that it be dealt with somehow.
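For readers who have not met segmented addressing before, a little arithmetic may help. On the 8086 the physical address is formed by shifting the sixteen-bit segment value left four bits and adding the sixteen-bit offset. The sketch below simply does that sum in portable C; the function names are hypothetical and nothing here uses real segment registers.

```c
/* Physical address of segment:offset on an 8086-family processor:
   the segment is shifted left four bits and the offset is added.
   Only the low sixteen bits of each argument are used. */
unsigned long linear_address(unsigned int segment, unsigned int offset)
{
    return ((unsigned long)(segment & 0xFFFFu) << 4) + (offset & 0xFFFFu);
}

/* In small model a data pointer is just an offset; it is resolved
   against the program's single data segment. */
unsigned long small_model_address(unsigned int data_segment,
                                  unsigned int near_pointer)
{
    return linear_address(data_segment, near_pointer);
}

/* Number of bytes reachable through one segment register without
   reloading it: a sixteen-bit offset spans 64 Kbytes. */
unsigned long segment_size(void)
{
    return 0x10000UL;
}
```

Note that linear_address(0x1234, 0x0010) and linear_address(0x1235, 0x0000) both give 0x12350: many different segment and offset pairs name the same byte, which is one reason pointer comparisons across segments need care.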

7 Conformance to standard (K&R or ANSI) / portability

One of the major benefits of the C language is its portability: the ability to move programs between machines, operating systems and compilers. As soon as you start to use non-standard features of a compiler, you are severely restricting the portability of your code. Admittedly, you can make use of the C preprocessor to assist, but in general programmers find it harder to switch between dialects of the same language than to switch between totally different languages.

Adherence to a standard ensures that code written to those standards will compile correctly first time, that programming skills can be transferred, that function libraries will port correctly and that code you write will have a long life on different systems.
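As an illustration of how the preprocessor can paper over dialect differences, here is a minimal sketch of one common trick: a macro that keeps parameter lists in declarations for compilers that define __STDC__ (as the proposed ANSI standard requires of conforming compilers) and drops them for older ones. The macro name PROTO and the function clamp are my own choices for the example, not part of any particular compiler's library.

```c
/* Keep the parameter list on compilers that announce ANSI conformance,
   and discard it on older K&R compilers. */
#ifdef __STDC__
#define PROTO(args) args
#else
#define PROTO(args) ()
#endif

/* The double parentheses let the whole parameter list travel through
   the macro as a single argument. */
int clamp PROTO((int value, int low, int high));

/* The definition needs the same treatment, since the two dialects
   write parameter declarations differently. */
#ifdef __STDC__
int clamp(int value, int low, int high)
#else
int clamp(value, low, high)
int value, low, high;
#endif
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}
```

The price is some clutter in the source, which is exactly why sticking to the standard in the first place is the less painful course.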

8 Comprehensive, usable documentation

This almost goes without saying. The more documentation you have, the better. It should be well organised, indexed, with sections describing the compiler options, associated utilities such as librarians, and the function library. Non-standard functions, in particular, should be thoroughly documented with examples, and something I would like to see is a definition of exactly in what way non-standard functions are non-standard and what the UNIX equivalents are.

In fact, a programmer's needs vary as he progresses through different stages of using a language. The beginner needs a compiler that is fast, simple and unsurprising, and he is generally less concerned about speed and compactness of generated code. He needs good quality documentation, ideally with a tutorial orientation, and the ability to compile at least a good selection of introductory tutorial programs.

The hacker needs a compiler that is ideally fast, generates fast and compact code, isn't too laden with options, and has the ability to compile standard code as well as a wide selection of public domain code.

The professional programmer needs a compiler that is above all reliable, has a wide range of options to generate code for different memory models, can link to assembler, has strong support from suppliers of libraries and support tools, generates tight code and is well documented. He is generally willing to pay for it.

The proposed ANSI standard X3J11 for the C programming language incorporates a number of advances and new features. Unlike some earlier language standards (e.g. ISO Pascal) this one adds features and does not constitute a subset of all practical compilers, and in practice many compilers are starting to support the ANSI standard extensions such as function prototyping.

Methodology

An evaluation of several C compilers is an ambitious undertaking. There is first the question of ascertaining what constitutes a good C compiler. Then there is the design of appropriate benchmark tests and investigative techniques for comparing the compilers. Then, of course, any of the compilers may be negatively affected by poor performance in an area not specifically being investigated.

BDS C

Probably the best-known C in the eight-bit world is BDS C, written by Leor Zolman. Although BDS is not a 'full' C (it lacks floating point and initialisation of variables, and has non-standard syntax on some functions), it provides the flavour of C sufficiently to have been the starting point for many of today's most successful C programmers.

Available only for CP/M-80, BDS C is inexpensive enough to make it worthwhile acquiring a copy just to play with. Its speedy and simple compilation is designed for exactly that purpose, although it is worthy of professional software development. In addition, a large number of public domain software utilities are written in BDS C, and the compiler will be necessary to modify these programs.

BDS C Version of the Guessing Game

Lattice C

Lattice C was for a long time the dominant C in the MS-DOS/PC-DOS world. It generates fast code, is stable, runs on a variety of hardware types (Commodore Amiga to Sun/Apollo workstations) and is well supported by producers of function libraries. Its pole position in the DOS world has now been taken by Microsoft C.

Computer Innovations C 86

CI-C86 comes with ten libraries for the PC. Two of these are small and big-model libraries of IBM PC-specific functions, while the remaining eight libraries offer combinations of big / small model, pre-DOS 2.0 / DOS 2.0 and later, and software / 8087 floating point code.

The IBM PC specific functions are a handy bonus, including routines for access to the RS-232 ports, the printer ports, keyboard and screen handling routines, including graphics.

This is not a small compiler, particularly with ten libraries. Consequently, the compiler is supplied as a collection of squeezed files on five floppy disks, making installation just a little bit more complex than simply copying the files onto a hard disk. Fortunately, a batch file simplifies the process.

Generated code is reliable and bug-free; whenever I've had problems, I've found the solution to lie in my code and not the compiler.

Bit fields are supported, but not the void type, nor structure assignment, nor enumerated types. These are not currently major deficiencies, since few programs make use of them. This can be expected to change with the increased adoption of ANSI features in other compilers.

Performance is quite acceptable; the new version 3.0 compiler (due for release real soon now) is said to be significantly faster.

The documentation consists of a single manual which covers the various programs which make up the system: the four passes of the compiler, the librarian and archivist utilities, plus the different libraries for the compiler.

All functions are extensively documented with example programs, and this is further reinforced with some applications notes which illustrate topics like programming the serial ports on the PC, accessing BIOS routines and driving ANSI.SYS.

I purchased two copies of this compiler directly from CI in New Jersey, and found them to be very friendly and helpful to deal with, with good support (not that we've needed it).

Incidentally, users of this compiler should be aware of a book called Systems Software Tools by Ted J. Biggerstaff, published by Prentice-Hall. This book, which teaches the principles and practice of systems software design, contains source code for a terminal emulator program and a windowing, multitasking operating environment for the IBM PC. A disk containing the source code will be available shortly - and of course, it's all for CI C86.

Microsoft C

MS C 5.1 is the dominant C compiler used by professional developers. It supports a full range of code models, will generate code for both DOS and OS/2, has exceptional optimisation facilities, and conforms closely to the proposed ANSI standard for C (X3J11). One of the best features of MSC is its matching CodeView debugger, which is an excellent source-level insect eradicator.

The documentation is very comprehensive; three folders contain four volumes: a User's Guide, Microsoft CodeView Manual, C Language Reference Manual and a Run-Time Library Reference Manual. The documentation is well organised and complete.

This compiler boasts an unusually extensive function library which closely conforms to the UNIX/Xenix 'standard' as well as closely conforming to the proposed ANSI standard for the C programming language. In fact, the MS-DOS compiler shares a common run-time library with the Xenix C compiler.

The Microsoft compiler supports more memory models than any of the others reviewed: small, compact, medium, large and huge. These are defined as follows:

small model: one (up to) 64 Kbyte segment for code and one segment for data

compact model: one segment for code and multiple data segments

medium model: single data segment and multiple code segments

large model: multiple code and multiple data segments

huge model: as for large, but with no restrictions on array sizes

These models are simply default ways of declaring different kinds of pointers. Microsoft C supports three kinds of pointers: near pointers, far pointers and huge pointers. Near pointers are simply sixteen-bit values, often interchangeable with integers (though this is obviously implementation-dependent), while far pointers are 32-bit entities which comprise a segment value and an offset value, each sixteen bits. Address arithmetic (and this applies to most compilers) is performed on the offset value only, on the assumption that comparisons and subtractions are only performed on pointers to related objects in the same data segment. While this is adequate for most purposes, it does impose a limitation on array sizes.

The huge pointer type is the same size as a far pointer, but pointer arithmetic on huge pointers is carried out on all thirty-two bits, thus allowing data items which are referenced by huge pointers to span more than one segment (though there are some restrictions in the interests of efficiency).
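The difference between far and huge arithmetic can be modelled in ordinary portable C, without the near, far and huge keywords themselves. The sketch below is only a model under my own assumptions: the structure and function names are hypothetical, and the way the 'huge' version renormalises the result into a segment and a small offset is one plausible convention, not a description of what the Microsoft compiler actually emits.

```c
/* A segmented address held as two sixteen-bit halves (stored in
   unsigned ints; only the low sixteen bits of each are used). */
struct seg_ptr {
    unsigned int segment;
    unsigned int offset;
};

/* Build a segmented address from its two halves. */
struct seg_ptr make_seg_ptr(unsigned int segment, unsigned int offset)
{
    struct seg_ptr p;

    p.segment = segment & 0xFFFFu;
    p.offset = offset & 0xFFFFu;
    return p;
}

/* Physical address: segment shifted left four bits, plus offset. */
unsigned long seg_linear(struct seg_ptr p)
{
    return ((unsigned long)p.segment << 4) + p.offset;
}

/* Far-style arithmetic: only the offset changes, so it wraps round
   at 64 Kbytes and the segment is left alone. */
struct seg_ptr far_add(struct seg_ptr p, unsigned long n)
{
    p.offset = (unsigned int)((p.offset + n) & 0xFFFFu);
    return p;
}

/* Huge-style arithmetic (assumed convention): work on the whole
   address, then split it back into a segment and a small offset. */
struct seg_ptr huge_add(struct seg_ptr p, unsigned long n)
{
    unsigned long linear = seg_linear(p) + n;

    p.segment = (unsigned int)((linear >> 4) & 0xFFFFu);
    p.offset = (unsigned int)(linear & 0xFu);
    return p;
}
```

Starting from 2000:FFF0 and adding 0x20, far_add wraps round to 2000:0010, back near the start of the same segment, while huge_add moves on to physical address 0x30010 as one would hope. That wrap-round is the array size limitation in action.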

Apart from extensions such as these, the Microsoft C compiler adheres very closely to the proposed ANSI standard for the language. It supports such advanced features as structure assignment, the void type, enumerated types, and function prototyping in forward reference declarations of functions. While some facilities are not implemented, such as the const and volatile storage classes, the corresponding keywords are reserved. In addition, the manual describes the size of an int as 16 bits on the 8086/80286, but 32 bits on the 80386 and 68000, indicating the likelihood of future versions of the compiler for those processors.
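For those who have only used older compilers, here is a short sketch of what the features named above look like in practice: an enumerated type, function prototypes, the void type and structure assignment. The type and function names are invented for the example and do not come from any compiler's library.

```c
/* An enumerated type: named constants instead of bare integers. */
enum colour { RED, GREEN, BLUE };

struct point {
    int x;
    int y;
    enum colour shade;
};

/* Function prototypes: the compiler can now check argument types. */
struct point make_point(int x, int y, enum colour shade);
struct point copy_point(struct point source);
void move_point(struct point *p, int dx, int dy);

struct point make_point(int x, int y, enum colour shade)
{
    struct point p;

    p.x = x;
    p.y = y;
    p.shade = shade;
    return p;
}

struct point copy_point(struct point source)
{
    struct point result;

    result = source;    /* structure assignment: copies every member */
    return result;
}

/* The void type: this function returns nothing at all. */
void move_point(struct point *p, int dx, int dy)
{
    p->x += dx;
    p->y += dy;
}
```

On a compiler without structure assignment, copy_point would have to copy each member by hand, and make_point could not return a structure at all.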

As you might expect, the Microsoft C compiler provides comprehensive support for DOS functions: for example, network file sharing with record locking, subdirectory manipulation and access to environment variables. Support for PC hardware is minimal; presumably third party vendors will take the opportunity to supply libraries for this purpose.

8087 support is comprehensive: the compiler comes with a library which emulates the 8087, and can be linked with this to sense the presence of an 8087 and either use it or cal

The compiler features two control programs: MSC.EXE, which uses the same syntax as other Microsoft compilers, and CL.EXE, which is compatible with the compiler controller for the Xenix version of this compiler. The major difference between the two is that CL can automatically invoke the linker.

Three utilities are provided to further customise systems using code generated by the compiler. EXEPACK.EXE compresses sequences of identical characters and optimises the relocation table of .EXE files, making them smaller and faster to load. EXEMOD.EXE allows the programmer to edit the contents of the .EXE file header, to allocate more stack space, change the allocation values, etc. Finally SETENV.EXE allows the user to increase the size of the environment area allocated by COMMAND.COM in DOS 3.1 and earlier.
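Compressing sequences of identical characters is usually called run-length encoding. The sketch below shows the general idea only: I have not seen the format EXEPACK actually uses, so the (count, byte) pair layout and the function names here are purely my own assumptions for illustration.

```c
#include <stddef.h>

/* Encode src as (count, byte) pairs, with runs capped at 255.
   Returns the number of bytes written to dst, or 0 if dst is too
   small (or the input is empty). */
size_t rle_encode(const unsigned char *src, size_t len,
                  unsigned char *dst, size_t cap)
{
    size_t in = 0, out = 0;

    while (in < len) {
        unsigned char value = src[in];
        size_t run = 1;

        /* Count how many identical bytes follow. */
        while (in + run < len && src[in + run] == value && run < 255)
            run++;
        if (out + 2 > cap)
            return 0;
        dst[out++] = (unsigned char)run;
        dst[out++] = value;
        in += run;
    }
    return out;
}

/* Expand (count, byte) pairs back into the original bytes.
   Returns the number of bytes written, or 0 if dst is too small. */
size_t rle_decode(const unsigned char *src, size_t len,
                  unsigned char *dst, size_t cap)
{
    size_t in, out = 0;

    for (in = 0; in + 1 < len; in += 2) {
        size_t run = src[in];

        if (out + run > cap)
            return 0;
        while (run-- > 0)
            dst[out++] = src[in + 1];
    }
    return out;
}
```

A scheme this simple pays off on data with long runs, such as zero-filled tables, and can actually make data with no repeats larger, so a practical packer would need to be more selective.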

One of the most attractive features of the Microsoft C compiler is the CodeView debugger; this is an interactive source-level debugger which uses windows to display the source code, command interaction and trace values. When used with the Microsoft Mouse, the CodeView debugger is particularly convenient and powerful. Multiple breakpoints can be set with just a mouse click, and variables can be watched. The display can show a disassembled listing with interleaved source code if required, and the processor registers can be displayed. Intelligent use of the Colour Graphics Adapter allows debugger display in one window while graphics output is placed on another.

All in all, the Microsoft C compiler is a very impressive product, and a good choice for the professional programmer working with the IBM PC or other MS-DOS systems.

Quick C

Bundled in the MSC 5.1 package, but also available separately, is the MS Quick C compiler. With a built-in editor and debugger and high-speed compilation directly to memory, this is an excellent low-cost compiler for those who just need a compiler for occasional use.

Turbo C

Borland have produced a companion to Turbo Pascal in this compiler. Excellent value for money, well supported by third parties; earlier problems with poor documentation have now been fixed up. Excellent debugger. A new version for Windows eliminates the need for the Windows Software Development Kit.

c-systems C

A company called c-systems produces a C which can use the full 1 Mbyte address space (for code or data) of MS-DOS machines and which supports all the DOS 2.0 and 1.1 function calls. Perhaps its best feature, however, is c-window, a dynamic debugger that displays source code, allows display and alteration of variables and lets the user set multiple breakpoints. This makes it a good choice for the beginning C programmer. It costs $US199.00.

Instant C

What appears to be a combination interpreter and compiler for C is available as Instant C, from Rational Systems Inc. This optimising interpreter eliminates compile times and permits symbolic interactive debugging, and is claimed to execute up to forty times faster than interpreted BASIC. A full-screen editor is integrated with the compiler so that syntax errors leave the cursor at the trouble area.

The compiler directly generates .EXE or .CMD files, suggesting that by bypassing the linking stage it gives up one of the major advantages of C - construction of libraries of debugged, reusable linkable modules. It's an interesting idea, though at $US500 it's not cheap.

Telecon C

For those involved in development work for a range of target machines, Telecon C could be the answer. Versions are available for 6809 (Uniflex), 8080 (CP/M), 8086/88 (PC-DOS, CP/M-86) and PDP-11 (RT-11, RSX-11) machines, either running on or generating code for each of these. C and assembly language may be intermixed, generated code is reentrant (allowing multi-tasking operation) and there are no royalties on generated code. Cross-compiler versions are $US500, resident ones are $US200 without floating point or $US350 with.

Ron Cain's Small C compiler, which was originally published in Dr. Dobb's Journal, has spun off a number of descendants.

Small C 2.1

C users also ought to know about the C Users' Group, which started life as the BDS C Users' Group but has now branched out. The Group maintains a library of public domain software which is well worth knowing about.

DeSmet C

Here's a compiler to conjure with: high performance in a well thought-out package at a low cost. DeSmet C is, to the 16-bit CP/M-86 and MS-DOS world, what BDS C was (is?) to the 8-bit CP/M-80 user. That is, it offers adequately high performance, reliability and full functionality at a price attractive to hackers, and is good enough for a lot of quick and dirty and small systems work.

DeSmet C, from CWare, is supplied on four disks which contain a comprehensive collection of software and utilities. Apart from the compiler itself, which operates in two passes (parser and code generator), there is also an assembler (which doubles as pass three of the compiler), linker (called BIND), libraries and various header files. So far, all as expected - but the fun has barely begun!

Also supplied on the disks are a full-screen editor, librarian, symbolic source code debugger, profiler, VDISK program, source code comparator program and various utilities and example programs. These are no lightweight utilities, either, as a brief experiment will show.

The compiler is a fairly straightforward affair, lacking the support for ANSI features which the more expensive compilers boast. Such features include structure assignment, enumerated types and bit fields. I have mixed feelings about this: if this older dialect of the C language was good enough to write the UNIX operating system in, then it ought to be good enough for most of us. Certainly, I've managed without such 'fancy' additions for a long time now - but I suspect my code might look better with their use.

The DeSmet compiler supports only the small memory model - that is, one code segment and one data segment. This is adequate for many applications. For applications which require more code, DeSmet provides one work-around in the form of an overlay manager which will allow the programmer to place multiple overlays into the .EXE file and load them as required. This is good for small memory systems also (remember them?). Alternatively the overlays can be permanently loaded into memory for best speed.

The output of the compiler is a .O file in a proprietary object format. Actually, the output of the compiler is really assembler which is passed to the assembler directly and it is the assembler which produces the .O file. This means that it is possible for the programmer to insert assembly source in the C program source file, preceded by the #asm 'preprocessor directive'. The assembler is fairly plain vanilla, with no support for macros, codemacros or any of the other goodies to be found in MASM. On the other hand, the only use for this assembler is to write very short routines to perform functions like accessing the IBM PC ROM BIOS and the like.

The compiler provides good support for DOS functions like chain(), exec() and others, and also has functions for IBM PC screen and keyboard access.

The editor, see, is quite powerful and easy to customise for non-IBM hardware, since the screen driver component is supplied in source code form and can be bound to the object version of see provided. Other utilities (like the debugger) use the same screen driver. See has quite adequate facilities, including the ability to create macros, auto-indent and even automatic indentation following a { character.

Two libraries are provided, one with software floating point support and one with 8087 code. It was not clear from my (admittedly brief) reading of the documentation whether the 8087 code is in-line or uses subroutine calls.

The debugger is a source-code type which operates in the same way as see, with a Lotus-style menu across the top of the screen. The compiler and linker can include symbol-table information in the generated code, which is picked up by the D88 debugger to allow display of source code, values of variables and expressions, including arrays and structures. Screen flipping (from debugger to program output) is supported on the I

The profiler is quite a neat idea; triggered off the system clock, it simply collects the current value of the program counter and increments the count in one of 1024 buckets which produce the performance histogram.
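The bucket idea is simple enough to sketch in a few lines of C. This is my own reconstruction of the principle, not DeSmet's code: the structure layout, the function names and the way an address is scaled to a bucket are all assumptions, and a real profiler would take its samples from a clock interrupt rather than from an ordinary function call.

```c
#define BUCKETS 1024

/* One histogram covering the address range [start, start + span). */
struct profile {
    unsigned long start;
    unsigned long span;
    unsigned long hits[BUCKETS];
};

/* Clear the histogram and record the range of code to be profiled. */
void profile_init(struct profile *p, unsigned long start, unsigned long span)
{
    int i;

    p->start = start;
    p->span = span;
    for (i = 0; i < BUCKETS; i++)
        p->hits[i] = 0;
}

/* Record one sampled program counter value.  Returns the bucket that
   was incremented, or -1 if the address lies outside the range.
   Assumes span is small enough that (span * BUCKETS) does not
   overflow an unsigned long, as is the case for a 64 Kbyte segment. */
int profile_sample(struct profile *p, unsigned long pc)
{
    unsigned long index;

    if (pc < p->start || pc - p->start >= p->span)
        return -1;
    index = (pc - p->start) * BUCKETS / p->span;
    p->hits[index]++;
    return (int)index;
}
```

After a run, the buckets with the highest counts point at the stretches of code where the program spends its time, which is where optimisation effort is best spent.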

Documentation is on the flimsy side, but is nonetheless fairly complete, extending even to a description of the 8086 and 8087 instruction sets.

The benchmark timings reveal that the DeSmet compiler is no slouch, being just behind the Microsoft compiler in performance, at a considerably lower price.

A particularly attractive 'feature' of the DeSmet package is the way that it has started to attract public domain support. A brief look through the libraries of the C Users' Group, for example, reveals that the majority of recently contributed programs were written using this compiler.

Eco-C C Compiler

Jack Purdum is best known for his books on C programming for Que Corporation, but he is also active as a software developer with his company EcoSoft. A strong and vocal believer in the future of the C programming language, Purdum has set about developing an up-to-date, powerful compiler at a low price. The result is Eco-C

MIX C Compiler V 2.0.2

The MIX C compiler is conspicuous in this crowd of compilers because of its low price: at $79.95 (?) it is the cheapest by a considerable margin. Accompanied by a large (roughly 450 page) tutorial manual, the compiler is clearly aimed at the beginning C programmer.

MIX C is available for both the CP/M-80 and MS-DOS operating systems, and despite the constraints placed upon its designers in running on both those operating systems, it is a full implementation of the C language as described in K & R. In fact, it is a superset: it supports enumerated types, structure assignment, and other advanced features which K & R warned were coming but are only now finding their way into the ANSI standard for C.

The compiler comprises a single program, apparently single pass, though it's hard to be sure, with a matching linker. The compiler produces a source code listing on the screen as it works, which slows it down a little, as the benchmark timings show. However, this is useful for the beginning programmer as any errors are displayed in context with error numbers and an arrow to the offending character in the erroneous statement. Unfortunately, explanations of the error numbers are not printed until the end of the listing, but many will be obvious in any case.

The output of the compiler is a .MIX file which contains relocatable object code. This is passed to the linker, which links it with the standard library to generate a .COM file (this is unusual; most compilers use the Microsoft linker and produce .EXE files). This does mean that programs are restricted to 64K of combined code and data, but this is unlikely to affect beginning programmers.

The resulting code is far from blindingly fast, as the benchmarks table shows, although compile times are competitive. However, an optional optimisation program can be passed over the object .MIX file to produce worthwhile improvements in performance at minimal cost in file size (see table). The sieve benchmark, for example, improves from 2008 seconds to 164 seconds! I don't know what SPEEDUP.COM is doing, but it must be doing something right!

A similar optimiser called SHRINK.COM will provide space improvements. Alternatively, the run-time code can be omitted from the generated .COM file to make it even smaller; however, it must then be loaded at run-time from the RUNTIME.OVY file.

The package provides an unusual way of managing the standard function library; rather than providing a librarian, a program is used to convert the library into an ASCII file which can then be edited and appended to. It can then be re-converted back into its binary form. Ingenious, not too much trouble, and quite appropriate.

The documentation is nothing if not comprehensive. At 450 pages, it is a lot more than one expects with an inexpensive compiler. An initial section introduces the user to the compiler and linker through a sample program, and this is followed by the biggest section of the book, a tutorial on C programming. Then comes the reference section, which is not all that well organised for reference (it seems to follow a format similar to the tutorial) and finally a reference section on the compiler and linker options and utilities.

At the price, this is a class act, and compares favourably with compilers at five to ten times the price. However, I would have to suggest that this compiler is not suitable for full-scale production work; the code it generates is not as fast as that of the leaders, nor is it as well supported with function libraries and other add-ons, nor do the compiler and linker have as many options and features (big and other model support, 80286 support, etc.).

Hi-Tech C Compiler Version 3.02

It is always nice to be able to point to local products which are competitive with the internationally-marketed US products which dominate our market. Over the years, a number of products have shown that Australian software is up to world standards: the Zardax word processor, Typequick typing tutor, and other products have all succeeded in world markets.

In the Hi-Tech C Compiler, Australian Clyde Smith-Stubbs has done it again. This compiler has been around for a couple of years now, with versions for CP/M-80, CP/M-86 and MS-DOS/PC DOS. For Australian users, the advantages of local support direct from the author are obvious.

The compiler is supplied on two disks, with an installation program which places the files in a subdirectory with the compiler manager program in the root directory of the fixed disk. Operation on a floppy-only system is possible and doesn't seem too painful; the only problem we found was that the installation program did not edit our AUTOEXEC.BAT file correctly. No matter, I never trust these programs anyway.

The compiler is actually implemented as four passes, plus a separate link stage, followed by an .EXE file generation stage. The compiler generates a proprietary object code format which cannot be used with the DOS linker. Assembly language output can optionally be generated.

The first pass of the compiler, the preprocessor, performs some fairly strict type checking, rather after the fashion of a Pascal compiler (or the UNIX lint program). This considerably improves the portability of code, both from this compiler to another and from others - if you are likely to have problems with type conversions this compiler will warn you.

Two memory models are supported, through the use of two code generators and two sets of libraries. The small model supports up to 64K of code and 64K of data, while the large model supports up to 1 Mbyte of mixed code and data.
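The two limits follow from the way the 8086 forms addresses: a 16-bit offset on its own spans 64K, while a segment:offset pair yields a 20-bit physical address covering 1 Mbyte. The short Python sketch below shows only that arithmetic; the function and constant names are invented for illustration and have nothing to do with the compiler's own internals.

```python
# Illustrative sketch of 8086 real-mode addressing.
# All names here are hypothetical, chosen for this example only.

SMALL_MODEL_SPAN = 1 << 16   # bytes reachable with a 16-bit offset alone
LARGE_MODEL_SPAN = 1 << 20   # bytes reachable with segment:offset pairs


def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # The segment is multiplied by 16 (shifted left four bits) and the
    # offset is added; on an 8086 the result wraps at 1 Mbyte.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    print(SMALL_MODEL_SPAN)
    print(LARGE_MODEL_SPAN)
    print(hex(physical_address(0x1234, 0x0010)))
```

In the small model, code and data each live within a single fixed segment, so only the offset changes; in the large model both halves of the pair can vary.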

The dialect of C supported by the compiler is very standard with some recent extensions, such as structure assignment and the void type. Enumerated types are supported, with some restrictions appropriate to the type; this assists in writing clear code.

Register variables are supported (though it is not clear in the documentation just how many can be used, except on the Z-80 where the limit is one), but bit fields are not supported (a strange omission).

An optimisation pass will work for either speed or space optimisation. 8087 support is provided, through the use of calls to a library; a software f.p. library is also supported. The default action is to generate 8087 code; only if the -n flag is given on the command line does the compiler load the software f.p. library. This caused us some problems at first, as our benchmarks refused to run; we feel the default action of the compiler should be to generate plain vanilla code which will at least run.

Support for operation under DOS is good, with functions provided for subdirectory and environment access as well as process spawning and the more unusual dup(fd) function which forces a duplicate file descriptor.

The documentation is terse but adequate at 106 pages of laser-printer output (at a guess). However, important information is often buried in throw-away lines, such as the point that the -n flag is required on non-8087 systems (above). We'd suggest reading the manual carefully with a fluorescent ink marker in hand.

The compiler comes with a companion relocating macro assembler which accepts a variant of standard Intel ASM-86 mnemonics. It has some unusual features, such as the ability to use temporary labels for local references and to assemble conditional long branches (which assemble to a short branch if possible, otherwise to a short branch of the opposite sense around a long unconditional jump).
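The conditional long branch trick can be sketched in a few lines of Python. This is a guess at the general idea rather than the Hi-Tech assembler's actual logic: the mnemonics are standard Intel ones, but the function name, the `skip` label and the way the displacement is supplied are all assumptions made for the example.

```python
from typing import List

# Hypothetical sketch of choosing between a short conditional branch and
# an inverted short branch around a long unconditional jump.

# Each conditional jump paired with the jump of the opposite sense.
OPPOSITE = {
    "jz": "jnz", "jnz": "jz",
    "jc": "jnc", "jnc": "jc",
    "jl": "jge", "jge": "jl",
    "jg": "jle", "jle": "jg",
}


def assemble_branch(mnemonic: str, target: str, displacement: int) -> List[str]:
    """Return the instruction(s) needed for a conditional branch to target.

    displacement is the distance in bytes from the end of a short jump
    to the target; a short jump stores it in one signed byte.
    """
    if -128 <= displacement <= 127:
        # The target is in range: a single short conditional jump will do.
        return [f"{mnemonic} short {target}"]
    # Out of range: branch the opposite way over an unconditional jump,
    # which can reach anywhere in the segment.
    return [f"{OPPOSITE[mnemonic]} short skip", f"jmp {target}", "skip:"]


if __name__ == "__main__":
    print(assemble_branch("jz", "done", 10))
    print(assemble_branch("jz", "done", 300))
```

The programmer writes one branch and lets the assembler pick the encoding, which saves re-editing code whenever an inserted instruction pushes a target out of short-branch range.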

As with all categories of software, it is impossible to pick out any one of these products as the 'best'. Each of them has its distinctive features and advantages in different circumstances.

For the beginning C programmer, we'd suggest the MIX compiler because a) it is inexpensive (you might not stick with C in the long run) and b) the documentation has an extensive tutorial organisation. The restrictions of this compiler, particularly the lack of large model support, will not matter to the neophyte, while its ease of use and diagnostic listing will assist considerably.

For the occasional user in a support job, or the hobbyist/hacker, it's hard to go past the Quick C compiler. It's fast, complete, and well supported, and a bargain at the price.

At the top level, it's a choice between Microsoft C and Computer Innovations Optimising C86. Microsoft C is a higher performance product, and the inclusion of the Codeview debugger, together with the luxurious documentation, pretty well clinches it. On the other hand, CI C86 has a pretty useful library of functions for access to PC hardware which is attractive to those writing printer drivers, comms and similar utilities for the IBM PC. The best course would probably be to buy both (?).

Aztec C

Update (23/03/09): Bill Buckels has written to point out Aztec C, a family of compilers and cross-compilers for MS-DOS, Apple DOS, the Commodore 64, Amiga and Macintosh produced by Manx Software Systems. Bill has some older versions of the Aztec compilers, along with sample code, available on his website at http://www.aztecmuseum.ca/ .

FORTRAN

FORTRAN-80

The Microsoft FORTRAN compiler got off to a shaky start with some serious bugs, and some serious bugs remain even today. This is a full FORTRAN 66 with only one omission, the COMPLEX data type. Despite the bugs, it is currently the only game in town.

MS-FORTRAN

is a subset of FORTRAN-77 for MS-DOS which supports double precision and integer arithmetic using the 8087 numeric data processor chip. According to irate letters in Byte, this one suffered from some serious bugs; however Version 4.1 is considered to be stable.

Ryan-McFarland FORTRAN 77

IBM have for some time sold a version of this compiler as their Engineering/Scientific FORTRAN; it is mature, reliable and generates tight code. Particularly suitable for engineering work (data acquisition etc.) and graphics.

Lahey F77L

A low-cost FORTRAN 77 from a small company with an excellent reputation for supporting their product. Probably the best value for money in the FORTRAN market.

FORTRAN IV version of the guessing game.

COBOL

Some years ago, in the euphoria of the micro revolution, I commented that COBOL was a software dinosaur which didn't yet know it was extinct. It's still with us; there are more COBOL compilers than ever before for micros, so it just won't go away. The reasons are undoubtedly in the economics of software production; it is cheaper to produce software in COBOL than in later, fancier languages.

(Aside for joke: Fellow rings the producers of "That's Incredible" and says "You know how they say 100 monkeys with 100 typewriters, given 100 years will write the Complete Works of Shakespeare? Well, I've only got one monkey here, and in ten minutes he's already written a COBOL program . . .")

Microsoft COBOL-80

is an ANS Level 1 COBOL 74 with extensions which include much of Level 2. It requires CP/M or MS-DOS and has full sequential, relative and indexed file support with variable file names. It also has powerful screen formatting with ACCEPT and DISPLAY verbs. Although the compile/link process produces .COM files, these are actually p-code which is interpreted so that it is not as fast as is theoretically possible.

Programs are segmentable so that they can be larger than the available memory, and a CHAIN command allows programs to transfer control with parameter passing.

The standard package comes with the MACRO-80 assembler, linker and librarian, and can be used in conjunction with the M/SORT package to provide the Level 2 SORT verb.

MicroFocus Cis COBOL

Is an ANS Level 1 COBOL with extensions such as hex literals, lower case characters, and runtime input of filenames. It runs in 48 Kbytes of RAM under CP/M or the BOS operating system with two disk drives.

The package features interactive CRT handling, advanced screen formatting and data entry, line sequential files, FORMS utility, interactive debugging and linkable subroutines.

Level II COBOL

Available from Digital Research and others, this is actually MicroFocus COBOL. It is the only Level 2 COBOL available and meets GSA specifications. It includes full Level 2 SORT/MERGE file description including RELEASE and RETURN. Generates pseudo-code (sorry).

COBOL 85

Another from the MicroFocus stable, this was the first (and probably still the only) COBOL compiler validated by the US GSA (General Services Administration). Top of the line, with interactive source-level debugger and native code generator.

Nevada/Utah COBOL

Unlike other COBOLs, this one is a true native code compiler and is very fast, particularly on file access which seems to be highly optimised. Not a full COBOL by any stretch of the imagination, but a usable subset and excellent value.

RM/COBOL

holds the promise of software portability. It appears to be written in C (I could be wrong) and runs on TI and NCR minis, as well as under CP/M, MP/M, TRSDOS, OASIS and UNIX. It has established a very good reputation very quickly. A matching code generator for data input, file maintenance and report printing programs is also available.

MS-COBOL

The latest version of Microsoft's flabby compiler.

Realia COBOL

A very fast COBOL compiler, almost to full Level II standard, from a small company that does nothing else and really understands COBOL. Generates very fast code.

FORTH

fig-FORTH

MasterFORTH

LOGO

LOGO is an educational language developed by Seymour Papert at MIT, and is best known for the 'Turtle Graphics' which are closely associated with it. For the most part, the bulk of the LOGO interpreters available are just that - vehicles for turtle graphics - and no more.

Dr LOGO

This is the only Logo we have in use at the moment. It is a comprehensive programming language, and this implementation has a full complement of functions, making it suitable for general programming and even simple expert systems work.

Apple LOGO

Terrapin LOGO

IBM LOGO

TI LOGO

Ada

Janus/Ada

There are currently no full Ada compilers available for micros, and this package is the closest I have seen. Available for CP/M-80, CP/M-86 and MS-DOS, this is a fairly complete implementation of Ada, including packages, all data structures, and most other features. It certainly gives a feeling for the concepts of Ada, which is not just a souped-up Pascal.

Janus/Ada, from RR Software, gets its name from the fact that all Ada compilers must pass the official Ada Compiler Validation Suite. There are no subsets of Ada, and no extensions to the language. This assures full portability of Ada programs.

Janus is a subset of Ada, therefore it is not Ada. Despite this, it is a very full subset, which enables programmers to get a feel for the full language. Janus supports most Ada data types (integers, reals (with 8087), characters, enumerated types, booleans, one-dimensional arrays and variant records) plus a few of its own: bytes and strings. Like Pascal, Ada allows the programmer to define his own types.

Janus also supports the separate compilation features of Ada, so that packages can be separately compiled and linked. Programmers who have worked on large projects (or what seemed large) will appreciate this capability.

Janus does not support several of the more complex Ada capabilities, such as tasking, exception handling and representation specifications (which describe how objects are stored). There are also some variations in the Janus pragmas (compiler switches) and other minor deviations from the Ada standard, but these are not significant for most purposes. In any case, Janus has a pragma which will generate warning messages for any non-standard usages.

The Janus compiler is a large system (almost 400 Kbytes for the compiler alone). It's a multi-pass compiler, four passes plus the separate linker, but compiles around as fast as the PL/I compiler which is the closest thing we have to it.

We found no bugs in the compiler or the code it generated, although we weren't really pushing it to the limits, so to speak. Since we were running under MP/M-86, we had to set some of the compatibility attributes on the compiler, suggesting that its file handling may be done in a non-standard way (at least as far as MP/M is concerned).

One thing the compiler does that can catch the unwary is that it reads the symbol tables files (.SYM) from previous compilations. If, as I was doing, you compile a program called SIEVE.PLI using the PL/I compiler, then run the Janus compiler on SIEVE.PKG, the compiler will make no sense of the .SYM file the PL/I compiler has left behind, and will produce an error message. The answer is to erase any .SYM files before compiling.

The compiler's performance is pretty good, all things considered. We first of all ran a simple looping benchmark and compared it against the DRI PL/I compiler, which is a pretty good performer. The results were as follows:

Janus/Ada     DRI PL/I-86
39.90 s       12.47 s

This means that, doing simple binary arithmetic, the PL/I code was over three times faster. But! - there is a catch. Like Pascal, Ada normally inserts range and overflow checking, to produce error messages rather than simply crashing. Which of course, takes time. By using a pragma to disable range checking, we were able to speed the program up:

Janus/Ada     DRI PL/I-86
18.04 s       12.47 s

The situation is improving.

Janus can optionally be supplied with a code optimiser which examines the generated code, removes code that is never executed and generally tightens up the code to improve performance. The result is a program that is smaller (so it loads faster) and executes faster. The results of using the optimiser are shown below.

Straight compile    Rangecheck(off)    Optimiser    Both
39.90 s             18.04 s            34.22 s      12.47 s

You can see that with optimisation and with range checking disabled, Janus/Ada provides the same performance as the PL/I compiler. The code generated by the Ada compiler is smaller than that for the PL/I compiler, reducing load time, so the actual processor performance of the Janus code is probably very slightly less than that of PL/I.
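To put numbers on the improvement, the four timings quoted above can be turned into speed-up ratios. The Python sketch below does nothing more than that arithmetic; the dictionary keys mirror the column headings, and the helper name is an invention for the example.

```python
# Loop benchmark timings in seconds, copied from the figures quoted above.
TIMINGS = {
    "Straight compile": 39.90,
    "Rangecheck(off)": 18.04,
    "Optimiser": 34.22,
    "Both": 12.47,
}
PLI_TIME = 12.47  # DRI PL/I-86 on the same benchmark


def speedup(baseline, improved):
    """How many times faster the improved run is than the baseline."""
    return baseline / improved


if __name__ == "__main__":
    base = TIMINGS["Straight compile"]
    for name, seconds in TIMINGS.items():
        print(f"{name:18s} {seconds:6.2f} s  "
              f"speed-up {speedup(base, seconds):.2f}  "
              f"ratio to PL/I {seconds / PLI_TIME:.2f}")
```

On these figures, switching off range checking alone is worth roughly a factor of 2.2, while the optimiser alone gives less than 1.2; it is the combination that closes the gap with PL/I.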

One area where the PL/I code spends a lot of its time is in output formatting, since it right justifies numbers, while the Ada standard, like Pascal, is for left justification.

All of this shows that while, at first glance, the Janus compiler generates adequate, if slightly slow, code, once you investigate the options it can be speeded up dramatically with no changes to the code. It ain't what you've got, it's how you use it.

We also tried the Janus and DRI PL/I compilers on the Sieve of Eratosthenes benchmark, with the results shown below:

Rangecheck(off)    Rangecheck(off) and Optimised    PL/I-86
16.06              13.34                            9.97

Again, the code generated by the PL/I compiler was more than twice as big.
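For readers who have not met it, the Sieve of Eratosthenes finds primes by repeatedly striking out multiples. The sketch below is a plain Python rendering of the algorithm only; it is not the benchmark source used for the timings above, which is not reproduced here, and the function name is an assumption.

```python
def sieve(limit):
    """Return a list of all primes below limit."""
    if limit < 2:
        return []
    # Start by assuming every number is prime, then strike out 0 and 1.
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Strike out every multiple of n, starting at n squared;
            # smaller multiples were removed by smaller primes.
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


if __name__ == "__main__":
    print(sieve(30))
```

The inner loop is all array indexing and integer addition, which is why the benchmark is so sensitive to range checking and to the quality of the generated code.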

We did encounter one small bug in the optimiser - it generated a Divide By Zero Encountered error while optimising a very small program.

Additional features

As well as the optimiser, there are a couple of other options for Janus/Ada. One of the most immediately useful is a Pascal to Ada translator called Pastran. We tried this on a couple of small programs, with great success, although we didn't have any large Pascal programs around the office to give it a real test.

There is also a profiler utility which can be used to locate hot spots in programs and find out where the processor is spending most of its time. Use of a profiler program is mandatory before trying to improve the 'efficiency' of programs, as often your efforts to improve things make only very little difference.

Documentation

The Janus/Ada package comes with a quite comprehensive manual for the compiler and the other options, but it is in no way suitable for learning the Ada language, and a separate textbook will be required for this purpose. Fortunately, there are plenty of good textbooks around.

The Janus/Ada manual is well organised, and I found the information I needed in it without too much trouble. One small objection is that the cover is too floppy to hold it upright in a bookshelf - but many users will just leave it open on their desks anyway.

The Janus/Ada disks come with a variety of additional libraries for additional functions such as colour graphics or accessing a real time clock (necessary for the profiler). For example, the NEC APC version has been modified to access that machine's hardware features.

Ada is, as many of its proponents rightly point out, the language of the nineties. Whether it will be able to knock C off its favoured position is open to debate, but it certainly will provide many programmers with their first taste of a really advanced software tool. And of course, since it is sponsored by DoD (those wonderful people who brought you COBOL) many programmers will simply have to work in it, whether they want to or not.

Janus/Ada is a very useful implementation of a subset of this important language. It is obviously of high quality and is useable for creation of finished products now. Anybody interested in getting a grasp on Ada is strongly recommended to investigate Janus/Ada.

The guessing game written in Janus/Ada

Prolog

Borland Turbo Prolog

Variations on the original Prolog have been developed at several universities, with the most popular dialect being the one from Edinburgh University's Department of Machine Intelligence. Turbo Prolog is essentially an extension of this dialect, although it differs from most Prologs in being a compiler and not an interpreter. This has a number of consequences, and as a result Turbo Prolog programs look rather different from conventional Prolog programs.

In particular, Turbo Prolog is a strongly typed language, after Pascal, in which variables must be declared before use. A typical Turbo Prolog program will have at least three sections: domains, predicates and clauses.

The domains section corresponds roughly to the Pascal type section, while the predicates section effectively declares the functors which will be used later in the program. The clauses section is the meat of the program, and constitutes the 'executable code' of the program.

Let's look at an example. A few things to remember: in Prolog, words that begin with a capital letter are variables, while words with lower case initials are symbols (constants) or predicates. Predicates are statements, and are either facts or rules. This program contains both:

This program answers questions about people's favourite activities. The database contains one predicate, likes, and a clause such as likes(ellen,tennis) should be read as 'ellen likes tennis'. The last clause is a rule:

bill likes something if tom likes it too

These statements constitute a simple Prolog program. To run it, and query Prolog's internal database, we simply ask it a question:

In this statement, X is an uninstantiated variable, that is, a variable which does not currently have a value. In order to provide a value for X, the Prolog system searches through the available clauses for the predicate likes until it finds a clause in which bill is the first argument.

It now knows that bill likes something if tom likes it, so it institutes a second search through the database, this time looking for clauses about tom. If it finds one (and it does: likes(tom,baseball)) it takes the activity variable from that clause, plugs it into the clause about bill liking what tom likes and then plugs the result of that back into X.

Prolog will now print 'X = baseball', and then proceeds to continue its search through the database for other things which tom likes, because bill will like them too. It won't find any, but if it did, it would print them too. Having exhausted the clauses for tom, it returns to searching for clauses for bill, but again, it finds nothing else. It finally prints '1 Solution' and waits for another query.
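The search just described can be imitated in a few lines of Python. This is only a sketch, hard-wired to the single predicate likes and to the one form of rule in the example; it uses just the two facts quoted in the text (ellen/tennis and tom/baseball) and is in no sense a general Prolog engine. The names FACTS, RULES, likes and query are all assumptions made for illustration.

```python
# Facts in database order: likes(Person, Activity).
FACTS = [
    ("ellen", "tennis"),
    ("tom", "baseball"),
]

# Each rule (head, body) means:
# "head likes Activity if body likes Activity".
RULES = [
    ("bill", "tom"),
]


def likes(person):
    """Yield every activity X for which likes(person, X) can be proved."""
    # First scan the plain facts from top to bottom.
    for who, activity in FACTS:
        if who == person:
            yield activity
    # Then try each rule about this person, which launches a second
    # search through the database for the person named in the rule body.
    for head, body in RULES:
        if head == person:
            yield from likes(body)


def query(person):
    """Print each solution in turn, then the count, and return the list."""
    solutions = list(likes(person))
    for activity in solutions:
        print(f"X = {activity}")
    print(f"{len(solutions)} Solution" + ("" if len(solutions) == 1 else "s"))
    return solutions


if __name__ == "__main__":
    query("bill")
```

The generator plays the part of backtracking: each time the caller asks for another answer, the search resumes where it left off instead of starting again.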

That's it. Notice that Prolog will search repeatedly through its database of facts and rules to find the information it needs to satisfy a request. Contrast this with the conventional top-to-bottom flow of control in conventional programming languages and you'll begin to see why Prolog is considered non-procedural.

Turbo Prolog is based on Edinburgh Prolog, but experienced users will find a number of inconsistencies: the use of = is slightly nonstandard; there is no \= operator; the 'is' infix predicate is not supported, and no infix operators can be defined - one must always use functor notation.

On the other hand, Turbo Prolog contains a number of extensions which make life interesting: simple turtle-style graphics, sound, windowing, Pascal-like I/O statements, a built-in editor, access to I/O ports (useful for robotics applications), random access files, real arithmetic, trig functions and bitwise operators (again handy for robotics).

The documentation is reasonable, with separate tutorial and reference sections, but first-time users will need a copy of Clocksin and Mellish for background information.

Borland has done more to keep Pascal alive than any other company, and we can expect Turbo Prolog to do more to launch the widespread use of Prolog than any other product. So it is non-standard, so it has this awful typing; it will almost certainly become a new standard, and is certainly excellent value.

LPA micro-Prolog

There are a number of Prolog interpreters available for micros, but the only one with which I have any experience is Logic Programming Associates' micro-Prolog. This dialect is quite different from the standard DEC-10 Prolog which originated at Edinburgh University, and can therefore be a bit strange to experienced programmers.

However, it is supplied with the Clark and McCabe book 'micro-Prolog: Programming in Logic' which is an excellent tutorial introduction to the language. It is available for CP/M-80, CP/M-86 and PC DOS, and the 16-bit versions do have a preprocessor for DEC-10 Prolog (after a fashion).

While Prolog must inevitably remain a curiosity for most of us for the next few years, be aware that micro-Prolog is powerful enough to support some interesting commercial applications. We are currently experimenting, for example, with the construction of a data dictionary program using micro-Prolog.

% Guessing Game 05/09/84
%
test(Try, N) :-
    Try = N,
    print('You got it !!'),
    !.
test(Try, N) :-
    Try > N,
    print('Too high'),
    !,
    fail.
test(Try, N) :-
    Try < N,
    print('Too low'),
    !,
    fail.
game :-
    random(100, Number),
    repeat,
    ask('What is your guess ? ', Guess),
    test(Guess, Number),
    !.
continue :-
    ask('Play Again ? ', Play),
    (Play = 'N' | Play = 'n').
play :-
    repeat,
    game,
    continue,
    print('Thanks for playing').

play !   % Start game
% If prolog library doesn't include random function then use
%
% random(R, N) :-
%     retract(seed(S)),
%     N is abs(S mod (R + 1)),
%     NewSeed is (125 * S + 1) mod 4096,
%     asserta(seed(NewSeed)),
%     !.
%
% seed(13).
%
% If 'ask(Q,A)' missing use
%
% ask(Q,A) :-
%     prompt(Old, Q),
%     ratom(A),
%     prompt(_, Old),
%     nl,
%     !.
%

Prolog version of the guessing game

APL

IBM APL

I have not seen this APL myself, but I think I can safely assume that it works. It runs on a PC, takes advantage of an 8087 if one is installed, and can apparently exchange workspaces with the mainframe APL.

STSC APL*Plus/PC

Again, I have not run this one, although I presently have a copy for evaluation. I promptly passed it onto an APL-speaking colleague for evaluation, but the documentation looked excellent, and the package has received very good reviews from the US press.

Waterloo APL

From the University of Waterloo in Ontario. No real comments.

LISP

Stiff Upper LISP

Software Toolworks' LISP

Supersoft LISP

Mu-LISP

IQ LISP, for the IBM PC

Waltz LISP

PILOT

Starkweather PILOT

Also known as Nevada PILOT, this contains additional commands to control a videotape recorder, and is a full implementation by the man who invented PILOT. It comes complete with sample programs and a simple authoring package, itself written in PILOT.

Apple PILOT

Requires UCSD Pascal/Apple Language Card to run, written in Pascal, seems to be quite good.

Guessing game in Starkweather PILOT.

Assembler

MASM

Application Prototyping

Mainframe computers have existed for over forty years now, and although for the first few years of their existence they were only used by highly numerate computing professionals, for most of that time a body of expertise has developed and grown which enables systems professionals to develop software on behalf of end users.

The problem is that a large part of the traditional software development technique is inappropriate for use on personal computer systems.

The Case Against Traditional Techniques

First of all, there is the problem that many managers, aware of the low hardware cost of personal computers, expect that software must also have somehow come down in price. The fact is that although the mass market opened up by PC's has reduced the unit price of commercial software packages, the cost of software development has continually risen because, at base, all software is still hand-crafted by a small group of highly-paid professional programmers. Improvements in software engineering, such as the development of higher-level languages and integrated source-level debuggers, have not offset the increased salary and support costs.

Some managers are therefore unaware of the true costs of software development, and part of the PC support task is to educate users and to incorporate the communication of costs into corporate PC policy.

Secondly, total software development costs must include the cost of abortive or unsuccessful system development. Traditional techniques are particularly prone to failure if not managed correctly - which they rarely are, especially in the PC environment. A large number of software systems fail, largely due to two factors: initial miscommunication between user and analyst, and the failure of the system to pick up errors later in the system development lifecycle due to the almost complete lack of communication between the user and the system implementors during the design and programming phases.

The traditional systems development life cycle looks like this:

The problem is that after the definition phase, during which the only contact with the user is the feasibility study and systems analysis, there is no contact at all until acceptance testing. In consequence, particularly if the systems analysis is not competently and thoroughly performed, what the user will finally receive is the programmers' interpretation of the system designer's conception of the systems analyst's perception of what the user thought he wanted.

It is entirely possible that this does not meet the user's needs, and is nothing at all like he envisaged. In fact, according to research performed by IBM, this happens in a large number of cases. That it happens at all is a savage indictment of our 'profession'.

IF, and it is a big IF, the analyst completely understands the user's problem in business terms, and IF the user is perfectly knowledgeable about the capabilities of the proposed computer system and is capable of expressing his requirements clearly and unambiguously and IF the system designer and programmers are completely competent, then the traditional systems development life cycle will produce correct systems. However, the perfect user and perfect DP professionals are yet to be found, and the traditional technique either fails to compensate for this fact or compensates by over-engineering at different stages of the cycle.

The Alternative: Prototyping

A newer technique, which is becoming popular on PC's and is even spreading to the mainframe world, is prototyping. This technique is based upon the iterative construction of a number of systems, each with additional features, which gradually (although quickly) converge on the user's desired solution.

The goal is to create a system which meets the user's needs in minimum time and at minimum total system cost. It should be borne in mind that the goal is optimal application of resources, not the creation of the perfect piece of code. It may well be (and usually is) that the cost of eliminating every bug from a system far exceeds the cost and inconvenience of the occasional crash.

For example, a systems engineer (and I mean a real systems engineer, not some flunky who acts as a go-between for the customer and the manufacturer) might approach the problem of providing file and record locking in an application this way: if in one year, the probability that two users will attempt to open the same file is 0.2, but the network crashes due to hardware failure or other causes once a week anyway, then the additional cost of record and file locking is not justified. In other words, the benefit - marginally increased network availability - is dwarfed by the cost - a lot of man-hours in software development. Another way of looking at it is that provision of locking would result in a more elegant and a more professionally satisfying system, not a more appropriate one.
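The engineer's reasoning can be put as a back-of-envelope calculation. The Python sketch below uses only the two figures given above; treating the 0.2 yearly probability as an approximate expected number of conflicts is a simplifying assumption made for this example, and the names are invented.

```python
# Figures from the text; the interpretation is an assumption (see below).
CRASHES_PER_YEAR = 52            # "once a week"
LOCK_CONFLICTS_PER_YEAR = 0.2    # yearly probability, treated here as an
                                 # approximate expected count (assumption)


def share_prevented(conflicts, crashes):
    """Fraction of yearly disruptions that locking could prevent."""
    return conflicts / (conflicts + crashes)


if __name__ == "__main__":
    share = share_prevented(LOCK_CONFLICTS_PER_YEAR, CRASHES_PER_YEAR)
    print(f"Locking addresses {share:.2%} of expected disruptions")
```

Under these assumptions locking would deal with well under one per cent of the interruptions users actually experience, which is the sense in which the benefit is dwarfed by the cost.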

The object of the exercise is thus to produce something quick, clean and appropriate, not an elegantly over-engineered and massively expensive system which incompletely meets the user's needs.

Development of the initial prototype is performed in close cooperation with the user, ideally using software tools with which the user is familiar, such as Lotus 1-2-3 or dBASE III+. Close cooperation means that we are virtually guaranteed to be moving in the right direction as judged by the user, who after all, must be the final judge in acceptance testing anyway.

Although you can develop the prototype by writing straight code for dBASE, it is generally much quicker and easier, especially when co-developing the prototype with the user, to use a code generator for this purpose. For example, Genifer, UI Programmer or DataBoss can be used to generate systems very quickly for dBASE or Turbo Pascal.

In the simplest case, one simply designs the screen forms, with the user looking over one's shoulder, presses the <Generate> key, and lets the code generator write the system. It must be admitted, though, that code generators - even the best - do not write the most sparkling, innovative or elegant code. Instead, they write turgid, boring and often verbose code of an extremely general nature for the most common application modules: data entry, editing, reporting and menus. But then, these are the bits of an application which are the most boring for an experienced programmer to write, and in fact, many programmers have already reduced this task to the kind of 'take a template and modify it' programming performed by the code generators. Handing this work to the code generator results in greater efficiency, and leaves more time for the programmer to concentrate on the interesting, tricky and most rewarding parts of the application.

Following the development of the initial prototype, the user will request one or both of two kinds of improvement: functional enhancements or performance improvements. Leave performance issues till later. Concentrate at this stage on functional issues.

Designing the prototype in a versatile language like dBASE, RBase or Lotus means that one can quickly rewrite modules, add modules and customise reports, either by having the code generator regenerate them or by hand-crafting modifications. A shift to a more traditional, compiled language at this stage will tend to 'cast' or 'freeze' the design by making modifications slower and more difficult to implement. Try to keep the design reasonably fluid and general, in order not to preclude later additions and modifications.

Only once the functional design is correct should one start thinking about performance issues. Although it is possible to improve the performance of (for example) dBASE code by using the usual optimisation techniques, such as constant folding, loop invariant optimisation, redundant subexpression elimination, procedure integration and others, it is generally not worth the trouble. The fact is that a law of diminishing returns applies to all such techniques, and one can easily double the development time (thus doubling the cost, even ignoring opportunity cost) while providing a 5% improvement in performance. The user won't even notice a 5% improvement.
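As an illustration of one of the techniques named above, loop invariant optimisation moves a calculation whose result never changes out of the loop that repeats it. The sketch below is in Python rather than dBASE, and the function names and figures are hypothetical.

```python
def total_before(prices, rate, discount):
    """Unoptimised: the same factor is recomputed on every pass."""
    total = 0.0
    for price in prices:
        total += price * ((1 + rate) * (1 - discount))
    return total


def total_after(prices, rate, discount):
    """Optimised: the loop-invariant factor is computed once, outside the loop."""
    factor = (1 + rate) * (1 - discount)   # does not depend on the loop variable
    total = 0.0
    for price in prices:
        total += price * factor
    return total


prices = [100.0, 250.0, 40.0]
print(total_before(prices, 0.1, 0.05))
print(total_after(prices, 0.1, 0.05))
```

Both functions return the same total; the second merely does less arithmetic per record, which is the kind of modest gain the paragraph above warns against paying too much for.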

Instead, the experienced dBASE programmer has a toolkit of about a dozen techniques which he will apply. These include procedure integration (replacing 'DO <modulename>' with the inline code of <modulename>), shifting of files to a memory-based virtual disk for speed of access, releasing unneeded indexes, replacing SKIP loops with LOCATE statements and others. I presume that users of other tools have similar techniques.
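Procedure integration, the first of these techniques, can be sketched in Python as follows. This is an analogue only, not dBASE code: the function names and invoice data are hypothetical, and the call to line_total stands in for 'DO <modulename>'.

```python
def line_total(quantity, unit_price):
    """The separate module that would be invoked once per invoice line."""
    return quantity * unit_price


def invoice_total_with_calls(lines):
    """Before: one procedure call per line, the analogue of DO <modulename>."""
    total = 0
    for quantity, unit_price in lines:
        total += line_total(quantity, unit_price)
    return total


def invoice_total_integrated(lines):
    """After: the body of line_total is pasted inline, saving the call overhead."""
    total = 0
    for quantity, unit_price in lines:
        total += quantity * unit_price
    return total


lines = [(2, 15), (1, 40), (10, 3)]
print(invoice_total_with_calls(lines), invoice_total_integrated(lines))
```

The result is identical either way; what is saved is the cost of locating and invoking the procedure on every pass, which in an interpreter can be considerable.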

These techniques are applied to the functionally complete prototype, and together will produce a significant improvement in performance.

If the user is still not satisfied with the performance, then there is no point in poring over the code, trying to warm it up, because no matter how far you optimise it, dBASE is still an interpreted language and is bound to be slow compared with fully compiled equivalents. At this point, only increasingly - indeed, vanishingly - small improvements can be expected.

Take the opportunity to switch now to a compiled language. In the case of dBASE III+, two excellent compilers are now available: Clipper and FoxBase+. Both provide faster, more reliable code, for bigger systems, and also offer functional enhancements. Bear in mind that from now on, though, any modifications to the code will require recompilation and linking - a process that may take some time and may not be possible on the user's machine. In addition, debugging and testing now take longer. With this in mind, postpone the switch to compiled code for as long as possible.

If the user still wants higher performance, and is willing to pay for it, the best path at this point is to upgrade his hardware. If he is running on an AT-style machine, then upgrade to an 80386-based box. If already running on this level of machine, then perform yet another translation, to a machine-independent language like C or PL/I, and move the application to a mini or mainframe. It will literally be cheaper to do this than to attempt to further optimise the code to run on a PC.

If documentation has not previously been prepared, get it done now, before it is too late. In the spirit of preparing the entire system quickly and cheaply, avoid written text documentation wherever possible and replace it with graphical documentation such as menu trees, flow charts (useless for designing program logic but good for end-user procedures) and data flow diagrams.

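To summarise, then, the stages involved are:

1. Develop the initial prototype in close cooperation with the user, preferably with a code generator.

2. Add functional enhancements until the functional design is correct.

3. Apply the standard performance techniques to the functionally complete prototype.

4. If performance is still inadequate, switch to a compiled language.

5. If that is still not enough, upgrade the hardware, or translate the application to a machine-independent language and move it to a larger machine.

6. Prepare the documentation, graphically wherever possible.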

The Case for Prototyping

A number of benefits ensue from the use of prototyping. Perhaps the major one is the early delivery of a system to the user. For the reasons discussed earlier, users are not usually able to clearly verbalise their requirements, or to conceptualise the desired system in abstract terms - these skills take some time to develop, even in experienced analysts.

However, once a prototype has been delivered, the user can immediately 'show and tell' - he can point to the way in which the system currently implements, say, a report or data entry screen, and say what he likes (sometimes) or doesn't like (more commonly) about it. One receives comments like 'Can you make the data entry screen automatically generate the internal order number?' or 'This report is good, but I'd like another subtotaled by customer number, and showing total billings'.

These are quite specific requests to which one can quickly and positively (usually) respond.

Furthermore, the high user involvement virtually guarantees the appropriateness of the final design. If the user does not end up with a system which is satisfactory to him, then either the user is totally incommunicative or the analyst/programmer is totally incompetent - a situation which is usually detected fairly early as no progress is made towards a solution.

The early appearance of the prototype system allows correction of design problems before significant effort has been expended in doing things the wrong way. In the traditional approach, it is often easier to revise the specs to accommodate mis-design than to revise the system to meet the specs - and some people do this!

The user should be encouraged to say when the system meets his functional requirements. In fact, this is one of the dangers of prototyping: the user may not say 'When!' until he gets a do-everything system. Progress is made so rapidly that the user is encouraged to keep asking for more and more features, even when they are not cost-justified. A simple way to avoid this problem is to keep making the user aware of time expended on the project to date, at internal billing rates.
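Keeping such a running total need not be elaborate. A minimal sketch, in which the roles, hours and hourly rates are all hypothetical:

```python
# Hypothetical roles, hours and internal hourly rates, for illustration only.
BILLING_RATES = {"analyst": 60, "programmer": 45}
HOURS_TO_DATE = {"analyst": 12, "programmer": 30}


def cost_to_date(hours, rates):
    """Total project cost so far: hours worked times the internal billing rate."""
    return sum(hours[person] * rates[person] for person in hours)


print(f"Project cost to date: {cost_to_date(HOURS_TO_DATE, BILLING_RATES)}")
```

Showing the user this one figure each time a new feature is requested is usually enough to prompt the question of whether the feature is worth it.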

In this case, one might get lucky: the user might take one look at the code-generator-generated prototype and say 'Perfect! Just what I needed!'. At this point, one writes the documentation (yes, this still has to be done!) and closes the files. Even if the initial prototype requires modification, one only goes through the process as far as is required to satisfy the user's needs. Some iteration may be required through the various stages, but avoid unnecessary refinement.
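The 'only as far as required' rule can be sketched as a loop which stops at the first stage that satisfies the user. The stage descriptions paraphrase the process described earlier; the function name and the satisfied_after test are hypothetical stand-ins for the user's own judgement.

```python
STAGES = [
    "generate the initial prototype",
    "add functional enhancements",
    "apply interpreter-level tuning techniques",
    "switch to a compiled dialect such as Clipper or FoxBase+",
    "upgrade the hardware",
    "translate to a machine-independent language on a larger machine",
]


def stages_carried_out(satisfied_after):
    """Work through the stages in order, stopping once the user is satisfied.

    satisfied_after is a hypothetical callable standing in for the user's
    verdict after each stage. Documentation is always written at the end.
    """
    done = []
    for stage in STAGES:
        done.append(stage)
        if satisfied_after(stage):
            break
    done.append("write the documentation")
    return done


# The lucky case: the user accepts the generated prototype as it stands.
print(stages_carried_out(lambda stage: True))
```

In the lucky case only the first stage and the documentation are carried out; a more demanding user simply pushes the loop further down the list.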

The benefit here is the avoidance of over-engineering and 'button polishing' - the system is as powerful and polished as required to meet user requirements, but no more. Some programmers exemplify the introverted personality type which derives major satisfaction from producing 'perfect code' - unfortunately such code is usually produced (at vast expense) to meet the programmer's internal standards, related to professional skills and self-image, and has little if anything to do with real-world efficiency.

Conclusions

Prototyping is a valuable technique for improving programmer and analyst efficiency through the efficient production of 'correct' programs: programs which meet (but do not significantly exceed or differ from) user requirements, are reliable and are inexpensively produced.

While prototyping works best in the single-user, stand-alone environment, a functional prototype can be enhanced through the addition of file-sharing and security facilities.

References

Brooks, Frederick P. - The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, 1975. A useful overview of programming projects and project management. Nothing on prototyping per se, but much of the groundwork is laid here, including the famous Brooks' Law: 'Adding manpower to a late software project makes it later'. The cover, which shows giant prehistoric creatures struggling in a tar pit, is not entirely a coincidence, either.

Gane, Chris and Trish Sarson - Structured Systems Analysis: Tools and Techniques, Prentice-Hall Inc., 1979. A good treatise on graphical techniques for systems analysis which work well as an extension of prototyping for larger projects.

Metzger, Philip - Managing a Programming Project, 2nd Edition, Prentice-Hall Inc., 1981. A detailed and well reasoned guide and manual on the wrong way of doing things. Still worth browsing for a number of valid and useful points which can be adapted to the prototyping technique.

Yourdon, E. - Managing the Structured Techniques (Can't remember the publisher - NEVER lend good books out!)

Prototyping Tools

Dan Bricklin's Demo Program

Genifer -> dBASE III+ -> Clipper

UI Programmer -> dBASE III+ -> Clipper

dBASE III+ -> FoxBase Plus

dBASE III+ -> dBx (dBASE - C translator)

dBASE III+ -> dB2C (another dBASE - C translator)

DataBoss Pascal Generator -> Turbo Pascal

Lotus 1-2-3 -> Baler

Documentation Tools

For graphical documentation (menu tree diagrams, data-flow diagrams):

MacDraw

Windows Draw, In-A-Vision, Designer

GEM Draw

For text documentation (avoid if possible):

Any good word processor: Word, WordPerfect, even WordStar.


Page last updated: 25/May/2006. Copyright © 1987-2010 Les Bell and Associates Pty Ltd. All rights reserved. webmaster@lesbell.com.au
