Digital Electronic Organs using Off-The-Shelf Technology
by Colin Pykett
Posted: January 2005
Last revised: 8 February 2018
Copyright © C E Pykett
A retrospective note to put this article in context
This article arose from some work I started in 2000 to investigate the possibilities for implementing digital organs offered by cheap, off-the-shelf digital audio components such as ordinary PC's and sound cards. At that time there was a sudden explosion in the power and capabilities they began to provide, together with an equally dramatic decrease in their cost. Although many of us thought this was stunningly impressive at the time, the passage of only a few years might make some of what you read here look hopelessly out of date - after all, at that time Windows XP had yet to appear, and most people were still wedded to Windows 95 or 98! Nevertheless, high performance SoundFont compatible sound cards were beginning to appear, and the speeds and memory capacities of computers were beginning to make low-cost real time software synthesis look attractive for the first time. As of late 2009, both of these techniques are still extensively used and they show little sign of obsolescence - SoundFont compatible sound cards are still widely available and they work on Windows Vista, Windows 7 and Linux. An ever-growing range of software synthesisers also remain part of the scene.
To me, the most interesting retrospective aspect today (2009) is that so many others were obviously thinking along similar lines. The appearance of Marshall and Ogletree's PC-based organ at Trinity Church, New York is perhaps the most well known commercial outcome, and it was certainly one of the earliest. It was one of the products which gave rise to the generic name Virtual Pipe Organ (VPO) to describe these types of instrument, mainly because they were able to synthesise every note of every stop independently for the first time. In this respect the first VPOs catapulted themselves, at a stroke, head and shoulders above the general run of other digital organs of the day. It seems to have taken some years for the industry at large to catch up by finally putting aside the wildly obsolete systems some of them were still using until recently (in 2000 and beyond some firms were still using Z80 or similar microprocessors from the 1980's, and they might still be doing so!). Nevertheless, there were exceptions - some firms were apparently earlier than others in perceiving the potential benefits of off-the-shelf technology, such as Phoenix with their modular hardware synthesis system based on PC sound card technology, and Viscount who started to develop their physical modelling instruments using standard DSP chips in a Linux environment.
Therefore, although some details in this article are now dated, its main thrusts are not and consequently it has not been heavily edited because they still broadly underpin the VPO and off-the-shelf technology scene as it remains today.
Abstract. Digital electronic
organs first appeared about 40 years ago, somewhat before the first
microprocessors became widely available. Thus they needed specialised hardware; this
reflected chiefly the large number of independent sound generating circuits
required. The hardware could only be manufactured sensibly by using custom
LSI: large scale integrated circuit techniques. At the time this was
ground-breaking, though expensive, musical instrument technology and far in
advance of what was available elsewhere. Although simple synthesisers had
started to appear also, they used analogue circuit techniques and were limited
to monophonic (one note at a time) operation.
Today the situation has reversed. The
commercial electronic music industry which services the pop music scene has
made tremendous technical strides. This is because the digital sound and
multimedia business is now growing as fast as the computer business itself,
whereas traditional electronic organs only supply a tiny and declining niche
market. The upshot is that, for example, an average computer sound card
retailing for well under £100 has technical capabilities at least equivalent to
the obsolete and much more expensive systems which continue to be used by some
digital organ manufacturers. If one pays a little more and buys the
hardware and software used by commercial music professionals, the capabilities
are even more stunning. This article develops this theme by
examining in detail what can be accomplished through the use of modern
off-the-shelf computer technology instead of yesterday's specialised components.
It illustrates what can be done at trivial expense using today's personal
computers, and it is no surprise to find that some commercial digital organs are now using
Reading Tip: This article does not
describe in detail how digital electronic organs work, nor how they are
programmed to emit organ-like sounds. It assumes the reader possesses some
basic understanding in these areas. However a companion article on this website,
Voicing Electronic Organs, covers these and related
issues. Contents (click
to access the section desired)
Abstract. Digital electronic organs first appeared about 40 years ago, somewhat before the first microprocessors became widely available. Thus they needed specialised hardware; this reflected chiefly the large number of independent sound generating circuits required. The hardware could only be manufactured sensibly by using custom LSI: large scale integrated circuit techniques. At the time this was ground-breaking, though expensive, musical instrument technology and far in advance of what was available elsewhere. Although simple synthesisers had started to appear also, they used analogue circuit techniques and were limited to monophonic (one note at a time) operation.
Today the situation has reversed. The commercial electronic music industry which services the pop music scene has made tremendous technical strides. This is because the digital sound and multimedia business is now growing as fast as the computer business itself, whereas traditional electronic organs only supply a tiny and declining niche market. The upshot is that, for example, an average computer sound card retailing for well under £100 has technical capabilities at least equivalent to the obsolete and much more expensive systems which continue to be used by some digital organ manufacturers. If one pays a little more and buys the hardware and software used by commercial music professionals, the capabilities are even more stunning.
This article develops this theme by examining in detail what can be accomplished through the use of modern off-the-shelf computer technology instead of yesterday's specialised components. It illustrates what can be done at trivial expense using today's personal computers, and it is no surprise to find that some commercial digital organs are now using this approach.
Reading Tip: This article does not describe in detail how digital electronic organs work, nor how they are programmed to emit organ-like sounds. It assumes the reader possesses some basic understanding in these areas. However a companion article on this website, Voicing Electronic Organs, covers these and related issues.
(click to access the section desired)
Why does some of today’s consumer
electronic equipment become hugely popular and widely used, yet other examples
remain obscure, of minority interest and eventually die?
Take audio cassette tape and the CD. These
are the de facto media standards for audio reproduction and have remained so
for many years. Even in late 2009 you can buy a
cassette or CD anywhere in the world and be confident that it will play in your
deck back home. Yet this is
actually rather remarkable if you think about it.
Why are there not several different types of cassette or CD, as there
once were for video tape? And
because anyone can set up a company to manufacture them, which individuals or
organisations are getting the benefit of the money you pay?
In fact Philips and Sony were responsible for these items, so how have
they benefited from them in a business sense?
Would it not have been better for them to have tied up the patent and
licensing situation so tightly that they, and only they, controlled the market
for the media and the decks?
Another example is the ubiquitous personal
computer. Why is the IBM PC so
popular, to the exclusion of almost everything else? It certainly is not the best in a technical sense, nor has it
ever been, and at the
time it appeared in the early 1980’s there were many competing brands.
Far from Microsoft dominating the operating systems market as it does
today, the nearest equivalent then was a system called CP/M produced by a
company called Digital Research. Bill Gates, when he was but a young programmer
engaged by IBM, based his MS-DOS on it. So why did
CP/M go to the wall years ago?
Several questions have been posed above, so
another one is legitimate at this point – what has all this to do with digital
electronic organs? In fact there
are some common threads running through the answers to all these questions.
If we explore these, it will bring us onto the main theme of
this article, namely the emergence in recent years of some novel yet extremely
cost-effective approaches to making digital musical instruments.
It is not a giant leap beyond this to speculate about the future of
today’s traditional electronic organ market and the firms which currently
Probably the common denominator which applies to all of the products mentioned (and there are many others) is that of enlightened business management. It was a deliberate move by IBM to put the architecture of its then new personal computer into the public domain, rather than to clutch its secrets to its breast in what would have become an increasingly sterile game. The same applied about a decade earlier to Philips, Sony and the audio cassette. The main business advantage of such a move is that the more secretive competition is wiped out virtually overnight. Who will still recognise the name of Triumph-Adler from the early 1980's in personal computers today, for example? Or why did reel-to-reel tape and 8 track cassettes die so suddenly, at least at the mass consumer level? These products were being auctioned off on the surplus market only a few months after the iconoclasms engineered by Philips and then IBM. The disadvantages of such a move centre around how to continue to make money out of a product that can be widely cloned, but the example of Microsoft shows that this is not exactly a show-stopper. From the point of view of IBM itself, the death-dealing blow it dealt to almost every other PC manufacturer meant it could continue to concentrate on its core businesses without being troubled again by the small computer market, yet at the same time it could always dip into PC’s itself if it chose. This remains the case even though IBM has now sold its PC business to China.
In due course a sound and multimedia business, which today is enormous, began to grow round the IBM PC. Sound cards, MIDI interfaces and the like appeared in great profusion, the whole lot later being pulled together by a Windows MCI (Media Control Interface) software architecture whose details all systems programmers could access freely and use without restriction . Along the way Microsoft almost casually added another item to the bag of unprotected, public domain intellectual property when it issued its WAV digital audio file format to the world, based on earlier formats developed by Electronic Arts. E-mu Systems did the same in the mid-1990’s with their SoundFont file specification as used by Creative Technology, the makers of Sound Blaster sound cards. Within weeks, other sound card manufacturers were forced into advertising their products as SoundFont-compatible. What a superb way to establish market leadership! In the operating systems game we are witnessing similar history in the making as the popularity of Linux continues to increase versus Windows, due largely to the fact that anyone can download its source code to see how it works and thus optimise their own products and predict their performance. Yet that did not prevent its inventor becoming a millionaire at an early stage. And today it can no longer be sensibly argued that consumer audio technology is of inadequate quality - unless someone really thinks they can do better than the 24 bit/96 kHz standard commonly found in digital sound systems. This far exceeds audio CD quality, and if anything it's an overkill.
The emergence of off-the-shelf electronics and an open design culture to satisfy the hunger of consumers has permeated every sphere of life, and it is a factor which now has to be taken seriously when deciding how to implement a new piece of equipment. Even lavishly-funded organisations which deal in the highest of high technology, such as the Ministry of Defence, do not develop all their equipment from scratch as they used to do; their COTS (commercial-off-the-shelf) R&D programmes show just how seriously they investigate what the consumer market can offer before committing expenditure on new systems. In the 1970's they were still developing their own computers, compilers and operating systems in-house; now they simply buy products such as Windows and PC's. Doing anything else is, at best, simply not cost effective for any organisation. At worst, it verges on commercial lunacy.
How different all this is to the traditional electronic organ business. In the 1980’s the larger organ firms still resorted to litigation to resolve patent issues, or just threats of it to apply the “frighteners”. One example even surfaced in the columns of a journal intended purely for amateurs , a commercially pointless exercise which shows just how vigorously these firms defended their territory and simultaneously revealed their paranoia. The secretiveness of the trade was exposed at about the same time in the same publication, when the description of a visit to the Bradford Computing Organ at Christchurch in Ilkley was curtailed to two paragraphs at the insistence of its inventors who described it as commercially confidential . This despite the fact it was fully described in, and presumably protected by, a patent some six years earlier! (All this makes it a bit rich that the development of this instrument was funded by the UK taxpayer). Secrecy continues into the present millennium, as we can see from the statement made in 2003 by one who has moved through several senior posts in various electronic organ firms: “...those of us who aspire to the leading edge of the electronic art have no desire to share their hard earned knowledge with all and sundry...” . Strange that anyone could think there are still any secrets in electronic music these days. Some might be better at voicing than others, but that's a different matter.
Surely all this misplaced coyness misses an essential point. Below a certain level, the details of the enabling technology used by an electronic musical instrument are themselves of little consequence, save to those (such as the ad-men or the merely ignorant) whose knowledge is so circumscribed that it becomes an end in itself. What matters is how skilfully that technology is put to the purpose of creating musical sounds. It is no different to the pipe organ, whose technology has been in the public domain for centuries yet whose success or otherwise largely lies in the hands of two or three craftsmen such as the voicer. Therefore if there is to be secrecy in electronic organs, it ought to focus on the sounds themselves rather than on the enabling software and hardware, implying that the data files describing the finished sounds and how they are articulated are the commercially sensitive items rather than the host environment. Yet even this would be ultimately pointless, because a determined competitor could merely copy the sounds of an electronic organ he coveted and incorporate them in his own.
There is another matter which ought to be of consequence to
these espousers of secrecy: the future of their businesses.
If the history of digital technology at large teaches us anything, it is
that those who make their technology the most accessible rapidly gain the
largest market share. Unfortunately, at a time when the market for traditional
electronic organs is dwindling almost by the day, it may be too late to stop
their decline. And the reason why
the market is dwindling? It’s
because of the plethora of expensive, incompatible, closely-guarded systems of
course, put together by a precious huddle of cottage industries which have been
completely sidelined by the fast-track developments in electronic music
Thus some electronic organ manufacturers today seem to be still living in the past.
Forty or so years ago when digital organs
first appeared there was little choice but for them to use custom-designed
hardware and software, and this made them rare and expensive.
At that time the commercial pop music scene was mesmerised by crude
monophonic analogue synthesisers of the Moog type, and it remained inexplicably
stuck in that rut for some years. But
today the situation has reversed, and the plain fact is that anything which can
be done by these special-purpose digital organ systems can be done much more
cheaply by using widely available off-the-shelf technology.
Some of the more far-sighted digital organ manufacturers have begun to
perceive this (the PC-based digital organ by Marshall and Ogletree at Trinity
Church, Wall Street, New York City is a well known example). A much
smaller and earlier instrument used a PC to produce the lower notes of a
portable organ which also uses pipes . To demonstrate the truth of the assertion that there is nothing really
different between digital organs and commercial music technology, we must now
leave the business arena to dive into the more technical realm of music
The following discussion focuses on particular commercial items including the IBM PC, the Windows operating system and the Audigy sound card to illustrate various topics. This does not mean that others cannot be used, and I merely mention these items because they are so well known.
What is a synthesiser? It is an electronic device capable of producing an arbitrary musical sound. Therefore digital electronic organs are the same as synthesisers, those ubiquitous items beloved by devotees of pop music. If a manufacturer of digital organs was to claim, as some do, that his products were not synthesisers, he would be admitting that they were less capable than those of his pop music cousins when it came to simulating in minute detail the nuances of real sounds. For electronic organ applications a synthesiser worthy of the name has to be able to control how the sound of an organ pipe begins when a note is keyed, how it is maintained while the note is held, and how it terminates when the note is released. It should have sufficient resources to cope with a full organ situation when many notes are keyed with many stops drawn, including couplers. It should be able to simulate the differences which occur in a pipe organ when an individual pipe speaks compared to the situation when that pipe speaks together with others speaking close to it. It should simulate effects such as wind sag, due to a momentary drop in wind pressure when there is a sudden demand caused by a full chord. These, and more, effects must be simulated in real time so that the performer experiences no perception of delay (latency). Although many electronic organs fall short of these desiderata, they can all be achieved by today's pop music synthesisers at almost trivial expense, and it is for this reason that this article was written. What on earth is the point of continuing to use an unnecessarily expensive customised organ system in today's situation, especially if its capabilities are limited and out of date? All that is necessary is a grasp of how to use today's technology to the best effect.
Figure 1. Basic synthesiser module (note generator)
Figure 1 shows the elements of a simplified digital synthesiser module or note generator. It is somewhat simpler than that which will be found in most digital musical instruments, yet it is capable of just about everything that could be required of an electronic organ, so we shall begin by examining this configuration. It should also be noted that there will be many such modules in a complete musical instrument, mainly to cater for the situation when many notes are keyed simultaneously with several couplers drawn (the polyphonic situation), as well as with many speaking stops drawn simultaneously (the multitimbral situation). This topic will be explored later. All of the modules are under the control of one or more computers.
basic synthesiser module consists of a waveform generator, conventionally
called an oscillator in computer music parlance, whose amplitude and frequency can be varied
virtually instantaneously in
real time in response to control demands as shown.
Its output is then digitally summed with those from an arbitrary number
of identical modules before being delivered ultimately as the output to a DAC
(digital to analogue converter), which feeds an analogue power amplifier
and a loudspeaker system.
The type of waveform produced by the oscillator is not defined, because it can be anything at all. You should not be misled by the name into thinking that it is restricted to the simple repertoire of waveforms produced by analogue oscillators, such as sine or square waves. At one moment the waveform could be a complex organ pipe sound sample, complete with starting transient. At the next it might have been replaced with a single cycle of a sine wave which loops for an indefinite period, representing a single harmonic of a sound. Then it might change yet again to a sample of wind noise. Waveform selection is under computer control. When the waveforms are selected from a set of sampled sounds stored in memory, the system is often referred to as a wavetable synthesiser.
The frequency and amplitude envelopes are
also under computer control. These are applied to
the waveform on a sample-by-sample basis as it is emitted from the generator.
For example, by applying a periodic low frequency variation
a vibrato will be impressed on the generated sound. The oscillator
responsible for this is denoted LFO1 in the diagram, the name denoting a low
frequency oscillator. By applying a sudden attack and a slower decay amplitude
envelope to the amplifier, the
system is able to simulate plucked or percussive instrument sounds.
This would be achieved by modulating the gain of the amplifier using the
appropriate ADSR (Attack/Decay/Sustain/Release) characteristic. Or
it might be necessary to modulate the amplitude as well as the frequency of the generated waveform
periodically, in which case a second low frequency oscillator (LFO2) would be
used. The waveshapes and frequencies of both LFO's are
independent and controlled by the computer. Although actual synthesisers might be more complex than suggested
by the diagram, this basic system will be able to simulate just about any
desired musical sound.
The ability to sum the outputs from several
amplifiers is important. For
example, a particular group of stops in an organ might need to be associated
with a particular loudspeaker channel. Or each waveform might be a pure sine wave representing a
particular harmonic in a sound, many of which will be summed to produce a
complex output. This is the process
of additive synthesis, though direct implementation in this manner could seldom
be used on a large scale, that is, in a situation of many stops with many
harmonics and with many notes keyed. This is because the number of
synthesiser modules required could become prohibitively large, and because the
number of real time additions
required per second could exceed even the capabilities of the fastest processor.
It is seen that the functions performed by a synthesiser module fall into two categories. The first is that of waveform generation, governing which 'raw' waveform is produced when a particular note is keyed for a particular stop. The second is that of waveform modulation, governing how the amplitude and frequency of the selected waveform are varied over time. For generation, it is not only the waveform itself which has to be defined. A typical synthesiser is able to control many other parameters affecting signal generation, including the range of notes over which a particular waveform is active (its keygroup), loop points between which the waveform loops as long as the key is held, filter settings to alter the spectral shape of the waveform, etc. (A digital filter block is not shown in the diagram above). Modulator parameters control, for example, the attack, sustain and release phases of the waveform as well as the tremulant characteristics. A typical synthesiser will offer over 50 generation and modulation parameters which together control the articulation of the chosen waveform. In addition, other control parameters determine which loudspeaker channel is to be used for a given module at a given instant, and audio effects such as reverberation, etc. A complete assembly of synthesiser modules together with their associated modulators is often called a sound rendering engine.
An important issue is the number of synthesiser modules required and how they are implemented. In simple terms, it is necessary to provide as many modules in a digital organ as the number of pipes which could sound simultaneously in a pipe organ with the same stop list. Otherwise there would obviously be missing notes in some circumstances when a full chord was played because all of the modules would have been allocated by the control computers before the demand of the player had been satisfied. This illustrates a defect of many digital organs which is often noticeable, and it is of course irritating and indefensible for an instrument of any pretension. Although the number of modules required is not large in an absolute sense (it is much less than the total number of pipes in the equivalent pipe organ), it will often be inconveniently large to the hard-pressed engineer tasked with designing the rendering engine.
To arrive at a sensible number, consider first of all the requirements of a single manual department. It will usually be the case that 'full organ' on this department can be achieved by drawing no more than 8 stops or so, rather than all of the stops. For example, the beautiful Father Willis swell organ at St Paul's cathedral in London has only 12 speaking stops, yet 'full swell' requires no more than 8 (4 reeds, 3 diapasons and the mixture). In fact it can be achieved using less than 8 stops. Assuming that a chord of more than 8 notes will seldom be played, this implies the need for about 64 synthesiser modules to cater for a department of this size. (This simple calculation ignored the fact that the mixture has several ranks but we shall proceed with it for now). 64 is of course one of those magic numbers so beloved of computer tecchies because it is an integer power of 2, and therefore it fits neatly into a scheme of computer control. But this is neither here nor there; the point is that this sort of number governs the minimum synthesiser requirement for a single department of a typical digital organ.
We can reasonably conclude that a similar number would be required for each of the other manual departments, with the pedals needing less because the number of notes played simultaneously is also less. The most basic 2 manual and pedal scheme would therefore seem to require at least 128 synthesiser modules. In engineering terms this is not a trivial requirement. Each module is very complex at the level of its wiring diagram, and because of this it could not possibly be constructed economically using ordinary printed circuit wiring techniques and discrete circuit elements such as transistors. The only way to proceed, assuming we are using a hardware realisation for each module, is to use LSI (large scale integration) techniques in which the size of each circuit is shrunk by a large factor by integrating it with many others on a silicon chip. Designing such chips is expensive and the process requires highly qualified and experienced engineers. However, if you are making the chip for a product which will sell by the million, the one-time R&D investment when amortised among the production units becomes almost insignificant. But unlike the synthesiser engines used in the plastic 'keyboards' used by pop musicians, electronic organs do not sell by the million, so they are correspondingly more expensive. This illustrates the problem for an organ manufacturer today, and why some of them stick with technology which is years out of date. As recently as 1999, Wyvern organs in the UK was still using the decades-old Z80 microprocessor . Their article described a typical additive synthesis instrument using 'Bradford' techniques, and it showed the limited Z80 clock speed of 20MHz demands the use of a lot of special purpose hardware, some of which is still built laboriously by hand. The investment required to update such technology would likely bankrupt many a small concern. Today's commercial music technology has moved way beyond this.
We have estimated that at least 64 synthesiser modules (note generators) would be required per manual department to cater for the needs of normal playing. But what if the organist ignored everything s/he had been taught about economical registration and nevertheless drew every stop in sight? After all, Bach himself did this to test a new pipe organ! And what if there were octave and suboctave couplers among these stops, each of which virtually doubles the number of modules required? The answer to these questions would be a disastrous exposé of the limits of this particular instrument. Because the cost of a digital organ system is strongly related to the number of modules available (termed its polyphony), manufacturers strive hard to keep the number down. To do this a number of tricks are employed in the trade. One clandestinely switches off stops which are judged to be of lower importance to an ensemble when many stops are drawn. A Dulciana drawn on the Great at the same time as the Tromba would be a prime candidate for such treatment, and it would be restored when the Tromba was no longer in use. Another is slightly cleverer - it would combine the waveforms for both stops in this example and then load the composite waveform into a single module. The slight loss of 'chorus' which would result would be the penalty paid. But there is no such easy answer when octave couplers are used because they can so easily swamp the demand for note generators as soon as they are drawn, so the trade simply does not provide them unless it has to. Look carefully at the stop lists of even the largest electronic organs to see the veracity of this statement! Even when they are present, it is unlikely they will couple through the inter-departmental couplers (such as the swell to great) for the same reason, thereby losing an important element of simulating mechanical action organs.
With such techniques it has been possible for manufacturers to sell complete organs for years with as few as 64 note generators, and few organists seem to notice at first. (When they do, they immediately lose empathy with the instrument concerned. There is nothing worse than to lose the top note of a full chord on the manuals when you play a pedal note). In fact, a large number of digital organs have used blocks of 64 note generators for many years, such a composite unit being termed a 'music module' by some makers, such as those who use the Bradford system. If you want to see an organ salesperson jump though hoops, ask her/him how many synthesiser modules or note generators the instrument s/he is pushing onto you uses.
There is another way to increase the polyphonic limit of a digital organ without actually increasing the number of note generators implemented in hardware. It is done by time-multiplexing each of the note generators realised in silicon so that it services the needs of several, repetitively and in succession. Successive digital samples passing through the single generator belong to the several notes which it is sounding. In this way the physical generator hardware can be shared, usually transparently to the programmer, who writes his software as though there were a significantly greater polyphony than the actual hardware would support at first sight. Naturally, this approach means that the data rate of the rendering engine must be higher than that of each note generator if it were to be separately realised in hardware, and therefore the chip has to be capable of supporting this increased data throughput.
Yet another way of increasing the polyphonic limit is to use, paradoxically, no specialised note generating hardware at all, which brings us to the topic of software synthesis.
Until fairly recently the rendering engine in a synthesiser was invariably implemented in hardware, albeit compressed into a few extremely complex silicon chips. This still remains common today, but over the last five years or so the blistering speed of recent processor chips such as the Pentium and Athlon has enabled much more of the work to be done entirely by the program itself. The architecture of the synthesis module sketched in Figure 1 can just as readily be simulated in software as constructed physically in hardware, provided the software can execute fast enough. At processor speeds over 1 GHz or so the software can run fast enough, because not only is the processor being clocked at this speed but chips such as those mentioned can also execute complete machine instructions at this speed much of the time. Compare this with the lowly Z80, whose maximum clock rate is 20 MHz and whose instructions require typically 20 clock cycles to execute. Such a processor is 1000 times slower than even a modest PC today. To emphasise what this difference means in practice, it is worth remembering that something which executes within half a second on the faster processor would take well over eight minutes on the slower. Therefore digital organs which still rely on elderly processing engines cannot possibly compete with those using modern ones - end of argument.
Using just a single fast PC it is quite feasible to make a software synthesiser which simulates separately every pipe of every stop on a medium sized pipe organ and which has no perceptible polyphonic limit. This feat is facilitated not only by processor speed but by virtue of the huge memory capacity of these machines, enabling large numbers of waveforms to be stored or pre-computed before being called up as required. Storing the waveforms in advance reduces the need to perform software-intensive operations on the fly, such as interpolation to change the frequency of a waveform to match the latest note keyed, and it greatly simplifies additive synthesis also. However the software approach also has some downsides which will be explored later.
We shall now look at some examples of off-the-shelf hardware and software to justify the assertions made above. We shall take the ubiquitous computer sound card as a starting point for the discussion, and it is worth recommending at the outset that you set aside any negative preconceptions you might hold about these items. If we allow familiarity to breed contempt, we shall miss much of what these apparently humdrum objects can offer, although the undoubted shortcomings of sound cards will be discussed also. Mostly the discussion assumes that a card has been bought on the retail market together with its drivers and other software. However some readers may be in a position to procure them as OEM items, in which case some of the following may not be relevant. Some may also possess greater insight into how the cards work than is assumed here, perhaps because they hold Registered Developer status from firms like Creative Technology for example. I have tried to assemble the material so it offers as much as possible to this wide potential audience.
Sound cards retail from under £10 to around £1000 in the UK. Those occupying the bottom end of the market are mainly aimed at the home movie viewer or computer gamer and they offer virtually nothing of interest to us. At the top end they are the tools of the commercial music producer, offering a huge range of options and facilities only a fraction of which are of any value to us here. In the middle, rather like Goldilocks, we can find some which are reasonably optimum for our purposes, and the Sound Blaster Audigy series of cards by Creative Technology will be examined in more detail as representative of current commercial practice. The original Audigy appeared in 2002, and it has now been superseded by the Audigy 2 and the Audigy 4. However there is little difference between them as far as sound synthesis is concerned. Currently the Audigy exists in various forms ranging in retail price from below £100 to about £150. All of them have identical synthesiser capabilities, the differences relating to the degree of fanciness of the user interface and the amount of accompanying software (little of which is of any interest as we shall see later when we discuss the control software requirements). For the equipment manufacturer there are other and correspondingly cheaper OEM versions available as well.
All Audigy cards are based on E-mu Systems Inc's 10K2 rendering engine which has two independent synthesisers and 64 hardware note generators shared between them. Either synth can grab all the note generators, but if it does the other has no capability while this condition persists. Avoiding this requires some care in distributing the required sounds across the two synths. The architecture of each generator is fairly complex and offers pretty much the flexibility one could wish for. As with so much digital consumer equipment, one's astonishment at how much functionality can be obtained for so little outlay never diminishes. A somewhat simplified diagram of each note generator is in Figure 2.
Figure 2. Simplified architecture of the E-mu 10K2 synthesiser module (note generator)
The waveforms used by the oscillator are stored in the main memory of the computer, a useful feature because it is quick and easy to modify them on the fly while notes are sounding, if you know how to get at them. However this is seldom necessary given the flexible modulation characteristics which can be applied to them. There is no limit on the number of oscillator waveforms other than the available memory size. The pitch of the note can be modified in real time while it is sounding via three sources: two independent LFO's and one non-periodic modulator under software control. Pitch modulation is done by an 8-point hardware interpolator, which effectively stretches or shrinks the waveform while keeping the actual sample rate constant. Thus tremulant and other effects such as wind sag can be applied. This modulated output is then passed to a filter whose characteristics can also be modulated. The filter has a low pass characteristic; in digital filter terms it is an IIR (infinite impulse response) filter whose cutoff frequency and passband characteristics are programmable, again in real time. These parameters derive either from LFO2 or from an arbitrary time envelope defined by the programmer, or both. The filter is followed by an amplifier whose gain is also variable from two sources as with the filter, enabling additional tremulant effects and ADSR to be implemented. Finally the amplifier feeds two output channels so that the sound can be "panned" between them. It also feeds two audio effects modules which pan across the output channels. Typically these will be reverberation and chorus units, implemented digitally. The receive lines from these modules are added back to the output lines in proportions which are under software control.
Each modulator has a substructure as sketched in Figure 3.
Figure 3. Simplified Audigy modulator schematic
This is a simplified version of the actual modulator configuration, drawn thus to avoid obscuring the essentials. The destination summing node equates to the black blobs shown earlier in Figure 2. Behind these, we can therefore see that each modulating signal can arise from two input sources, both of which can be multiplied by various factors. In addition, source 1 can also modulate source 2. This composite modulating signal can then be transformed mathematically by any desired function, enabling a logarithmic transform to be applied to linear inputs for example. This is obviously an extremely flexible system whose applicability is only limited by the imagination of the programmer, and for organ purposes much of this potential flexibility would never be needed.
How can we best take advantage of the flexibility of the hardware architecture of this type of sound card in voicing a digital organ? Unless we have a masochistic compulsion to keep reinventing the wheel, there seems little point in using anything other than off-the-shelf methods yet again, and in the case of the Audigy card SoundFonts are the most widely used. 'SoundFont' is a trademark of E-mu Systems Inc, and it refers to nothing more than a standard file format containing all of the oscillator waveforms and modulator information required for a rendering engine to reproduce the desired organ stop list. It is non-specific as to the details of the rendering engine, and an engine can use as much or as little of the information in a SoundFont file as the designer wishes. However, perhaps not surprisingly, the SoundFont concept lies fairly close to what the 10K2 engine can offer. Both frequency domain (additive synthesis) and time domain approaches to voicing can be used with SoundFonts because no assumptions are made about the type of waveforms used by the oscillators, there are no restrictions on their number, and none on the number of ways in which they can be combined to produce the final sounds. Therefore each waveform could be a sine wave of a different frequency, representing a single harmonic of a sound, just as readily as it could be one of the fully pre-formed waveforms used by a time domain sampler. In fact, it would only be necessary to have a few sine wave samples in this case, as the frequency and time modulation facilities available are so elaborate. Each harmonic can be given a separate ADSR characteristic, and the frequency of each can be varied to simulate the anharmonicity of partials during the attack transient phase of the waveform. The rendering engine will dutifully add all this together for you when a note is keyed if it is capable enough. SoundFont files can be very large if many oscillator samples of long duration are incorporated.
Any restrictions on SoundFont realisation which might arise in practice would be those imposed by the rendering engine itself, not the font.
The SoundFont file format was made publicly available in about 1998  when it moved to the SoundFont 2 standard (version 2.1 is the most commonly used today). It conforms to the Microsoft RIFF (Resource Interchange File Format) tagged file structure of which WAV files are also a subset, though SoundFonts are considerably more complicated and anyone wishing to parse them from scratch would do well to have some familiarity with list processing techniques. (I speak from painful experience, being suddenly thankful that I had cut my teeth on LISP and Algol-68 over 30 years ago). All of the oscillator and modulator information is stored in the form of unordered bags rather than ordered lists, with an intricate pointer arrangement for tracking the functional linkages between the bag elements. Fortunately most if not all voicing operations can be executed using commercial SoundFont editors without requiring the user to have any knowledge of what is going on inside the SoundFont itself. Such editors operate within attractive user-friendly graphical environments as well, making the voicing operation as painless as it can possibly be. Again we find evidence of the enthusiasm of manufacturers to encourage usage of their products by issuing some SoundFont editors as high quality freeware.
A screenshot of a well-known freeware editor, Vienna, is shown below (the image quality was deliberately reduced to minimise download time for those on restricted bandwidth connections). This program can be obtained from several sources on the Internet, including Creative Labs' own website. The example shows the SoundFont structure for part of the Vox Humana rank used in the dual stop list organ described later. This rank used 14 sound samples (equivalent to voicing points) derived from recordings of a Wurlitzer theatre pipe organ, considerably more than in many commercial instruments, although the limiting extreme of one sample per note could quite easily have been used. The number of voicing points and their positions are examples of where the judgement of the voicer (me in this case) is important to avoid unnecessary work. Some idea of the range of articulation parameters for the particular keygroup sample highlighted by the blue bar can be gained from the numerical values displayed in the lower panes of the screen. Other editing screens are available in which further modulation parameters can be adjusted. The virtual keyboard on the display is playable directly using the mouse or via MIDI inputs from an external keyboard, either of which enables the effects of voicing changes to be assessed immediately.
You can now see that if the samples consist of pure sine waves, SoundFonts and the Vienna editor are just as applicable to additive synthesis as to sampled sounds - for additive synthesis the articulation of individual harmonics can be adjusted in minute detail at any point across the keyboard, note by note if required.
Editing tools such as this (and there are of course others) compare favourably with anything used anywhere in the industry today, so if anyone really wants to spend (waste?) time and money taken off their company's bottom line by developing their own, all I can say is good luck to them. To emphasise the point again, the really important matter is how effectively the tools are used in voicing, rather than getting too hung up about the tools themselves.
Screenshot of a 'Vienna' SoundFont editing screen
At the end of the editing (voicing) operation, the SoundFonts can be loaded into the sound card for permanent use by the organ system. The Audigy card can accept almost any number of SoundFonts because they are treated as though they are MIDI Banks, each containing 128 MIDI Presets. The two notional synthesisers available in the Audigy can also be loaded independently with a different configuration of SoundFont banks. Banks are selected as required using the usual MIDI Bank Select controller messages.
There are several ways in which a SoundFont could be configured for organ purposes. The most obvious is to use the top level of the font, the MIDI Presets, as the actual stops of the organ. Each of these will draw on the oscillator waveforms in various ways, modulated as required to achieve the necessary ADSR and tremulant characteristics etc. The flexibility of the system is such that it is easily possible to develop a font in which each note of each stop can be separately voiced with its own separate sample(s), thereby emulating a pipe organ as closely as possible.
The disadvantages of SoundFonts as well as their strengths are mainly related to their implied reliance on the MIDI protocol. For organ work, probably the major shortcoming is the difficulty of selecting more than one stop at a time, and a satisfactory solution to this problem requires custom control software to be developed to interface the keyboard contacts and stop switches to the sound card driver. This is a topic in its own right which will now be discussed.
This is the difficult bit. A disappointment with sound cards such as the Audigy is the limitations imposed by the accompanying control software provided by the manufacturer, which only enables a fraction of the potential offered by the rendering engine and its SoundFonts to be realised. For example, the two synthesisers of the Audigy cannot be used simultaneously, nor can the two MIDI keyboard input ports if one relies solely on the retail software provided. Only one synthesiser and one MIDI port can be selected at a time. Nor can the MIDI Presets in the SoundFonts be 'layered', meaning that organ stops cannot be built up in the normal manner - only one Preset can sound at a time when auditioned directly from an external keyboard. This is OK for laying down the tracks of a pop music song to be rendered later on a sequencer but hopeless for organs which have to be played in real time. Moreover some versions of the software contain inexcusable bugs, one of them being that it cannot cope with anything other than the simplest MIDI signals. For example, the early Audigy cards were confused by the active sensing signals emitted continuously by much commercial MIDI equipment, and they would only work properly when these signals were filtered out. It is these rather inexplicable restrictions which have doubtless contributed to the low esteem in which sound cards are held by professional musicians and musical instrument designers. The fact that the cards can only be driven via MIDI using the commercially supplied software is also a limitation in itself.
However if we look below the surface, all of these restrictions are fairly straightforward to overcome provided one is prepared and able to do a bit of programming. To assist this, any sound card can be persuaded fairly readily to yield most of its secrets because ultimately it has to be compatible with popular PC's, their operating systems and the vast range of games and music application programs. The point to keep in mind is that the programmer's interface to the sound card driver has to be compatible with the usual API (Application Programming Interface) to the operating system if the card is to be reasonably portable across a number of platform variants. If it was not thus portable, it obviously would stop selling pretty quickly and would gain a poor reputation. In the case of Windows versions from Windows 98SE onwards, this means that the card drivers must conform to the Windows WDM (Windows Driver Model) standards, and in turn this implies that all of the relevant Win32 MCI (Media Control Interface) function calls must be implemented properly by the driver. I have investigated the performance of current Audigy drivers extensively on Windows 98SE and XP and have found this to be the case. In passing, this raises the question of why the manufacturer's own software which interfaces to the same drivers is so disappointingly limited in scope, but no matter. We can move beyond it.
By interrogating the driver using the normal Windows MCI utility functions which most music application programs employ (e.g. midiOutGetDevCaps etc), Windows will vouchsafe unto you its device ID numbers for all of the card's external MIDI input and output ports. More importantly, it will also reveal the internal MIDI ID's for the two hardware synthesisers. By sending MIDI Program Change messages internally to either synth, any Preset of any SoundFont Bank can be selected. Moreover, the driver works internally in MIDI Mode 3, meaning that you can associate any Preset in any Bank with a particular MIDI channel and have it sound on that channel only. Therefore any combination of up to 16 Presets in each of the two synths (i.e. 32 in all) can be made to sound simultaneously using this method, corresponding to a maximum of 32 simultaneous speaking ranks. If extension is used as in the theatre organ, each rank can correspond to several stops. The potential capability of a single Audigy card is therefore quite extensive and flexible when used in this manner. These Preset selection messages would of course correspond to those arising from the outside world when the player manipulated the stops. As far as the sound card itself is concerned, it will simply think it is being driven by the various tracks laid down on a sequencer because the messages being received and interpreted by the driver are no different from those it would encounter in that application. We are merely using our own control program to enable the messages to be handled instead from a collection of stops and keyboards as they are played in real time by an organist.
Note that these internal MIDI stop-selection messages have no essential connection with those relating to the external MIDI ports. Indeed, the latter need not be used at all if you want your keyboards and stops not to be encumbered by MIDI. For example, using your own control program it is quite possible to arrange for the keyboards and stops to be polled (scanned) via one of the PC's own ports such as the parallel port instead, the control program then translating the player's actions into driver-compatible MIDI messages as outlined above. More will be said about this later.
A custom control program of this type, which interfaces your keyboards and stop switches to the sound card driver, would typically be written in C or another of the C-based languages such as C++ or C#. At least, this is the natural choice for Windows because Microsoft's MCI documentation seems to assume the use of one of these languages, although other languages such as Pascal and Delphi are also apparently used by some programmers. The C languages certainly work very well in these applications even when using non-Microsoft compilers and program development environments, and there is an enormous literature available to assist the programmer, much of which has been published by Microsoft itself.
At first sight a significant disadvantage of a single sound card of the Audigy type is its 64 note polyphony limit, and the only way to increase this is to use more than one card. But in fairness this can scarcely be levelled as a criticism; obviously there will be a limit to the hardware polyphony of any rendering engine, and for the money we are jolly lucky to have as many as 64 notes available. But criticism is more justifiable when we look at the manufacturer's software - the drivers supplied will not support more than one card at once in a single PC, although the recent Audigy 2ZS is able to switch from one to another. One wonders why this lack of flexibility persists - the makers could surely sell more cards if their drivers supported them! Thus, again, we have to look elsewhere to solve this problem. It is possible for those with the determination to write their own drivers, but fortunately this has already been done by some brave mortals who have made the fruits of their labours available to the rest of us. Some of these third party drivers will support an arbitrary number of cards, limited only by the number of physical motherboard slots available to connect them to the PC bus. They also tend to focus on the needs of the musician rather than the gamer, thereby redressing an unfortunate bias as far as we are concerned in the manufacturers' software.
There were up to 6 audio output channels on the original Audigy sound card, catering for 5.1 surround sound. The Audigy 2 provides up to 8 channels for a 7.1 speaker setup. The retail software supplied with the cards includes a set of WAV files to assist with channel identification - a mellifluous voice tells you which is which after you have connected the loudspeakers, and anyone who has tried to cope with this number of channels will know how useful such a facility can be. The various channels are of course intended to provide a surround sound environment for computer games and the home movie enthusiast, whereas we want them to receive the outputs of the various organ stops. Unfortunately it does not seem possible to use more than two independent channels when using the synthesisers of the sound card. Although these two channels can be selected from the full set and pairs of channels can be duplicated (e.g. you can send the synth outputs simultaneously to the front, side and rear speakers), this is not the same thing as being able to send a particular organ stop to one of eight loudspeakers. However, with multiple sound cards the number of fully independent channels is increased, just as with the total polyphony. Two cards will allow any stop to be routed to one of 4 loudspeakers, or split in varying amounts between them. Three cards will give you 6 channels, etc.
I have developed various control programs running under Windows 98SE onwards, and they run quite happily on recent computers under Windows XP. They are written in C or C++ and they interface an ordinary organ console to one or more sound cards. Some further details might be of interest. One version of the program catered for a 2 manual organ with a dual stop list (classical and theatre) having the following specification:
Hear the classical organ:
Prelude on the hymn tune 'Halifax' (W T Best) - 0.98 MB/1m 4s
Reflection (Charles Wood) - 2.01 MB/2m 16s
THEATRE RANKS AND PITCHES:
Hear the theatre organ:
Holsworthy Church Bells (S S Wesley arr. Pykett) - 3.45 MB/3m 46s
"I've never heard electronic Tibias this good ... "
(a theatre organist)
The simulations are exact in the sense that the classical organ uses no extension, borrowing or duplication whereas the theatre one is fully unified. Thus each 'pipe' can only be used once for the theatre organ, which has the incidental advantage of reducing the polyphony demand for the considerably larger stop list.
The organ works with one or more Audigy cards, or others which are SoundFont compatible (desirably with a 10K1 or 10K2 sound engine). The program has an 'economiser' algorithm which removes/combines stops in an ensemble if the polyphony limit is approached, and these stops are then restored automatically when appropriate. Obviously the use of more than one sound card is highly desirable, although the economiser makes it just possible for an organ with the stop lists above to be played using only one card without the economiser operation becoming apparent to the organist. Much better results are obtained using two cards in situations where a maximum of four audio channels is sufficient, and in this configuration (i.e. with 128 note generators) the economiser is hardly ever activated during normal playing for an organ of this size. Several SoundFonts are used, one in each sound card synth, some of the stops being derived using additive synthesis with the others existing as sampled sounds. They have fully developed attack and decay transients where appropriate, mainly for the classical stops of course.
There are two expression pedals. In "classical" mode the left hand one affects the great and pedal stops, and the right hand one those for the swell. In "theatre" mode the left hand one controls the Main ranks and Traps and Effects, and the right hand one the Solo ranks and Tonal Percussions. There is also a separate sustain pedal for the piano. Most importantly, the expression pedals control the tone quality of the "enclosed" stops in the same way as does a real swell box, rather than merely acting as a simple volume control. Without this it is impossible to experience the pent-up thunder of effects such as 'full swell'. They also respond with a time constant selected to simulate the motion of heavy shutters (shades). For the classical stops this reflects the impossibility of moving real shutters arbitrarily quickly, even if the linkage in a pipe organ is mechanical. For the theatre organ it gives the player some feel for shutters controlled remotely by an electro-pneumatic action regardless of how quickly the expression pedal itself might be operated. Such details add immeasurably to the effect and pleasure of an electronic organ, both from the player's and the listener's perspective.
The tremulants are set individually for each stop and rank in terms of speed and the depths of frequency and amplitude modulation, and these parameters are also tailored properly across the compass of each stop/rank. The Tibias on a theatre organ will usually have their own tremulant which will therefore have different characteristics from that for the other ranks in the same chamber. There are three distinct tremulants for the theatre organ and one for the classical one on this instrument, although many more could be provided if required. In this as in so many respects, one has to temper flexibility with realism. Some of the tremulant characteristics can be adjusted quickly by the player using the interactive voicing facility (see below).
Very small random variations of frequency and amplitude are introduced into the speech of each stop in real time, differently for each one. As with a pipe organ they are almost imperceptible if a single note is auditioned, but in chorus they remove that horrible iron-hard, in-your-face type of sound that is characteristic of so many digital organs. If required, it is possible to simulate the bubbling and burbling which some string and reed stops exhibit if the key is held down for an extended period. These effects (as well as those described above for the tremulants) are very easy to achieve using the elaborate modulation facilities of a 10K2 sound engine, but they are more difficult to implement with software synthesis, as described later.
The program accepts console inputs from the keys and stops in two forms - MIDI messages or polling (scanning) via the parallel port. All MIDI inputs are simple Note On or Note Off messages, the channel numbers indicating whether they refer to keys, stops, swell pedals, pistons, etc. The program does not need to be configured for the two types of input because the MIDI inputs generate interrupts whereas polling is the default activity when the internal MIDI message queue is empty. Thus a mixture of the two types of input can be used simultaneously, for example one console generated MIDI from its keys and pedals but used polling for the pistons and stop switch array. Another version of the program also controlled the combination piston system itself via the parallel port. Fast serial ports such as the USB could also be used, but not slow ones such as those generally used for the computer keyboard and mouse.
Direct scanning of the console contacts enables the slowness, delays and choke problems of MIDI to be eliminated completely and it is the approach I prefer to use whenever possible. I arrange for the entire console switch array to be scanned at least 350 times per second, much faster than standard commercial multiplexing practice and far faster than a MIDI system could respond, and the program can of course set the scan rate itself depending on the CPU speed in use. With two 61 note manuals and a 32 note pedal board, there remain 102 scan addresses which can be used for stops, pistons, swell pedal contacts, etc using the byte-wide data bus of the parallel port. The interface between the parallel port and the console consists typically of half a dozen ordinary LSTTL or CMOS packages in this case. The addressing range can be doubled quite simply by using two (or more) address pages selectable by the program, and this caters for the largest conceivable instrument. The console contacts are of course addressed via a matrix to reduce the amount of wiring required. A possible EMC problem associated with such fast scan speeds is the radiation emitted unless proper screening and other precautions are used in the console. A typical console interface is depicted at Figure 4.
Figure 4. A typical PC-to-console interface
The program has a user voicing mode which can be entered by pressing a special piston. In this mode the loudness of all classical stops and the theatre ranks can be adjusted easily by the organist herself, together with the overall tremulant characteristics (separately for each rank in the case of the theatre organ). Using these facilities it is possible to tonally finish an instrument without having to modify the SoundFonts themselves, which is a more time-consuming operation and one for which more experience is required. The settings are retained at switch-off, although a "return to factory settings" option is available if the user makes a pig's ear of the operation and then panics!
A screenshot showing the simplicity of the display used for adjusting the tremulants for the Solo ranks of the theatre organ is at Figure 5. The tremulant depth of the selected rank (Tuba in this case) appears at the top left of the screen, where it can be edited as necessary. Most people would want the tremulant depth for the piano kept firmly at zero (!), but a tremulated Chrysoglott turns it into an attractive Vibraphone when properly adjusted. Pressing the 'NEXT DEPT' button cycles round all of the available editing screens for the theatre organ; a similar set is available for the instrument when in Classical mode. The 'EXIT' button returns you to the organ with your edited settings.
Figure 5. A typical interactive voicing screen.
The program is written as a generic processing kernel, independent of the organ being simulated. Therefore to customise it for a particular instrument, the user has to supply a configuration file containing all necessary information about the number of departments, the speaking stops, couplers, pistons, tremulants, etc. Although this file is quite detailed, it is composed using an ordinary text editor according to a number of simple rules. User comments can be freely interspersed. The control program reads this file each time it starts and configures itself as required, in rather the same way that a compiler for a high-level language reads the source code generated by a programmer. By this means it is unnecessary to write code specific to the organ being simulated, nor is it necessary to recompile the control program itself if, for example, changes are made to the stop list. An extract from a configuration file, with further audio clips, is shown in another article on this website (read).
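Purely to illustrate the flavour of such a parser (the actual file format and keywords are not those shown here - the line syntax below is invented for the example), each line might be split into a keyword and an argument string, with blank and comment lines skipped:

```c
#include <string.h>
#include <ctype.h>

/* Hypothetical line syntax for illustration: "STOP Swell 8 Oboe", with
   free-text comments introduced by ';'.  The real format differs.
   Returns 1 if the line carries data, 0 for blanks and comments. */
int parse_line(const char *line, char keyword[32], char arg[96])
{
    while (isspace((unsigned char)*line)) line++;
    if (*line == '\0' || *line == ';')
        return 0;                               /* blank or comment */
    int n = 0;
    while (*line && !isspace((unsigned char)*line) && n < 31)
        keyword[n++] = *line++;
    keyword[n] = '\0';
    while (isspace((unsigned char)*line)) line++;
    strncpy(arg, line, 95);
    arg[95] = '\0';
    size_t len = strlen(arg);                   /* trim trailing newline */
    if (len && arg[len - 1] == '\n')
        arg[len - 1] = '\0';
    return 1;
}
```

The kernel would dispatch on the keyword to build its internal tables of stops, couplers and pistons, much as a compiler dispatches on tokens.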
The size of the executable program for the organ specified above is about 48 KB including all of the GUI editing routines, and the SoundFonts together occupy under 35 MB; these figures are tiny by current standards and there is obviously much scope for expansion. More importantly, any modern PC running this program is idle most of the time because the hard work is done by the hardware sound engines in the sound cards. All the control program is doing is shuffling internal MIDI information around and passing it to the card drivers via the Windows MCI. In fact the PC has to be slowed down using wait states when polling via the external ports, otherwise the external interfaces to the console switches simply cannot keep up. The use of wait states does not slow down the response of the program to MIDI demands because these are serviced instantly on an interrupt basis. Although the programs usually run on recent PC's with a clock rate over 1 GHz, the system has been tested on a humble and obsolete Celeron PC running at only 360 MHz and even this had plenty of spare capacity. This suggests that a realisation using a commercial embedded PC card running at around 200 MHz should be satisfactory, and this would make for an attractive and economical organ sound engine.
Individuals and manufacturers are sometimes wary about tying their offerings closely to commercial computer products because of a perception that they will rapidly become obsolete. Yet in the case of the IBM PC and add-ons such as sound cards, the opposite is more likely to be the case. One of the reasons for the continuing popularity of the PC in its third decade is the upwards compatibility that has been maintained for many years, meaning that legacy programs and peripherals will still work, by and large, on modern machines. For example, much of today's Windows MCI was developed in the days of Windows 3.1 many years ago and compatibility has been maintained. This makes the programmer's life unexpectedly easy. One of many examples is the data structure required to handle the data formats of WAV audio files: in the days of Windows 3.1 it was defined as a structure of type WAVEFORMAT. This was later extended to become WAVEFORMATEX to cope with the different types of non-PCM encoding which were used increasingly during the 1990's, and more recently it has been extended again as WAVEFORMATEXTENSIBLE to cater for the multiple channels used in surround sound. But the point is that all these are upwards compatible, so that all earlier structures are still legal subsets of the later ones (at least for C-based compilers).
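The prefix property can be seen by re-stating the structures with portable fixed-width types so the example compiles anywhere (the genuine definitions live in the Windows multimedia headers, where WAVEFORMATEX is declared with flat fields rather than the nesting used here, though the packed binary layout is the same):

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef uint16_t WORD;    /* portable stand-ins for the Windows types */
typedef uint32_t DWORD;

#pragma pack(push, 1)     /* the Windows headers pack these structures */
typedef struct {          /* Windows 3.1 era */
    WORD  wFormatTag;
    WORD  nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD  nBlockAlign;
} WAVEFORMAT;

typedef struct {          /* 1990's: adds sample size and extension count */
    WAVEFORMAT wf;        /* identical leading fields to WAVEFORMAT */
    WORD  wBitsPerSample;
    WORD  cbSize;
} WAVEFORMATEX_;

typedef struct {          /* surround sound era */
    WAVEFORMATEX_ Format; /* identical leading fields again */
    union {
        WORD wValidBitsPerSample;
        WORD wSamplesPerBlock;
        WORD wReserved;
    } Samples;
    DWORD   dwChannelMask;   /* which loudspeaker positions are present */
    uint8_t SubFormat[16];   /* a GUID identifying the data format */
} WAVEFORMATEXTENSIBLE_;
#pragma pack(pop)
```

Because each later structure begins with an exact copy of its predecessor, a pointer to the newer type can be passed wherever the older one is expected, which is precisely what keeps legacy code working.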
Even if software updates are required for drivers etc, these can usually be downloaded from the Internet. In order to keep their global customer base happy, manufacturers work hard to make changes in the technology, both dramatic and minor, transparent to the users. An example occurred in recent years with the transition from Windows 98 to 2000 and XP. The architecture of these new operating systems is entirely different to the older ones as far as drivers are concerned, and as it happens the change impacted directly on the Audigy series of sound cards. The original Audigy was issued with Windows 9x drivers only and it would not therefore work under Windows 2000 and XP. But new versions of the drivers were soon available for free download from the manufacturer's website. I routinely run Audigy 1 and Audigy 2 cards under Windows XP, even though the earlier ones were obtained before XP appeared. And in any case, the Audigy has been the subject of discussion in this article merely to demonstrate the issues. Its use is by no means mandatory, and there is a plethora of other SoundFont compatible cards which would doubtless function at least as well.
Concerns over obsolescence should be reserved for the more specialist systems used in digital organs. The fact that they are sourced from a small number of small firms is not a robust situation as far as obsolescence is concerned, nor (dare I say) for the long term health of those businesses which rely on them.
An alternative approach to the use of the rendering engines in sound cards is to use software synthesis, the essentials of which have been outlined already. There are several 'virtual pipe organ' simulations available, some of which exist either as freeware or shareware, and all of them appear to offer what we want at first sight. Some seem to be software synths written by amateurs, and this word is not used in a pejorative sense because some of the authors are obviously highly competent. When this article first appeared, most also seemed to have disadvantages - for example, they only had a rudimentary modulation/articulation capability, so that even so basic a thing as getting a good tremulant appeared to be difficult. However, with the passage of time the severity of these problems has receded as computers have become more powerful. In any case, the corresponding upside is that, if you write your own software synth, you can customise it to do exactly the job you want provided the host machine on which it runs is sufficiently powerful. With the latest PC's this is not a major problem in many cases. There is no hard a priori limit on polyphony for example, because for a given host computer this depends largely on the size of organ being simulated.
Writing a basic software synth in a language such as C is not particularly difficult, and computer music students are often set the task as part of their coursework. However, incorporating the elaborations necessary for a complete digital organ puts it into quite another ball park and it then becomes a job for which more experience is necessary. Achieving a satisfactory tremulant is but one mundane example, where the frequency and amplitude of the waveforms being generated have to be modulated (not necessarily sinusoidally) in real time and by amounts depending on the position of a note in the compass. These effects will also be different for each stop. Getting such details right is particularly important for the theatre organ. Thus my personal view is that a much better software synthesiser will result if its author is also an organist.
Contrast this with the sound card approach, where all these tiresome minutiae are taken care of by the sound engine once you have set them up in a SoundFont. As a general rule, anything which requires the articulation of a waveform to be varied while a note is sounding makes life difficult for a software synth. Things are much easier if all the waveforms are pre-computed, stored and then simply called up as required. This implies that the range of modulation possibilities routinely available from a hardware rendering engine is less frequently offered by a software approach. Although a digital organ can be built this way, a danger can be that it will be a rather lifeless one. For this reason software synths tend to use very large stored samples, often tens of seconds or so in duration, so that the intrinsic variability of the waveforms over this period will add more life to the sounds. This makes them very hungry for memory, although with current PC's this is not a particular problem.
Despite its shortcomings the potential advantages of software synthesis are considerable, particularly regarding polyphony and the number of output channels. Therefore the following outlines my approach to writing a bespoke synthesiser optimised for organ applications.
If only for convenience I prefer to use a sound card with multiple audio channels as the output device for a software synth, and again the Audigy seems as good as any. All of the analog outputs on the Audigy 2 are derived from extremely high quality DAC's, which is not the case for some earlier cards (including the Audigy 1) and for some current competitors. In these cases, different types of codec are used for the various outputs and consequently their signal to noise ratios and distortion figures vary somewhat, though for many applications the variations will be imperceptible in practice. Digital audio outputs are also available. Using a software synth it is possible to direct the output from any stop on the organ to any of the loudspeakers, and with up to 8 outputs per card depending on the brand (usually organised as 4 stereo pairs) this is a significant advantage. Some cards allow you to use the headphone outputs independently, and this will provide an additional two channels. As with MIDI I/O, the Windows MCI functions give you easy programming access to all of the wave output channels on a sound card. For example, calling waveOutGetDevCaps returns the necessary information to a C program provided the card driver conforms to the WDM standard. Recent extensions to the MCI allow separate access to the individual channels of any stereo pair to give the flexibility required for 7.1 surround sound, and for organ work this enables all the channels to be used as independent loudspeaker outputs. Another advantage of using a card such as the Audigy for sound output is that it is also possible to use the hardware sound engine in the card in tandem with software synthesis if desired.
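In outline, directing any stop to any loudspeaker amounts to little more than a table lookup when mixing into the per-channel output buses. The routing structure below is hypothetical, invented to illustrate the idea:

```c
#define NUM_CHANNELS 8    /* e.g. four stereo pairs on one card */
#define BLOCK 64          /* frames per processing block */

/* Hypothetical per-stop routing: one output channel and a gain. */
typedef struct { int channel; double gain; } route_t;

/* Accumulate one rendered block of a stop onto its assigned channel bus.
   Each bus is later handed to the corresponding wave output device. */
void mix_stop(float bus[NUM_CHANNELS][BLOCK], const float *block,
              int nframes, const route_t *r)
{
    for (int i = 0; i < nframes && i < BLOCK; i++)
        bus[r->channel][i] += (float)(r->gain * block[i]);
}
```

Because the routing is just data, reassigning a stop to a different loudspeaker needs no code change at all - a flexibility that hardware engines with fixed output assignments cannot match.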
I also use SoundFont technology to develop the waveform ensembles for software synthesis, together with their articulation (modulation) parameters. Because a SoundFont is not engine-specific, it is just as applicable to a software method of sound generation as it is to the hardware in a sound card. Doing things this way also means you can use the same flexible and attractive range of commercial SoundFont editors for voicing as you can with a hardware approach. The resulting edited font is then loaded into the software synth. If looping a waveform is required I have developed algorithms which select the optimum waveform lengths and the loop points automatically, thereby saving hours of manual labour with the editor.
My preferred realisation of a software synth operates on the basis of at least one waveform per note per stop, therefore there are at least as many waveforms in the ensemble as pipes in the equivalent pipe organ. Multiple waveforms per note would be used in additive synthesis for example, or when simulating the separate ranks of a mixture independently. It is much easier to implement a software synth if it is not required routinely to perform operations such as real time interpolation on the fly. Such operations would be required if the note currently demanded was not represented in the waveform ensemble, and it would then be necessary to shift the frequency of the waveform nearest to the note using interpolation. Even for a fast PC these processes are very demanding for a real time system and they should be minimised, otherwise the latency of the synth (the time delay between keying a note and hearing the sound) can easily escalate to a perceptible level. Nevertheless, in any half-decent synth it is not possible to do away with processes such as real time frequency and amplitude modulation entirely, as they are required for tremulants for example. They are also required when simulating more subtle effects such as wind sag. Therefore it is all the more important that they should only be used when absolutely necessary so that maximum processor power is available to implement them. With a combination of at least one stored paradigm waveform for each note of each stop, plus the ability to impose real time frequency and amplitude modulation when necessary but not as a matter of course, it is possible to simulate every conceivable nuance of pipe organ speech both for stops individually and in chorus.
Thus when a SoundFont is loaded into my synths, it is examined by the loader to see if the above criterion of one sample per note per stop has been met. For example, when setting up the SoundFonts for an instrument using an editor, stretched keygroups might have been used in which only one waveform was used across a group of notes in some cases. In this event, the synth loader module will automatically generate a set of new waveforms to fill in the gaps so to speak. These will have their note frequencies interpolated automatically to correspond to their positions in the compass, but the articulation parameters will remain the same as those applied to the rest of the keygroup (though scaled for pitch if necessary). Loop points for the new waveforms will also be defined automatically when necessary. The note frequencies can of course be selected to represent any desired temperament. You will note in passing that operations such as these mean that you need to have a detailed understanding of the structure of SoundFont files and some facility in manipulating them. A topic of interest to those who worry about commercial secrecy is that the finished sound file containing the data for your software synth can have any desired 'private' format, or it can even be encrypted if desired - it does not have to conform to the SoundFont specification as it does with a standard sound card.
The host system for a software synth must be a very fast PC with a lot of installed memory; it just is not possible to get away with anything less, other than for the smallest organ simulations. It is not easy to predict exactly what the parameters of the machine should be in a given situation because there is no hard definition of a polyphony limit, for example. Unlike a hardware synthesiser, the limitations of a software synth depend strongly on many factors including the power of the host computer and the size of the organ being simulated. However it soon becomes obvious if the host machine is not up to the job. In this respect a software synth is no different to those using hardware rendering engines - both have a finite simulation capacity. The demands placed on the system are generally greater for a theatre organ simulation than for a classical one because of the problem of getting the tremulants right, together with some aspects of tonal percussions, traps and effects. A realistic sustain pedal for the piano is quite difficult, for example. All of these require a real time articulation capability, which is particularly greedy for computing resources as we have seen already.
I hope this article has demonstrated at least the following:
But none of the above can be realised if a corresponding capability does not exist for voicing the resulting instrument! That's quite another story.
Some of the techniques described in this article have been embodied in a digital organ system called Prog Organ, and many sound files are available on this site.