Voicing Electronic Organs

by Colin Pykett

Posted: July 2003

Last revised: 23 December 2009

Abstract: The subject of how today's digital electronic organs are voiced is many-facetted. However there is one factor usually paraded as a benefit which in fact is a decided hindrance. This factor is the complete lack of any intrinsic organ-like character in the hardware and software environment of a digital organ, unlike a pipe organ which can only ever sound like what it is. By contrast, an electronic organ could just as readily emit the sound of a barking dog as an organ pipe. Therefore this virtually unbounded flexibility demands a detailed understanding of the speaking characteristics of organ pipes before the organ can be voiced effectively. Consequently the majority of this article is concerned with this topic. It describes the types of digital organ available, how they work and the pros and cons of the various systems. It then covers matters such as how recordings of organ pipes can be made, how these may require pre-processing before being incorporated into an instrument and how sounds can be created without the need to first make recordings. Spectrum analysis is discussed in detail because its capabilities and limitations are widely misunderstood, particularly when applied to attack and release transients.

CONTENTS

INTRODUCTION

THE COMMERCIAL SITUATION

TIME DOMAIN SYNTHESIS TECHNIQUES

FREQUENCY DOMAIN SYNTHESIS TECHNIQUES

FURTHER SYNTHESIS TOPICS

- ADSR

- Initial Attenuation

- Polyphony

- Multitimbrality

- Real Time Operation

TIME VERSUS FREQUENCY DOMAIN SYNTHESIS - pros and cons

VOICING

- Making Recordings of Organ Stops

- Pre-processing Recorded Waveforms

APPENDIX 1 - Digitising Pipe Organ Waveforms

APPENDIX 2 - Additive Synthesis

NOTES & REFERENCES

INTRODUCTION

The processes used in voicing digital electronic organs are not well known and consequently there is much ill-informed buzz and speculation about the subject, as evidenced by Internet chat forums for example. The reasons for the situation include the lack of information available from manufacturers, and an apparent misunderstanding of topics such as spectrum analysis and the physics of organ pipes even on the part of some of those in the electronic music business.

This article surveys the types of digital organ available and then moves on to how they are voiced. The most important feature is a detailed understanding of the nature of organ pipe sounds, which is far more significant than the type of digital system used to simulate them. Without such knowledge no system, no matter how sophisticated, will achieve success. This is because any digital organ, unlike its pipe counterpart, is not intrinsically organ-like in any way. Until it is programmed in minute detail to emit the sounds of organ pipes it could just as easily become a digital dog-barking machine. Therefore this article considers in detail the processes necessary to identify and extract the essential features from the sounds of real pipes, including their transient behaviour.

The material in this article requires no more than a high school understanding of science, together with a similar level of computer literacy. Some musical understanding and experience of the pipe organ is also assumed. Appendix 1 outlines the essentials of how digital signals are generated and handled by computers for those who are unfamiliar with the subject. Appendix 2 covers additive synthesis, a topic which occurs repeatedly throughout the article. However in both cases the Appendices explain a number of detailed technical points, so they are recommended reading even for those with greater experience. Spectrum analysis is covered in considerable detail in the article because not only is it pivotal to an understanding of pipe sounds, but it is a subject widely misunderstood.

Some of the more detailed aspects can be omitted without a significant loss of comprehension. These are identified by a smaller typeface. However this does not imply lesser importance, and anyone seriously into the subject needs to be familiar with them.

THE COMMERCIAL SITUATION

Digital electronic organs came in two varieties when this article was first posted in 2003, though a third (physical modelling) made a later appearance in about 2006 in the form of a MIDI expander box made by Viscount. In 2009 this company introduced a complete organ using physical modelling technology. Frequently the first two varieties are called sampling and real time synthesis instruments respectively. An example of the former is the Allen organ and of the latter, the Bradford Computing Organ (and those which use derivatives of the Bradford technology). However this nomenclature is unfortunate and misleading in one respect: the term “real time” is used widely in the field of digital signal processing to describe a system which is capable of responding to or reproducing events at the rate at which they occur in the real world. Since both types of digital organ and indeed any musical instrument that was ever made necessarily must have this attribute, it cannot be appropriated legitimately to describe just one variety. In any case, it will be pointed out later that at least one type of “real time” organ does not operate strictly in real time at all. Moreover, a defect of some digital organs in the past which still persists sometimes today is that they cannot keep up with the demands of the player in some circumstances - and this was true particularly of “real time” organs. This inexcusable attribute makes the use of the term “real time” ridiculous. The use of the term to denote a particular kind of electronic organ is usually only for sales and marketing purposes.

The terms time domain and frequency domain synthesis, which are also the property of the wider signal processing community, are much better and we shall use them in this article. Samplers are examples of time domain synthesisers, and the so-called real time organs to be described use frequency domain methods. But what are these methods in plain language? To understand them better it is helpful to take a trip into history.

In the beginning, starting around the 1920’s, all electronic organs used analogue electronic circuitry. But in 1971 there appeared a patent [1] which was materially associated with the development and subsequent marketing of the Allen digital organ. Although it was by no means unique among a plethora of similar patents both before and since, it was in many ways a remarkable document and even today it repays careful study. Among other things it described a “generator” which could be assigned to any key and which would read out from a memory one or more stored waveforms, at a rate related to the key in question. When the key was released the generator would be released also, after the waveform had decayed gradually. All of this was done years before the microprocessor had appeared in general use, and before memory chips were widely available. In fact the whole thing was a vast hard wired arrangement of simple binary logic elements and the memory was contrived from diode arrays. In practice the system was realised using a set of novel custom integrated circuits designed by the North American Rockwell Corporation which filed the patent. This story is interesting not only for its technical content, but because it portrays the evolution of virtually every digital organ since – a new technique protected by patents filed by a specialist firm or other developer, the subsequent development of custom integrated circuits of high complexity (and, later, software as well), and the dissemination of these solely to one or more organ manufacturers under tight commercial licensing conditions.

A few years later another key patent was filed in the UK in 1976 by the National Research and Development Corporation, a long-defunct invention of the interventionist government of the day, acting for Bradford University. After a curiously long wait this patent was published in 1980 [2]. It described an alternative system in which the only stored waveform was a sine wave (a pure tone), or even just part (one quadrant) of a sine wave, together with lists of numbers describing the harmonic structures of the tones to be simulated. By then cheap microprocessors had become available, considerably simplifying the design problems, although they were not capable enough to undertake many of the real time computations required. Consequently this first Bradford Computing Organ design also contained esoteric electronic chips such as hardware binary multipliers. This system has been used by a number of organ manufacturers and so we see again the same evolutionary pattern as before – a novel design licensed to various organ firms, most of which had played little or no part in the initial development (although in the case of the Bradford system a few did).

It is useful to take stock of the historical context in which these two patents arose. A large part of the earlier one had nothing to do with the storage and generation of sounds at all, rather it described methods of key scanning and multiplexing which today are taken for granted. But the point is that when the patent was conferred these techniques (obvious though they were, even at the time) were also then the subject of legal protection. Thus during the 1970's and 80's other organ manufacturers were prevented, in theory, from even using these relatively trivial techniques let alone the novel method of sound generation itself. One consequence was that virtually all other firms continued to build analogue organs in this era. Although a minority of these was very good (but expensive), the rest were cheap and awful. In the midst of this situation the Bradford patent emerged, because it was able to sidestep the problem of infringement by describing a method of sound generation using frequency domain techniques rather than time domain ones. In the 1980's a few firms in Europe began to use the Bradford technology, including Ahlborn and Wyvern. By the mid-1980's the Musicom firm had also appeared which developed a derivative of the Bradford technology, and one of the first commercial users of this system was Copeman Hart in the UK. A few years later the situation relaxed still further with the expiry of the early time domain patents. Then firms worldwide were able to use time domain methods freely, and there was a flood of electronic organs in the early 1990's using direct storage of waveforms. The analogue organ market collapsed equally rapidly at that time.

Perhaps the most important message to take away from this story is that frequency domain synthesis arose during a period when the protection afforded to time domain synthesis was tight and vigorously defended. Because the Bradford work was funded by the British taxpayer it is reasonable to assume that it was driven by a desire to sidestep the time domain legal straitjacket and open the door wider to UK industry. Because of this background, some of the claims one still hears about the alleged superiority of frequency domain methods do not always have a sound factual basis.

The situation depicted above has been repeated several times, particularly after the original patents expired. Both of the systems mentioned have been the subject of continuous development and in addition others have appeared. Today, various specialist firms source the hardware and software necessary for the systems, and these are usually updated regularly. It is rare for the electronic organ manufacturer himself to assemble his products from scratch as he used to do in the analogue days. But from a business perspective, the fact that some manufacturers have made the decision to procure their enabling technology from a single source might imply a strategy with some risk. This could have unfortunate implications not only for themselves but for their customers in the long term.

So we have seen that the Bradford system began to be used by a few manufacturers in the 1980’s, and shortly afterwards the technology also became available in an alternative form from Musicom. Some firms, such as Wyvern, apparently continued with the Bradford system, though their implementation as described in 1999 [3] was somewhat dated by current standards. The systems are modular in that they can be expanded to suit various sizes of organ, measured in terms of keyboards and stops. Depending partly on the size of the organ and partly on how much flexibility the customer is prepared to pay for, these organs incorporate a number of “music modules” (the name might vary depending on the manufacturer), each of which can provide typically 64 independent note generators. A single module of this size could only cope with a single department of a few stops without running the risk of missing notes, another undesirable phenomenon exhibited by the cheaper organs of either type.

The Allen digital organ, and it is probably safe to say most others, continues to use time domain methods of sound production. This technique also characterises most if not all pop music instruments such as synthesisers and computer sound cards. It therefore has an enormous user base world wide, with correspondingly wide acceptance and a large amount of pooled experience. The pros and cons of the systems continue to excite debate, not always conducted objectively or with sufficient insight, among manufacturers, dealers, performers, owners and others with an interest.

The third method of sound production in use today was mentioned above. Called physical modelling, it is based on a physical analysis of how musical instruments actually work, and the conversion of this understanding into mathematical equations and computer algorithms. Computers today are so powerful and fast that these equations can be solved in real time to generate the sound that the instrument itself would emit. The method has its roots in research begun several decades ago, and the clarinet was one of the first to be fully characterised in this manner. An article elsewhere on this site discusses physical modelling in detail [14], so it is not described further in this already rather long article.

A problem besetting the subject is the paucity of information and documentation available in the public domain from electronic organ manufacturers and those who supply them with their enabling technology. One can surmise why this is so, but unfortunately it does not serve the interests of customers and it also allows an unhelpful surfeit of misinformation and misunderstanding to flourish. This can be observed on a daily basis merely by dipping into various Internet discussion forums, or from the letters columns in the organ literature where correspondents frequently plead for clarification. Fortunately, even digital electronic organs cannot transcend the capabilities of computer science and engineering or the laws of physics, regardless of claim and counter-claim and strenuous efforts by the ad-men. Therefore by examining the subject at this level we can identify their strengths and weaknesses and the areas which are most important in terms of voicing, regardless of the firms or technologies involved. It is for this reason also that terms such as time or frequency domain synthesis are used in this article, to avoid confusing matters of fact and principle with particular commercial offerings.

TIME DOMAIN TECHNIQUES

We have just seen that commercial music systems for the mass consumer market such as synthesisers use time domain methods, and this is also true of many if not the majority of digital electronic organs. It is also true for PC-based systems which often call themselves virtual pipe organs. The characteristic feature of time domain synthesis is the direct storage of waveforms, or time series as they are more rigorously termed, as sequences of binary numbers. Thus the digital organ described in [1] was a time domain system because it relied on the storage of waveforms in memory, and these were read out and supplied to the loudspeakers when keys were pressed. Appendix 1 outlines the essentials of digital signal representation.

In the simplest time domain organ system a number of waveforms, called samples, needs to be stored for each stop. In many systems each waveform represents the sound of the stop over a region of the keyboard often termed a keygroup. At least four waveforms and keygroups would typically be required to reasonably simulate a stop across a range of five octaves, and at the other extreme each note has its own sample. Multiple samples are necessary to enable the variations in tone quality and regulation of a real rank of pipes to be properly simulated across the compass. Each sample might either have been generated synthetically by a computer program, or it can be the result of a high quality recording session in which the actual sounds of the stop were captured. In practice a combination of both techniques is often used, in which the raw recordings are cleaned up or otherwise modified by a computer program before being stored as samples. These matters will receive more attention later.

Both the transient and steady state phases of a waveform can be represented in each sample, so that when a key is pressed the transient part of the sample is heard first followed by the steady state. To reduce the amount of storage needed, the steady state portion of the sample then usually loops repeatedly as long as the key is pressed. When the key is released, looping continues for a short time while the amplitude of the wave is digitally attenuated to zero. This enables a more realistic gradual cutoff of the sound to be achieved. Alternatively, releasing the key results in immediate cessation of looping, with the final part of the waveform then being read out just once before the sound ceases. This last portion of the sample might contain a characteristic release transient for the stop in question. Figure 1 shows a hypothetical sample containing attack transient, steady state and release transient phases together with the looping points. This sketch is purely illustrative; in practice the sample will usually contain many more cycles of the waveform than indicated in the diagram.
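The looping scheme just described can be illustrated with a few lines of Python. This is only a minimal sketch and not the method of any particular organ: the function name, its arguments and the toy sample values are all invented for illustration. It implements the second release option, in which looping stops when the key is released and the remainder of the sample is read out once.

```python
def render_looped_sample(sample, loop_start, loop_end, hold_len):
    """Return the output heard when a key is held for `hold_len` output points.

    sample     -- list of amplitudes: attack, steady state, then release
    loop_start -- index of the first point inside the loop (hypothetical)
    loop_end   -- index one past the last point inside the loop (hypothetical)
    """
    out = []
    pos = 0
    # Key held: the attack is played once, then the loop region repeats.
    for _ in range(hold_len):
        out.append(sample[pos])
        pos += 1
        if pos == loop_end:
            pos = loop_start
    # Key released: looping ceases and the rest of the sample,
    # including any release transient, is read out just once.
    out.extend(sample[pos:])
    return out


# Toy data: the values are simply their own indices so the order is visible.
toy_sample = list(range(10))
played = render_looped_sample(toy_sample, loop_start=3, loop_end=7, hold_len=9)
```

With these toy values the loop region (indices 3 to 6) is traversed once in full and then re-entered before the key is released, after which the remaining points follow in order. The alternative release option would instead keep looping while multiplying the output by a gain that falls to zero.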

Figure 2 depicts the keygroups that might be used for a stop with five samples across a five octave keyboard.
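The idea of keygroups can be sketched in code as follows. The boundaries, the use of MIDI note numbers and the choice of the note at which each sample was recorded are all assumptions made purely for illustration, as is the equal temperament formula used to derive the read-out rate for notes other than the recorded one.

```python
# Hypothetical keygroups for a five octave keyboard (MIDI notes 36 to 96).
# Each entry is (lowest note, highest note, note at which the sample was made).
KEYGROUPS = [
    (36, 47, 42),
    (48, 59, 54),
    (60, 71, 66),
    (72, 83, 78),
    (84, 96, 90),
]


def find_keygroup(note):
    """Return (keygroup index, recorded note) for the given key."""
    for index, (low, high, root) in enumerate(KEYGROUPS):
        if low <= note <= high:
            return index, root
    raise ValueError("note %d lies outside the keyboard compass" % note)


def playback_ratio(note):
    """Rate at which the keygroup's sample is read out, relative to the
    rate used for the recorded note (assumes equal temperament)."""
    _, root = find_keygroup(note)
    return 2.0 ** ((note - root) / 12.0)
```

For example, `find_keygroup(60)` selects the third sample, and a key six semitones below the recorded note reads its sample out at about 0.707 times the original rate. The further a key lies from the recorded note, the more the stored tone quality is stretched, which is one reason several keygroups are needed.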

Each sample has to be individually adjusted for amplitude and attack/release characteristics so that audible breaks when moving between keygroups are at least unobjectionable, and desirably undetectable. Other parameters such as low pass filter settings and attenuation also have to be adjusted. The loop points for each sample also have to be chosen so there is no audible evidence of the looping process, such as repetitive clicks or pops. With some waveforms this is extremely difficult to achieve and it requires considerable patience and experience to get perfect looping. It can be seen that a sophisticated sample editing program is required to enable all these “voicing” functions to be achieved. At one time these were closely guarded items of specially-developed software, but today they can easily be purchased at minimal cost or even downloaded free of charge from the Internet. They may even form part of the software bundle with a computer sound card. However, whether a particular editing program can be used with a particular make of digital organ is quite another matter. Most commercial organs still need their own customised editing programs for voicing purposes, and manufacturers may be unwilling to release them, or they might require a binding non-disclosure agreement to be entered into.

FREQUENCY DOMAIN TECHNIQUES

Frequency domain synthesis techniques are less common than time domain ones in the commercial music world, at least at the mass consumer level, and the only ones to be described are those which use a form of additive synthesis. These techniques are classified as frequency domain ones in this article because the stored values relate to frequency artefacts (spectra) rather than those based on time (waveforms). Appendix 2 summarises the process of additive synthesis.

We have just seen that in time domain synthesis the memory of the system is occupied by samples of the actual waveforms we want to hear. With frequency domain synthesis the only stored waveform as such is typically one cycle of a sine wave, or even just one quadrant of a sine wave [5]. This is because the only waveform necessary in additive synthesis is a pure sine wave at each harmonic frequency. For this reason the Bradford patent [2] did not infringe those such as [1] protecting the Allen organ. But additive synthesis relies on adding various harmonics, so it is also necessary to store tables of numbers representing the strengths of the harmonics making up the spectrum of an organ stop. Thus for a diapason, typically 15 to 30 numbers would be needed for each spectrum. However, as with the time domain system, each stop requires several spectra to enable the variation in tone quality across the keyboard to be captured.
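To make the idea concrete, the following minimal sketch builds one cycle of a waveform from a table of harmonic strengths. The function name and the example amplitudes are invented for illustration and do not represent any real stop; all harmonics are given zero starting phase, which is a simplifying assumption.

```python
import math


def additive_cycle(harmonic_amps, n_points):
    """Build one cycle of a waveform by summing sine waves.

    harmonic_amps -- linear amplitudes of harmonics 1, 2, 3, ...
    n_points      -- number of points in the cycle; it should exceed twice
                     the number of harmonics to avoid aliasing
    """
    cycle = []
    for i in range(n_points):
        angle = 2.0 * math.pi * i / n_points
        value = sum(
            amp * math.sin(k * angle)
            for k, amp in enumerate(harmonic_amps, start=1)
        )
        cycle.append(value)
    return cycle


# Hypothetical spectrum: a strong fundamental with three weaker harmonics.
example_cycle = additive_cycle([1.0, 0.5, 0.25, 0.1], n_points=64)
```

In a real instrument the sine values would normally be looked up in the stored sine table rather than computed afresh, but the arithmetic is the same.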

Figure 3 shows a conventional representation of an amplitude spectrum. Each vertical line represents a harmonic and its height is the amplitude or strength of that harmonic. The harmonic amplitudes in this example are in dB (decibels), and although this representation is standard practice it can give rise to confusion. The use of the term “amplitude” is important here as it implies that an increase of 6 dB means a doubling of amplitude. An alternative representation uses harmonic intensities or powers (again the words are important as they have an exact meaning), which are the square of amplitude. In this case a doubling of intensity means a change of 3 dB. More explanation of decibel notation is available in [6]. Therefore you have to look closely when decibels are used to find out which type of spectrum – amplitude or power – is implied, otherwise major errors can ensue in the subsequent synthesis. Sometimes the term SPL (Sound Pressure Level) appears on the dB axis and this denotes an amplitude measure, not intensity or power.
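The distinction between the two decibel conventions is easily checked numerically. The short sketch below uses the standard definitions (20 log10 for amplitude ratios, 10 log10 for power ratios); the function names are merely illustrative.

```python
import math


def amplitude_ratio_to_db(ratio):
    """Decibel change corresponding to a ratio of two amplitudes."""
    return 20.0 * math.log10(ratio)


def power_ratio_to_db(ratio):
    """Decibel change corresponding to a ratio of two powers or intensities."""
    return 10.0 * math.log10(ratio)


doubling_of_amplitude = amplitude_ratio_to_db(2.0)  # about 6.02 dB
doubling_of_power = power_ratio_to_db(2.0)          # about 3.01 dB
# Doubling the amplitude quadruples the power, and both give the same dB figure.
quadrupling_of_power = power_ratio_to_db(4.0)       # about 6.02 dB
```

Applying the wrong formula when converting harmonic levels back to linear values therefore halves or doubles every level difference expressed in dB, which is the kind of major error referred to above.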

Some further remarks about nomenclature are appropriate at this point. People often speak of a spectrum as “the FFT” of a waveform, or of “an FFT program” that will generate a spectrum. “FFT” means Fast Fourier Transform, and to use it in this way is misleading and non-rigorous. We can get into difficulties if we do not understand the terminology properly, particularly when dealing with waveforms such as the starting transient sounds of organ pipes, a subject we shall come onto in due course.

The reason why Fourier’s name crops up is because he was the French mathematician whose remarkable insights some 200 years ago led to today’s understanding that all periodic sounds can be expressed as a sum of pure tones, constituting the fundamental and upper harmonics of the sound. The adjective periodic is important here because it means not only that each repetition cycle of the waveform is identical, but that the waveform lasts for many such cycles (the mathematics actually requires an infinite number). What is called the complex spectrum of such a waveform (not the same as Figure 3) is then given by the Discrete Fourier Transform (DFT) of the waveform, assuming we are dealing with a digitally sampled version of it in a computer. Fifty years or so ago it took far too long to compute the DFT directly because computers in those days were considered fast if they had a clock rate above 1 MHz. Therefore Cooley and Tukey published a paper describing how the process could be speeded up by using their Fast Fourier Transform algorithm. Often it matters little today whether one uses the DFT or the FFT, so fast have computers become. In fact the FFT can be slower than the straightforward DFT in some circumstances if one is not careful because of the data-shuffling and manipulation overhead it entails compared with the DFT.

The DFT and the FFT both give the same result, which is not what we conventionally speak of as the spectrum of the waveform such as that pictured in Figure 3. The output of the complex DFT or FFT contains not only the amplitudes of the harmonics but their relative phases as well. It also detects both positive and negative frequencies, which we shall not go into. Because the ear is insensitive to phase and negative frequency, we can throw this information away and only use the amplitude and positive frequency information. But to do this it is necessary to compute what is called the modulus of the complex transform. An economical way to proceed in terms of the programming, execution time and storage required is to compute the real rather than the complex transform. This allows the transform length to be only half of what it would otherwise be. Although these details cannot be explained exhaustively here, they are all important and therefore the overall process of deriving a spectrum is more complicated than simply computing the FFT alone.
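The steps involved in getting from a waveform to an amplitude spectrum can be sketched as below. This is a deliberately simple illustration which computes the DFT directly rather than by an FFT, keeps only the zero and positive frequencies, and takes the modulus so that phase is discarded. The function name and the scaling convention (chosen so that a sine wave of amplitude A yields the value A) are assumptions made for this example.

```python
import cmath
import math


def amplitude_spectrum(cycle):
    """Return linear amplitudes for frequencies 0, 1, 2, ... cycles per record.

    `cycle` is a list of real values, assumed to hold a whole number of
    periods of the waveform.
    """
    n = len(cycle)
    amps = []
    for k in range(n // 2 + 1):  # zero and positive frequencies only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(cycle)
        )
        # The zero frequency term (and the Nyquist term when n is even)
        # occurs once; every other term is shared with its negative twin.
        scale = 1.0 / n if (k == 0 or 2 * k == n) else 2.0 / n
        amps.append(abs(coeff) * scale)  # modulus: the phase is thrown away
    return amps


# Test waveform of 12 points: two harmonics, the second with an arbitrary phase.
n = 12
wave = [
    math.sin(2 * math.pi * i / n) + 0.5 * math.sin(2 * math.pi * 2 * i / n + 0.7)
    for i in range(n)
]
spectrum = amplitude_spectrum(wave)
```

Here `spectrum[1]` and `spectrum[2]` recover the amplitudes 1.0 and 0.5 regardless of the phase given to the second harmonic. The direct form places no restriction on the data length, which in this case is 12 points.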

What’s in a name, I hear you say? Well, incorrect usage reveals vague understanding. In digital organs the vagueness can lead to inadequate synthesis of starting transients and inefficient additive synthesis to name but two problems, and both are addressed later on. A further inefficiency, if not in the organs themselves then in the waveform editing processes used to voice them, is related to the apparently widely held belief that the FFT always has to operate on a data length (number of samples) which is a power of 2: 256, 512, 1024, etc. Such an FFT algorithm is called a radix 2 transform. This is a curious hangover from the original implementation of the technique which did indeed recommend powers of 2. It is a needless restriction however, and the FFT can use any radix. Even today one of the most efficient algorithms available was developed at the Stanford Research Institute over 30 years ago; this used a mixed radix transform, thereby allowing virtually any data length to be used. It also employed a particularly elegant and rapid method of doing the mathematics which avoided the need to repeatedly calculate or look up trigonometric functions. Moreover, some particularly fast implementations of the FFT have used a prime radix transform in which the data lengths are prime numbers. Since most, if not all, modern waveform editors still impose the radix 2 restriction it is evident that developments such as these are not well known to the commercial electronic music community.

In frequency domain synthesis each of the individual spectra making up one stop can be assigned to a different keygroup, a range of notes, though in the commercial systems we have mentioned each is usually assigned to a single note which is sometimes referred to as a voicing point. The computer system then usually derives an interpolated spectrum for every other note from the two nearest voicing points. Interpolation is a mathematical technique but it really means blending or (to use an image processing term) morphing, the production of something new from several constituents. If a note keyed corresponds precisely to a voicing point then only that spectrum is used further in synthesising the output. If the note lies between two voicing points, an interpolated (blended) spectrum is calculated in which each harmonic has an amplitude which is some arithmetical function of the same harmonic in the two spectra at the nearest voicing points. In some systems the mathematical form of the interpolation function can itself be specified. Interpolation disguises sudden audible discontinuities between voicing points, although the operation of the interpolator can sometimes be detected by playing chromatic scales up and down the keyboard – one can hear the changes in tone quality between the voicing points, and indeed often find out where they are. This illustrates at once the blatant crudeness of such instruments compared to real pipe organs, where such phenomena are of course absent.
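As an illustration of interpolation between voicing points, the sketch below blends two spectra linearly according to where the keyed note lies between them. Linear blending is only one possible choice of the arithmetical function mentioned above, and the function name, note numbers and amplitudes are all invented for the example.

```python
def interpolate_spectrum(note, low_note, low_spec, high_note, high_spec):
    """Blend the spectra at two voicing points for a note lying between them.

    low_spec and high_spec are lists of linear harmonic amplitudes and are
    assumed to be of equal length.
    """
    if not low_note <= note <= high_note:
        raise ValueError("note must lie between the two voicing points")
    if high_note == low_note:
        return list(low_spec)
    # weight is 0 at the lower voicing point and 1 at the upper one
    weight = (note - low_note) / (high_note - low_note)
    return [
        (1.0 - weight) * low + weight * high
        for low, high in zip(low_spec, high_spec)
    ]


# Hypothetical voicing points an octave apart, each with two harmonics.
midway = interpolate_spectrum(54, 48, [1.0, 0.5], 60, [0.5, 0.1])
```

A note exactly at a voicing point returns that point's spectrum unchanged, while the note midway between them receives the average of the two, here approximately `[0.75, 0.3]`.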

The spectra themselves cannot be used directly for generating sound – for this they need to be converted to waveforms. This is done in the Bradford system, for example, in two stages. The spectra for each note of a stop are first calculated by interpolation as described above when the stop is drawn. Then a corresponding waveform for each note, sometimes just a single cycle in the early systems, is derived by additive synthesis from each spectrum and placed in a temporary note memory. These processes occur whether keys are currently pressed or not. The system then waits until a key is pressed, at which point the waveform for that note and that stop is read out from the note memory just as in a time domain system. This method of doing additive synthesis gets over the problem of doing it in real time as the music is played, which would require an almost impossibly fast computational capability. Therefore some organ systems colloquially called “real time” are in fact anything but! This is another reason for referring to them here as frequency domain systems. In some systems the synthesised waveforms can be updated rapidly as the music is played to simulate effects such as “live winding”, in which the speech of pipes in a real organ is affected by the dynamics of the winding system. This is discussed at more length later on. It is a feature made much of by some manufacturers and pundits, but in fact it uses only basic computer techniques. It is no different in principle to the monitor on the humblest PC whose picture content can be changed rapidly in response to the demands of a computer game, say.
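The two stage arrangement can be sketched as follows. This is a schematic illustration only and makes no claim to represent the internals of the Bradford system or any other product: the class, its method names and the 64 point cycle length are all hypothetical. The point it demonstrates is that the additive synthesis is done when the stop is drawn, so that pressing a key involves nothing more than reading out a waveform already held in memory.

```python
import math


def synthesise_cycle(harmonic_amps, n_points=64):
    """One cycle of a waveform formed by summing harmonics (zero phase)."""
    return [
        sum(
            amp * math.sin(2.0 * math.pi * k * i / n_points)
            for k, amp in enumerate(harmonic_amps, start=1)
        )
        for i in range(n_points)
    ]


class Stop:
    """A stop whose per-note spectra are already known (hypothetical)."""

    def __init__(self, spectra_by_note):
        self.spectra_by_note = spectra_by_note  # note number -> harmonic amps
        self.note_memory = {}                   # note number -> waveform

    def draw(self):
        # Stage 1: performed when the stop is drawn, whether or not any
        # keys are currently pressed.
        self.note_memory = {
            note: synthesise_cycle(spec)
            for note, spec in self.spectra_by_note.items()
        }

    def cancel(self):
        self.note_memory = {}

    def key_pressed(self, note):
        # Stage 2: no synthesis takes place here, only a lookup.
        return self.note_memory.get(note)


stop = Stop({60: [1.0, 0.5], 61: [1.0, 0.45]})
before_drawing = stop.key_pressed(60)
stop.draw()
after_drawing = stop.key_pressed(60)
```

Before the stop is drawn nothing is available to play; afterwards each key has a ready-made waveform, which can then be looped just as in a time domain system.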

In theory, a more efficient way to perform additive synthesis is to compute the Inverse Fourier Transform (IFT) using an FFT algorithm. Just as the forward Fourier Transform converts time domain data into the frequency domain, the inverse transform converts frequency data into the time domain. Therefore it can be used to generate a waveform from a set of harmonic amplitudes, i.e. from a spectrum. As with the forward transform, additional efficiency is gained by ignoring the phase of each harmonic. However if the transform length is small, care has to be exercised to ensure that the FFT is actually faster than the "brute force" method described above.

It might be wondered why the frequency domain system is used, because it appears at first sight to be more complicated than the time domain sampler technique. When it was first developed one answer doubtless related to the patent situation. Nevertheless it is a reasonable question to ponder on, and the advantages and disadvantages of each method will be considered later.

FURTHER SYNTHESIS TOPICS

ADSR

We now leave this discussion of synthesis methods and turn to some related topics which apply to all of them. The term ADSR occurs repeatedly in electronic music, and it is necessary to understand what it means. It is an expression which arose in the early days of monophonic analogue synthesisers and it stands for Attack, Decay, Sustain and Release.

In Figure 4 is sketched an ADSR curve, which is the amplitude-versus-time envelope of the sound of a single note of a single stop (although in the case of frequency domain systems it can also relate to a single harmonic of a sound). When a note is keyed the amplitude increases from zero over a time denoted by the Attack parameter. The maximum value reached during this phase is sometimes called the Initial Attenuation level. Then the amplitude might fall to some lower level over a second time period defined by the Decay value. After this the amplitude remains constant, at a level defined by Sustain, for the time that the key is held down. Finally when the key is released the sound drops to zero over the Release period. The Attack, Sustain and Release phases correspond exactly to those in Figure 1 which illustrated how sound is generated from a finite wave sample.

Note that the Attack, Decay and Release parameters are all expressed as time intervals (typically milliseconds), whereas the Sustain parameter is an amplitude level (typically expressed in dB). In organ work the Decay phase is more often than not irrelevant as far as the overall sound is concerned, because once the Attack phase is over we do not usually want the amplitude to reduce again before the steady state commences. Such an overshoot can produce a most peculiar effect if overdone. However it can have importance if a separate ADSR characteristic is applied to each harmonic of the sound in frequency domain synthesis. For example, the second harmonic of a diapason pipe often rises more rapidly than the fundamental to a higher peak value, before dropping back to a steady state (Sustain) level.
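The envelope just described can be sketched in a few lines. This is only an illustration under stated assumptions: the segments are taken to be straight lines, the peak is normalised to 1.0, and the function name, its parameters and the example values are hypothetical rather than those of any real voicing program.

```python
import numpy as np

def adsr_envelope(attack_ms, decay_ms, sustain_db, release_ms,
                  hold_ms, sample_rate=44100):
    """Piecewise-linear ADSR amplitude envelope (peak normalised to 1.0).

    attack_ms, decay_ms and release_ms are durations; sustain_db is the
    sustain level in dB relative to the peak (0 dB means no Decay phase).
    hold_ms is how long the key stays down after attack and decay finish.
    """
    def n(ms):
        return int(round(ms * sample_rate / 1000.0))

    sustain = 10.0 ** (sustain_db / 20.0)          # dB -> linear amplitude
    attack = np.linspace(0.0, 1.0, n(attack_ms), endpoint=False)
    decay = np.linspace(1.0, sustain, n(decay_ms), endpoint=False)
    hold = np.full(n(hold_ms), sustain)
    release = np.linspace(sustain, 0.0, n(release_ms))
    return np.concatenate([attack, decay, hold, release])

# Illustrative values only: 50 ms attack, 20 ms decay to -3 dB,
# key held for 500 ms, 100 ms release.
env = adsr_envelope(50, 20, -3.0, 100, 500, sample_rate=8000)
```

Multiplying a waveform (or, in a frequency domain system, a single harmonic) by such an envelope sample by sample imposes the corresponding amplitude contour. Setting sustain_db to 0 removes the overshoot, which is the usual case for organ tones.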

Do not confuse the Decay and Release parts of the curve; the ending of a sound when the key is released is governed by the Release value, not the Decay one. Some odd effects can be produced if the wrong value is adjusted inadvertently when voicing or regulating a stop!

Initial Attenuation

This term does not enjoy a universally acknowledged definition, although it is used in all synthesis methods in some way or other even though its name may vary. It governs the overall attenuation applied to each waveform and is quite independent of the other parameters in the ADSR characteristic. It is one of the most useful parameters available in a voicing program for getting the regulation of a stop correct over the compass, that is, the way its loudness varies across the keyboard. In turn the quality of regulation of a stop materially influences its blending characteristics with the others.

When a sample is loaded into a time domain digital organ it will always be replayed at some fixed maximum level (referred to here as an Initial Attenuation of 0 dB) unless that level is changed. This assumes there is no Decay phase, so that the Initial Attenuation and Sustain levels in Figure 4 are one and the same. This will be appropriate for the vast majority of organ samples. Naturally, the level of the sample corresponding to one keygroup might not be appropriate for the adjacent ones, so to prevent an audible discontinuity appearing the Initial Attenuation parameter for the corresponding samples has to be adjusted. It is given some other value such as –6 dB, meaning that the amplitude of that keygroup would be attenuated by a factor of about 2, although in practice the correct values have to be found by trial and error listening tests.

Initial Attenuation is also important for adjusting the relative loudness of the various stops. If this were not done a Dulciana would sound as loud as a Trombone, and to prevent this the Dulciana might need to be attenuated by 35 dB or more relative to the Trombone. The computer in an organ has no way of divining what we want it to do unless we tell it explicitly!
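The relationship between an attenuation in dB and the corresponding amplitude factor can be illustrated as follows. The sketch assumes the usual amplitude convention of 20 log10 of the ratio; the function name is hypothetical.

```python
def db_to_gain(attenuation_db):
    """Convert a level in dB (negative = attenuation) to an amplitude factor."""
    return 10.0 ** (attenuation_db / 20.0)

gain_6 = db_to_gain(-6.0)    # close to 0.5, i.e. amplitude roughly halved
gain_35 = db_to_gain(-35.0)  # under 2% of the reference amplitude
```

On this convention the Dulciana in the example above would be replayed at less than one fiftieth of the amplitude of the Trombone.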

Although this discussion referred to time domain systems it also affects frequency domain ones. At the point when sounds are emitted from the organ, both systems are replaying waveforms stored in computer memory. For many purposes it does not matter how they got there.

Polyphony

It is necessary to say a few words about how digital organs use the stored waveforms or harmonic tables to produce sound when several notes are keyed at once. One of the most important issues concerns polyphony, which is the ability of an organ to produce many notes simultaneously as opposed to the monophonic operation which characterised the earliest synthesisers.

The following brief discussion is included to assist an understanding of why polyphony is important, and why unlimited polyphony as in the pipe organ can be difficult to achieve in electronic ones. It should be appreciated that the hardware and software details mentioned are generic, rather than relating to any particular make of digital organ.

All organs can be considered to use generators to actually produce the sounds, although these are very different from the tone generators used in analogue organs. Moreover, the actual term used might vary depending on the manufacturer’s preferences. Each generator can be envisaged as a flexible digital circuit arrangement, or a very fast software module, which can accept any waveform for any stop and output it to any amplifier and loudspeaker. When a key is pressed the computer determines which waveform has to be used for a particular stop and loads it in some way into a free (currently unused) generator, although usually the waveform would not be physically relocated within the computer’s memory. Instead the generator would be loaded with the address of the waveform where it resides in memory. However this is a detail which is not of particular interest at present. Also one need not assume that each generator necessarily handles the complete waveform for a sound. Sometimes only specific harmonics are loaded into a generator, and these are then combined afterwards to produce the composite sound. Such techniques are associated mainly with frequency domain (additive) synthesis.

The computer also loads quite a lot of additional information into the generator relating to the ADSR envelope, Initial Attenuation, subtractive filter parameters (if relevant), looping parameters (if relevant), the output channel to be used, etc., before switching the generator on. The note then sounds until the key is released. If another key were to be pressed while the earlier note was still sounding it is easy to see that another generator has to be available. In fact if it is decided that up to 8 notes, say, need to be catered for simultaneously on each department of the organ then there need to be at least 8 generators per department. But this is an oversimplified view of things, and we need to consider how to handle the use of multiple stops simultaneously as well as multiple notes.

Multitimbrality

An organ has to cater not only for polyphonic operation on a single stop but on multiple stops simultaneously. It would be a poor organ on which one could only play one stop at a time. In electronic music parlance this leads to the concept of multitimbrality, a monstrous word no doubt intended to impress the gullible. However it merely means the ability of an instrument to sound several stops of different timbres at once.

In an organ of any pretension it is essential to be able to introduce the subtle variations in tuning which characterise a pipe organ. If a single note is played on a pipe organ with two unison stops drawn they will almost always be slightly out of tune as revealed by very slow beats. In a recently tuned organ one beat every few seconds would be typical. If this feature is not incorporated the sound of the digital organ will be thin and unconvincing, and it can only be achieved by having enough generators to assign to every key currently pressed of each stop currently in use. Thus if there were 8 stops on a particular department and a polyphonic capability of 8 notes was required, then a total of 64 generators would be required just for this small department. The generators are quite complex pieces of hardware and software which interface in an intimate manner to the computer itself, so that the program in the computer can control all of their functions. It is this intrinsic complexity which demands the use of specialised integrated circuits for digital organs, as it would be virtually impossible and definitely uneconomic to build all the generators required using ordinary components. Each “music module” of the Bradford system contains 64 generators, the same as some modern computer sound cards.
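The arithmetic of generator allocation can be made concrete with a toy model. It is a generic sketch only: the class, its methods and the note numbers are hypothetical, and it does not describe the Bradford system or any other maker's design.

```python
class GeneratorPool:
    """Toy model of a fixed pool of generators (hypothetical design)."""

    def __init__(self, size):
        self.free = list(range(size))      # indices of unused generators
        self.active = {}                   # (stop, note) -> generator index

    def key_down(self, stop, note):
        """Assign a free generator to one note of one stop; None if exhausted."""
        if (stop, note) in self.active:
            return self.active[(stop, note)]
        if not self.free:
            return None                    # overload: this note cannot sound
        gen = self.free.pop()
        self.active[(stop, note)] = gen
        return gen

    def key_up(self, stop, note):
        """Return the generator to the pool when the note ends."""
        gen = self.active.pop((stop, note), None)
        if gen is not None:
            self.free.append(gen)

# 8 stops drawn and an 8-note chord need 8 * 8 = 64 generators.
pool = GeneratorPool(64)
stops = range(8)
chord = [48, 52, 55, 60, 64, 67, 72, 76]   # arbitrary note numbers
assigned = [pool.key_down(s, n) for s in stops for n in chord]
ninth_note = pool.key_down(0, 79)          # a ninth note finds no free generator
```

With all 64 generators busy, the extra note receives nothing until some other note is released. A real system has to decide what to do at that point.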

The ability to add more sound generating hardware to a digital organ at will, such as by specifying the number of “music modules” to be used, means that enough generators can be provided in theory to cater for the size of stop list under consideration. However in practice, even with the advantages conferred by integration, economies have to be made sometimes. To ensure that sufficient generators remain free while the organ is being played, some notes on some stops will either not sound at all or they will be treated in a different manner under certain circumstances. For example if a loud reed is being used at the same time as a quiet flue stop is drawn on the same manual, the quiet stop simply might be ignored by the system in the belief that the organist and his audience would not notice. Or the waveforms for both the stops might be added together before being loaded as a composite waveform into a single generator, thereby removing the “free phase” effect that we have just seen is so desirable. Some systems brutally drop notes and/or stops completely and randomly when an overload situation approaches, rather than attempting to fail gracefully in the manner described.

Similar problems arise if many couplers, particularly octave or suboctave couplers, are used while playing. A single octave coupler can double the number of generators required as soon as it is drawn. This explains why such couplers are used rather sparingly in most digital organs. Also mixtures can represent a considerable resources overhead if the individual ranks of the mixture are assigned to separate generators, to properly simulate independent ranks of pipes. For this reason mixtures may use a single composite waveform which then only requires a single generator per note played, just as with an ordinary stop. However when this is done, it is then much more difficult (often impossible) to simulate the slight independence of tuning between the ranks; the mixture will be tempered to an unrealistic degree of exactness in this case.

Real Time Operation

Obviously, all digital organs have to work in real time. Unfortunately not all of them do, and sometimes players find that a perceptible time lag arises when the music is fast, highly polyphonic, when many stops are in use, or when some or all of these factors arise at once. In the worst cases the system can grind to a halt, notes can be lost, or they can continue to sound indefinitely. The problems arise because the computers in the system simply cannot always cope with the demands placed on them by the player. It is fair to say that these problems are probably less frequent today than they appeared to be some years ago, but they are of course completely indefensible in an instrument of any sort. If an electronic organ is dependent on MIDI, this can introduce similar problems of its own [7].

TIME VERSUS FREQUENCY DOMAIN – PROS AND CONS

This section begins a discussion of the advantages and disadvantages of time domain and frequency domain methods of synthesising sounds. Not all conceivable systems can be considered, so the discussion is limited to those mentioned previously which either store waveforms directly or those which compute the waveforms from stored frequency information. A number of points need to be made or repeated first to facilitate the discussion: firstly, it is useful to keep in mind that both types of organ produce the sounds from waveforms stored in memory at the time the sounds are being produced. The fact that the waveforms in one type of organ are permanently resident whereas in the other they are computed as required is sometimes of lesser importance. A second point is that, because of this difference, the amount of storage required for the time domain system can be much greater than for the frequency domain one. Each stored sample might be several seconds in length, and because there are many samples the total amount of memory might exceed that for a system which stores harmonic amplitude information. In practice this is not usually a disadvantage given the ready availability and cheapness of memory today. Moreover, the difference is not always as great as might be supposed, because the most sophisticated frequency domain systems also have to store large data tables to simulate the transient portions of the waveforms, and even more if real time fluctuations in quasi-steady-state sounds are simulated (see below). A third difference is that the computing power required in frequency domain systems is generally greater than in time domain ones, though again the difference might not be as significant as often supposed when very high quality simulation is demanded.

A major difference in capability between the two systems in principle is that time domain organs can reproduce a previously recorded pipe waveform directly whereas those using frequency domain methods cannot. In the latter case the waveforms have to be reconstructed from frequency information, and this information therefore has to be derived offline beforehand from the paradigm waveforms or in other ways. A frequency domain organ does not reproduce an exact copy of what you hear from an organ pipe because it will necessarily have been processed in some way first. In practice the difference in many cases is less important than sometimes claimed however, because making recordings of organ pipes to the necessary high standard required for direct insertion into a time domain sampler is extremely difficult. Extraneous noises are but one problem to overcome. In these cases the waveforms have to be cleaned up and then resynthesised offline, and similar processes are involved in both cases. Another major problem affecting time domain samplers is that of looping the waveform to simulate a note longer than that of the stored sample, and often it is impossible to find loop points on the raw waveform such that the ear cannot detect that looping is taking place. This can occur because of noise present in the signal, and although the noise can usually be suppressed, it is often undesirable to do this because it is the windy sound of some flue pipes which gives them much of their character. Even if loop points can be found in these cases, the repetitive "swishing" of the noise as it loops becomes offensive to the ear once it has detected its presence.

Claims are sometimes made today for digital processing techniques which profess to be able to loop "difficult" waveforms undetectably, but the plain fact is that they alter the waveform to such an extent that it can no longer be regarded as a sample of the sound as originally recorded. About the only way to surmount the looping problem without corrupting the original waveform is to use samples of such a length, in excess of ten seconds or so, that their duration is unlikely to be exceeded in normal playing. If this should occur then looping will, of course, have to be used to extend the apparent duration of the samples, but with such long samples it will only be required infrequently. Although memory is cheap today, the total memory requirement can nevertheless become problematical when hundreds or thousands of such long samples have to be stored, but it can be reduced considerably by carefully selecting the sample rate of each sample in relation to its audio bandwidth. For example, it is wasteful in terms of memory to use a 44.1 kHz sample rate for samples representing the lowest notes of a pedal Bourdon when a rate of 8 kHz or less would suffice.
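The saving obtained by matching the sample rate to the audio bandwidth is easily estimated. The figures below assume uncompressed 16-bit mono samples, which is an assumption made for illustration and not a statement about any particular instrument; the function name is likewise hypothetical.

```python
def sample_bytes(duration_s, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Uncompressed storage needed for one sample."""
    return int(duration_s * sample_rate_hz) * bytes_per_sample * channels

long_sample_44k = sample_bytes(10, 44100)   # 882000 bytes
long_sample_8k = sample_bytes(10, 8000)     # 160000 bytes
saving_ratio = long_sample_44k / long_sample_8k
```

On these assumptions a ten second sample of a low pedal note occupies about five and a half times less memory at 8 kHz than at 44.1 kHz.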

Regardless of the length of the stored samples, yet another problem affecting time domain samplers concerns the termination of the sound. When the key is released it is necessary for the system to jump immediately to the release phase of the sound, and if a characteristic release transient and/or the recorded ambience of the auditorium is required it can be difficult to achieve a seamless and undetectable "join" between the previous steady state phase and release. Most systems which claim to have solved this problem in fact have shortcomings which are only detected when playing the instrument, though there are ways to achieve it.

Frequency domain synthesis also has peculiar difficulties in some situations, such as when the frequency structure of the waveform is changing rapidly. This can occur during transient phases of the waveforms. Capturing a complicated transient directly simply by recording it is sometimes the better, indeed the only, way to incorporate it in a digital organ. Many transients are so complicated that they are impossible to synthesise in any other way because of issues such as partials which begin as band limited noise rather than as discrete frequencies. Even analysing such transients can be next to impossible because of the mathematical limitations of spectrum analysis, and if you cannot analyse a transient in the first place then you have no hope of being able to re-synthesise it accurately using frequency domain techniques. Also, the addition of realistic-sounding wind noise to the tones can often be difficult with these systems. All these issues raise important details which will be expanded in later sections.

Sometimes claims are made for the frequency domain system that it is better at mimicking the subtle variations which occur while a real organ pipe is speaking. Pipes respond sensitively to their environment in myriad ways, and the onset of speech can be affected by factors such as how many other stops are drawn and how many notes are demanded simultaneously. Pipes can interact through a common wind supply and acoustically with others nearby, and there is some truth in the statement that a pipe never speaks the same way twice. Critics of the time domain system point out that to incorporate such effects it would be necessary to store an impracticably large number of samples, whereas the frequency domain system can simulate them by computing the necessary variations to the waveforms while they are sounding. In practice the truth, as usual, lies between these extremes, and time domain systems have reached a level of considerable sophistication in view of their ubiquity. It is just as easy (or difficult) to incorporate multiple time domain samples as it is to incorporate multiple tables of harmonic amplitudes or other data. More than ten years ago at least one make of time domain organ was already able to simulate the gentle variations, including the “burbling”, of a reed stop while the key was held down for an extended period. Therefore, particularly today, the ability to include these and many other effects is not solely the province of frequency domain systems. Both systems require the storage of large amounts of data to properly simulate these subtleties, together with highly sophisticated hardware and software host environments. In terms of information theory, a time domain system is in all circumstances and in every way equivalent to a frequency domain one. Neither has an advantage over the other in this respect, and anyone who maintains it does must have a technical insight which implies they should apply for the next Nobel prize.

The characteristics attributable to unsteady winding were delightfully set out by a 19th century description of Austin’s Universal Wind Chest system [8]:

“Why is a single stop so smooth? Because the demand for wind is so small, and every stop has been voiced and tuned under the condition of the bellows being inflated and almost in a state of rest.

To explain what otherwise happens we will suppose the bellows reservoir board to be 6ft by 6ft; this, needing a weight of 7lb per square foot, amounts to 2 ¼ cwt. It is foolish to suppose that this heavy weight can be divested of its natural law of inertia and momentum. Imagine now the bellows to be fully inflated and in a state of rest – all is right – equal pressure everywhere. Now suddenly strike full chords with every stop and coupler out and heavy pedal notes added; these heavy weights on [an] upper board constructed to rise and fall, to take in, and give out any displacement, lag behind at first, obeying the law of inertia; this results in less pressure; but when it has started in its descent it obeys the law of momentum; and it may be that this sudden demand of wind is no longer needed. This suddenly checks its downwards course, and this is the cause of momentum being added to weight. The result then will be more pressure. Also the intermittent thrusts of wind from the feeders, whose duty it is to keep up supply, have an equally disturbing influence from exactly the same cause.”

The description continues in similar vein to describe robbing caused by the conveyances and trunking but not, curiously, inadequate pallets. Perhaps this is because even Austin’s system could not dispense with these. Nevertheless it is fascinating to thus see how fashions have swung from attempts to eradicate these “defects” to today’s strenuous endeavours to simulate them in electronic organs, and to re-introduce them in some pipe instruments!

Another claim sometimes made in favour of frequency domain organs is that it is easier to voice them. In order to change the harmonic structure of a stop in these instruments, it is only necessary to alter the tables of stored harmonic amplitudes. Manufacturers’ voicing software generally facilitates this by allowing the voicer to make these adjustments in real time while a note is sounding, through the use of convenient on-screen displays containing virtual “gain controls” for each harmonic (of which there might be up to 250 or so). However it is also straightforward to implement such a scheme for time domain organs. Having altered the harmonic structure of a sound in a time domain organ, it is then necessary to do an Inverse Fourier Transform (additive synthesis) to generate the new version of the waveform. This has to be done each time the slightest adjustment to the harmonic amplitudes is made. However the speed of modern computers should make this somewhat longer process transparent to the voicer. If it does not, the voicing software and hardware being used are non-optimum and out of date.

In practice, one area where frequency domain techniques run out of steam before time domain ones is in the number of waveform recipes which can be stored. Storage of separate samples (or even multiple samples) for every note of every stop is quite feasible for the cheapest and most basic time domain synthesisers today - even the humble SoundFont based synths used in many computer sound cards can do this. However the latest Bradford frequency domain organ simulator (BEST) still has difficulty in competing in this vital area according to its inventors [13], and therefore it has to fall back on interpolating between a smaller number of voicing points as described previously. This shortcoming certainly reflected the rather dated technology on which earlier Bradford systems were hosted (thirty-year-old Z80/S80 low speed 8 bit processors are apparently still used in some commercial organ systems which use Bradford technology).

Today the two techniques appear to be moving closer to each other in that some frequency domain organs also rely on storage of waveform samples, such as BEST. The addition of wind noise to the sounds of certain stops is achieved in these cases by adding sampled noise. This is not surprising because noise contains an infinite number of Fourier components; it cannot be synthesised from a spectrum of finite size. This illustrates another disadvantage of the frequency domain system. Some starting transients may also be stored directly as waveforms rather than as extremely complex data tables.

It would be comfortable to be able to say that neither system has a definite or incontrovertible edge over the other. Both can simulate organ tones to a high degree of realism, but both can also produce disappointing results. As with pipe organs themselves it boils down to how well they have been voiced, assuming that the hardware and software environment has the necessary degree of flexibility and sophistication to allow the voicer to flex his muscles as it were. And, as with pipes, the voicer has to know his business inside out, which is something which only comes from experience and an “ear” for the job. Remember that the frequency domain system arose at a time when the patent situation was weighted heavily against the majority of organ manufacturers. This fact, rather than spurious technical claims, was largely responsible for its genesis.

But I think it is possible to make an objective choice in favour of one system over the other, and therefore I shall conclude this section on the pros and cons of the two methods by stating my own preference. Because frequency domain ("real time") organs cannot reproduce an exact copy of the sound emitted by an organ pipe, I find them inferior to time domain ("sampling") systems in practice. It is not possible to model, and thus program, every conceivable nuance of pipe speech for the same reasons that it is impossible to achieve accurate weather forecasts. In both cases some factors are difficult to reduce to the mere numbers that computer programs demand, and we do not know about other factors at all. Also the dated hardware environments on which some frequency domain systems are hosted simply cannot compete in capability with those used by the synthesiser world at large for time domain synthesis. Therefore "real time" organs tend to produce a rather blander and smoother approximation to pipe sounds than "sampling" ones, often quite acceptable but less characterful nevertheless. The variations of the sounds from one actual pipe to another and those which occur while actual pipes are sounding tend to be captured better by "sampling" than by "real time" organs. I have not yet heard a "real time" digital organ which comes anywhere near being able to simulate the character of a Schnitger or Silbermann instrument for example. Manufacturers of these types of organ seem content with the numbing blandness of mid-20th century British and American organs. To illustrate this, the two clips which follow were produced by the "sampling" and "real time" synthesis methods respectively:

"Sampled" sounds - mp3 file (3.8 MB/4m 9s)

(Extracts from Mein junges Leben hat ein End, Sweelinck. About 1/2 minute from each movement)

"Real time" sounds - mp3 file (719 KB/46s)

(Extract from Ave Verum, Mozart)

VOICING

This section begins a discussion of some of the processes involved in voicing a digital organ. The term “voicing” is used rather loosely here to include many activities, ranging from the collection of raw sound samples through to voicing operations themselves, but it is assumed the reader will have no difficulty in understanding the meaning of the terms from the contexts in which they appear.

The voicing process requires two categories of activity. The first includes such things as making recordings of pipe waveforms, and converting them into tables of harmonic amplitudes and other data to be synthesised back into sounds. Thus spectrum analysis must be included in the range of processing functions available so that the harmonic structure of the waveforms can be derived. Much if not all of the necessary hardware and software can be obtained commercially, though a customised software suite will often be better suited to the job. The second category includes activities associated with the minutiae of loading all these data into a particular organ and adjusting them to taste within more or less narrow limits. This requires a software and hardware environment necessarily specific to a particular organ; there are many organs on the market and they are constantly evolving. Moreover, details concerning this environment can usually only be obtained from the manufacturer at his discretion. Therefore only the first category of activities can be described here. However it covers an extremely important range of topics for the reason now to be explained.

A digital organ when first switched on is nothing but a vast empty vacuum which cannot make the merest squeak on its own. It is about as much use as a pipe organ without any pipes. The vacuum has to be filled by specifying almost every conceivable characteristic of every stop, sometimes on a note-by-note basis. This statement is particularly true of frequency domain organs, which are incapable of accepting waveform samples. In these instruments an enormous amount of data have to be laboriously inserted before even a single new stop can be heard. By “new stop” is meant one which has not been used before in that organ. For example, merely to hear the steady state sound of a stop (i.e. neglecting its attack transient) the voicing points have to be defined, the harmonics for each voicing point have to be inserted, the way that the interpolator treats the variation of each harmonic between each pair of voicing points has to be specified, etc. To specify the transients in addition, it is typically necessary to define which frequencies are present in the transients, how the frequencies might change with time and the specific ADSR characteristics for each. Having done all this, the stop may then require a huge amount of additional data before aspects such as live winding can be reproduced. In other words it is necessary to input hundreds if not thousands of items of data before a frequency domain organ can emit sounds.
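To give an idea of what the interpolator has to do, here is a minimal sketch in which each harmonic amplitude is interpolated linearly between a pair of voicing points. Linear interpolation is an assumption made for simplicity, and the function name, note numbers and amplitude values are hypothetical; a real instrument may treat each harmonic quite differently.

```python
import numpy as np

def interpolate_harmonics(note, voicing_notes, voicing_tables):
    """Linearly interpolate each harmonic amplitude between voicing points.

    voicing_notes: increasing note numbers at which tables are defined.
    voicing_tables: array of shape (n_points, n_harmonics).
    """
    tables = np.asarray(voicing_tables, dtype=float)
    return np.array([np.interp(note, voicing_notes, tables[:, h])
                     for h in range(tables.shape[1])])

# Hypothetical voicing points at notes 36 and 48 with three harmonics each.
points = [36, 48]
tables = [[1.0, 0.6, 0.2],
          [1.0, 0.4, 0.1]]
midway = interpolate_harmonics(42, points, tables)
```

Even this toy case needs a note number and a full harmonic table for every voicing point, which indicates how quickly the quantity of data grows for a complete stop.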

The time domain organ can be got going much more quickly because a set of samples for a new stop can be loaded into it very rapidly. Nevertheless, the collateral information necessary to tonally finish and regulate the instrument properly is considerable and consumes a lot of time.

Compare this situation to that for real pipes. A pipe can only emit the sound for which it was designed, more or less. If a diapason pipe is made it will only ever sound like a diapason, whose precise tone is determined by its scale, the cut-up, etc. All the voicer can do is to ensure it comes onto speech properly and, within narrow limits, emits the sound which its designer and maker intended. Obviously he could not change a diapason into a reed. Yet that is exactly what is possible within the vacuum of a digital organ, which contains nothing intrinsically organ-like at all. It could just as easily emit the sound of a barking dog or a car engine revving as an organ.

This virtually unbounded flexibility is usually paraded as a benefit but in fact it is a decided disadvantage. It is the main reason why digital organ manufacturers have been forced to copy real organ tone. Without an a priori detailed knowledge of the structure of pipe sounds they could never have conjured out of thin air the myriad parameters necessary to enable a digital organ to sound like a pipe organ. This is not to deny that with sufficient experience it is possible to create a new sound without recourse to real pipe data. But most digital organ sounds are developed on the basis of a detailed examination of pipe waveforms and, to reinforce the point again, they could never have got off the ground even in principle were it not for such examination. Therefore the derivation, analysis and understanding of real pipe sounds materially affects all digital organs and it is the subject of prime importance in voicing. The subsequent process of loading the results of the studies into a particular make of organ is a mechanical process by comparison.

In principle, therefore, there are two ways to insert a new sound into any digital organ: one can start either from real pipe sounds or the sound can be synthesised from scratch (but with a priori knowledge and experience). In practice a combination of both methods is also used, often of necessity because of problems such as unacceptable levels of noise or other forms of corruption of real pipe sounds. Noise can result from the blower, the organ action, traffic, etc. In these cases the original data have to be processed to remove the offending artefacts. In what follows we shall refer to these three methods as “real”, “synthetic” and “hybrid” voicing for brevity.

Many of the steps necessary to insert a new sound into any form of digital organ are similar. For time domain organs one would first wish to obtain a satisfactory sample of the waveform of interest, including its transient structure, using any of the three methods outlined above. In the “synthetic” (and sometimes the “hybrid”) case the waveform would typically be generated offline from a table of harmonic amplitudes using an additive synthesis (inverse Fourier Transform) program. The harmonic amplitudes would be produced by the voicer based on his intuition, judgement and experience. Then the waveform would be loaded into the organ system and parameters such as ADSR adjusted until it matched other samples relating to the same stop, and those of other stops. Note that the generation of the transient portions of the waveforms in “synthetic” or “hybrid” synthesis is much more difficult than for the steady state portions, hence the frequent reliance on the “real” voicing method in practice.

For frequency domain organs the waveform itself cannot be loaded into the system. In the “real” case a spectrum analysis of the harmonic structure of a real pipe sound would be used to populate the necessary tables of harmonic amplitudes. In the “hybrid” case these values would be modified heuristically in some way. In the “synthetic” case the amplitudes would be created from scratch as described above. In all cases, difficulties related to transient structure will often be encountered when dealing with frequency domain synthesis.

Making Recordings of Organ Stops

Recordings of real organ pipes are used in the “real” and “hybrid” methods of voicing as defined above. The advantages of using recordings of the sounds to be simulated include the ability to reproduce the starting and termination transients of the tones, which can be very complex in structure and therefore difficult or impossible to synthesise in any other way. By simply using a recording of the waveform we do not need to concern ourselves about this complexity; the transients will be reproduced automatically as part and parcel of the process. However this is only possible with time domain organs, or frequency domain ones which also allow samples to be used.

Having said this, a major disadvantage is the difficulty of making recordings of sufficiently high quality to be used directly, and most people find it is only after trying to do it that the problems become apparent. There is also an aesthetic consideration of considerable importance. Frequently one sees a firm trumpeting the fact that recordings of one famous organ or another have been incorporated into their products, yet they seem not to realise that they are alienating a substantial part of their potential market. By admitting they are merely adept copyists of first rate organ tone rather than being able to generate it from scratch, they are also admitting doubts about their ability to understand and implement the voicing techniques which are the subject of this article. Some critics go so far as to argue that producing “pirated” samples from recordings of pipe organs is a form of copyright infringement [9], which does not apply to generating them synthetically. This is why both methods are discussed here.

Anyone who has yet to try making recordings of organ pipes will probably be surprised by the difficulty of achieving a quality sufficient for direct use in a digital sampling system. The many issues involved include the following:

Where do we place the microphone(s) in relation to the pipework of interest?

If the recordings are made close to the pipes a lot of possibly unwanted noise can be introduced. High amplitude wind noise is generally undesirable even though you might want it to be reproduced. This is because the presence of random noise makes the selection of looping points in a time domain system almost impossible on the steady state parts of the waveform – the superimposed noise means that there is in fact no steady state as such, and the audible discontinuity in the noise will be virtually impossible to get rid of. You cannot loop a noisy waveform! In signal processing language it is only possible to loop deterministic, not stochastic, waveforms even though both may have stationary statistical properties. A very long sample can sometimes be used to ease this problem so that the time between loop repetitions is many seconds, but this enormously increases the storage requirement. Another problem associated with noisy samples is the unpleasant way the noise builds up subjectively as more stops are combined.

Action thump or other noises are also a problem when recording close to the pipes. If the starting transients are to be recorded it is often difficult in subsequent editing to remove the thump without truncating the desired transient.

Yet another problem involves subjective aspects of ambience. Do you want the digital organ to sound much as the original pipework does in the body of the building? If so, there is little point making recordings close to the pipes. Under these conditions the effect of the digital organ in a smaller room will be totally different to that of the original instrument in the nave of a church, say. But making recordings some distance away from the pipes introduces its own difficulties such as those caused by standing waves, and these are addressed later. The difference in the harmonic structure of the tones when recorded at different distances is well understood – very close to a pipe there will invariably be large numbers of harmonics discernible even for stops such as flutes. But the higher order harmonics decay rapidly in strength with increasing harmonic number, so that away down the building the higher ones become inaudible. You would not usually want these unimportant harmonics to emanate from the loudspeakers of your digital organ in a small room (in a large one such as a church the issues are different).

A definitive answer to the question of recording range cannot therefore be given and each case has to be assessed according to its circumstances. In general you might try making recordings a few metres away from the pipes; the necessary recording level settings then tend to cut down the amount of spectrum distortion caused by reflected waves which are of lower amplitude by the time they reach the microphone. This problem is discussed more fully later.

Reducing Outband Noise

Outband noise means any signal which is not within the frequency range of the pipe being recorded. It is important to cut out such noise otherwise needless degradation of the sound can result. Of particular importance is noise below the lowest frequency (the fundamental frequency) of the pipe in question, as there is absolutely no reason why the pass band of the equipment should extend below this point. This implies the use of variable high pass analogue filters which should be set to the highest frequency which does not attenuate the fundamental. It is important to insert the filters at the microphone inputs as some high amplitude low frequency noise, such as sub-audio wind fluctuations from the blower, can otherwise dominate the recording level settings. Such fluctuations can be of very high amplitude if picked up by microphones with an extended low frequency response, and inserting the filters after some stages of amplification might result in saturation occurring which the filters can do nothing about. In fact it is often the blower which is the chief contributor to outband noise, and it can be effectively removed by simple high pass filtering for pipes above a certain pitch. For lower frequency pipes, such as a quiet 16 foot pedal flue stop, blower noise can be most intrusive and extremely difficult to get rid of.

How many and what type of microphones should be used?

If you are familiar with studio recording techniques you will be aware that many separate sound channels are often recorded and later mixed down. However you will also be aware of the school of thought which teaches that one should only use the minimum number of channels, or even that anything above two channels is definitely undesirable. The multiple channel doctrine is really one of convenience; it enables the balance of the final mix to be adjusted to suit the tastes of the producer, performers, etc. But the main technical shortcoming is that each microphone picks up signals from regions which are intended to be the focus for others, and all of them pick up ambience from the auditorium. The mixing process then results in artificial peaks and troughs in the frequency response of the final mix due to the phase reinforcements and cancellations which result in the mixing process. In turn this can result in particular notes appearing to be too strong or weak when the recording is auditioned. For our purposes the effect will introduce spectrum distortions by artificially exaggerating or attenuating certain harmonics.

The use of only one microphone removes this problem but it introduces others, the principal one being the choice of microphone position. Because the subjective effects vary with listening position in real auditoria the question arises as to why one position is necessarily better than any other – if a hundred people were questioned they would doubtless give at least a hundred different answers. When recording organ pipes two microphones might be used, separated by a few metres, as this enables one track or the other to be used later if standing wave effects are troublesome. One should never combine the two tracks by replaying them monaurally, as this leads immediately to spectrum distortions as described above.

As to the choice of microphones, high quality capacitor microphones will give good results; these have an extended and flat response from sub-audio frequencies to 20 kHz or more. The use of proprietary foam windshields can help attenuate blower pressure fluctuations when recording close to the organ. A cardioid beam pattern is preferable to an omnidirectional one, as this helps to reduce standing wave problems. The axis of peak response would of course be directed towards the pipes being recorded.

What recording medium should be used?

Older analogue magnetic tape systems are almost useless if you are intending to transfer samples directly into a digital organ. The slightest trace of wow or flutter will be painfully apparent when you try to select loop points on the digitised waveform, and it will usually defeat your best efforts to do so. Also signal to noise ratios are barely adequate even when using professional noise reduction systems. However the best quality equipment might just provide passable results, although it is difficult to obtain today outside the second hand market. If you are only intending to use recordings as a basis for subsequent spectrum analysis of the sounds, such as when the “hybrid” method of voicing is used, then the requirements can be relaxed somewhat.

Among the best systems from a technical point of view are DAT (Digital Audio Tape) recorders, but unfortunately these are somewhat expensive. Many users find that the use of tape, a sequential rather than random access medium, also becomes progressively more irritating when locating or repeatedly replaying tracks. If you do use DAT ensure you select a normal PCM (Pulse Code Modulation) recording mode rather than one which uses compression, otherwise some peculiar distortions of the signals can occur. MP3 coding is completely useless for high quality work, if you have not discovered this already.

A Minidisc system is far cheaper while still providing the signal to noise and other benefits of digital recording. It is also a random access data storage medium which is much more convenient in use than tape. However in this case it is less probable that you could select a PCM recording mode and you will almost certainly have to put up with some form of data compression such as ATRAC (Adaptive Transform Acoustic Coding). The problem with these techniques is that they are optimised on a psychoacoustic basis mainly for the signal characteristics (crest factor, etc) of pop music, thus the long sustained tones you will want to record from an organ pipe might produce some weird distortions on replay. However the problem is unlikely to have major implications in practice provided you use the SP (standard play) recording option rather than an LP (long play) mode. The practical advantages of the Minidisc system are also significant: a small battery powered “walkman”-type recorder together with microphones which can be powered from the recorder itself makes for an extremely compact and convenient recording set-up.

With the appropriate software, a high quality sound card and a personal computer, the waveforms can of course be recorded directly onto hard disc or CD. This would do away with problems of compression. It is probably the most cost-effective means of making recordings of the highest quality.

How do standing waves affect the results?

In this section we shall denote by the term standing wave any stable system of waves reflected from the surfaces in a building which exist simultaneously with the sound emitted from a single pipe. This is to simplify the terminology, which otherwise would require us to define and consider terms such as room modes as well as reverberation, echoes, etc. Although these phenomena are similar to standing waves in the sense they all arise because of reflections within the building, they are usually more complex and to consider them fully would incur an unacceptable digression. A detailed discussion is available in [12].

A standing wave results when the outgoing sound wave from an organ pipe exists at the same time as a reflection of the same wave from a structure such as a wall in the auditorium. The geometry of an organ chamber is pertinent here, because the pipes of most organs are contained in one and because it is a chamber wall which will be closest to the pipes. Therefore when a pipe begins to sound, the reflected wave which will arrive first at a nearby microphone will often be one which arose from the chamber. Because of the small distances involved this wave will usually have suffered relatively little attenuation, unlike waves which arrive later from greater distances. To a first approximation, therefore, we can consider the simple situation in which there is only one reflected wave as being representative for the purposes of this discussion.

The microphone can only respond to the sum of the air pressure disturbances caused by both of these waves; it cannot distinguish between them although a directional beam pattern can sometimes assist in reducing the reflected waves. Figure 5 is a sketch of the situation for one harmonic (i.e. a sine wave) of the tone emitted from the pipe in its steady state regime. In this case the two waves are in phase thus their effects add, producing an augmentation of the signal which is then recorded.

Figure 6 illustrates a similar situation but the signal phases cancel, resulting in a reduction of the signal level recorded for that harmonic.

Because each harmonic is at a different frequency the phases of the reflected waves in each case will differ because of simple geometry: the path lengths between the pipe, microphone and wall remain the same, whereas the distance between the peaks of the waves (the wavelength) varies with frequency. Thus the amount of reinforcement or cancellation will be different for each harmonic. Therefore standing waves will produce a distortion of the true spectrum shape for the pipe under consideration.
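The single-reflection approximation adopted above lends itself to a simple numerical illustration. In the following Python sketch the direct wave has unit amplitude and one reflected wave arrives with a relative amplitude and an extra path length which are purely hypothetical; the speed of sound (343 m/s) is likewise an assumed round figure. The function returns the factor by which each harmonic is augmented or reduced at the microphone.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def harmonic_gains(fundamental_hz, n_harmonics, extra_path_m, reflection_amp):
    """Gain applied to each harmonic when one reflected wave adds to the direct one."""
    gains = []
    for k in range(1, n_harmonics + 1):
        freq = k * fundamental_hz
        # Phase lag of the reflection caused by its longer path.
        phase = 2.0 * math.pi * freq * extra_path_m / SPEED_OF_SOUND
        gains.append(abs(1.0 + reflection_amp * cmath.exp(-1j * phase)))
    return gains

# Hypothetical geometry: the reflection travels 0.5 m further and arrives
# at 80% of the direct amplitude.
for k, g in enumerate(harmonic_gains(343.0, 6, 0.5, 0.8), start=1):
    print(f"harmonic {k}: gain {g:.2f}")
```

With these particular numbers the extra path is exactly half a wavelength at the fundamental, so the odd harmonics are cut to about one fifth of their true amplitude while the even ones are almost doubled. Moving the microphone changes the path difference and hence the whole pattern.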

In general, the more attractive the ambience of the building in terms of reverberation, the more problematical will standing waves be when making recordings in it. This is because it is precisely the phenomena which give rise to standing waves (reflections within the building) which also generate ambience. A multiplicity of hard stone surfaces results in multiple reflections in which the amplitudes of the reflected waves decay slowly; there is little energy loss at each reflection. It is not unusual to find harmonics in the recorded waveform which almost completely vanish during the steady state phase of a pipe sounding within such buildings, particularly at the lower frequencies.

With experience one can often tell when a particular recording is over-contaminated by standing wave distortions using a simple listening criterion. Because the standing waves do not build up or die away instantaneously, the nature of the replayed sound when it begins and ends gives clues as to the relative proportions of direct and reflected energy. At the beginning of the sound the microphone picks up the direct wave from the pipe before the reflected waves appear. At the end the reverse applies, because the direct wave is cut off abruptly while the reflected ones continue to die away. If there is significant standing wave distortion the timbre of the sound will be heard to change suddenly when the pipe ceases speaking and the sound dies away, and this will often reveal which frequency bands were over-attenuated or over-amplified during the steady state phase. In these circumstances the signal from another microphone should be tried because the standing wave effects at another location will, of course, be different.

Should anechoic conditions be used?

In an attempt to remove the problems of standing waves, some recordings are made in anechoic conditions, in which the pipe radiates in a free acoustic field because no reflections occur. In the past some firms have gained advertising mileage by claiming that they make their pipe recordings in an anechoic chamber. However anyone who has been in such a chamber will have experienced the utterly unnatural sound within, and if such sounds were to be subsequently radiated from the loudspeakers in a small room it is unlikely they would be acceptable. In a large auditorium the situation is different and it is arguable that the anechoic recording method might be the more appropriate. Therefore the anechoic approach cannot always be recommended nor dismissed for all situations.

An alternative approach to achieve near-anechoic conditions is to mount the pipes some distance above ground out of doors. Both techniques border on the impractical for most purposes and they are certainly an expensive way to make recordings.

To conclude, the best criterion is to train one’s ears to judge whether a particular recording is acceptable or not; if the sounds on the recording are what you as the voicer want to hear, then there is no reason why you should not use them in a digital organ.

Pre-processing Recorded Waveforms

Some form of pre-processing is almost always required before a recorded waveform can be used. Even if it is of the highest technical quality it will usually need to be processed in some way before being loaded into a time domain digital organ. For example, there will often be a silent section at the beginning which needs to be edited out, or action noise at the point of pallet opening will likewise need to be removed.
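The first of these operations, removing a silent lead-in, is normally done by eye in a waveform editor, but the idea can be sketched in Python as follows. The threshold value and the sample data are hypothetical, and a real recording would need a threshold chosen with its noise floor in mind.

```python
def trim_leading_silence(samples, threshold):
    """Drop samples before the first one whose magnitude exceeds threshold."""
    for index, value in enumerate(samples):
        if abs(value) > threshold:
            return samples[index:]
    return []

# Hypothetical sample values: near-silence followed by the pipe speaking.
recording = [0.0, 0.001, -0.002, 0.3, 0.5, -0.4]
trimmed = trim_leading_silence(recording, 0.01)
print(trimmed)   # [0.3, 0.5, -0.4]
```

Removing action noise is not so easily automated because, as noted earlier, the thump often overlaps the attack transient one wishes to keep.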

These examples are of the simplest types of pre-processing and they can be executed using widely obtainable software. One way to proceed is to use an ordinary personal computer with one of the many time domain editors available commercially. They are referred to here as time domain editors because they are basic programs which only operate on the waveform itself; they do not usually offer frequency domain facilities such as spectrum analysis. A useful editor is CTWave by Creative Technology who make the SoundBlaster range of computer sound cards, and it is often supplied with a sound card or available from many sources on the Internet. It runs under the Microsoft Windows operating system and requires a sound card in addition. The sound card, connected to a high quality audio system, is essential so that you can hear the results of the editing process. Editors of this type generally operate on Windows WAV files both for input and output.

With modern computers and sound cards it is possible to generate an input file for the editor using a digital link from the recorder which was used to record the sample. If a Minidisc recorder was used the link will often be an optical one, or a Firewire/SB1394 connection might be available. Doing things this way avoids the slight loss of quality which would result if the recording had to be played back in analogue mode and re-digitised to generate the WAV file. However if this is unavoidable, editors such as CTWave also incorporate the necessary digitising facilities when used in conjunction with a sound card. The output WAV file containing the edited version of the waveform can readily be converted to whatever data type is required by the organ system because the file format is standardised and available in the public domain. The use of Windows, a PC and WAV files makes for a very economical and flexible housekeeping system for managing and processing the recorded and edited data.

Spectrum Analysis

To perform spectrum analysis a frequency domain editor is required. These are not as easily obtainable as time domain editors, indeed they can be very expensive and even then the facilities offered might be little more than rudimentary. Because of this problem, customised editing software is often better able to perform the necessary frequency domain functions for digital organ applications. Commercial editors often seem to be limited to the radix 2 FFT, meaning that data lengths are restricted to powers of two as explained previously. This is quite unnecessary today.

Two important cases where spectrum analysis is required will now be described, the software being different for each.

Case 1 - Single Cycle Analysis

If the recorded signal is very clean, with a high signal to noise ratio and satisfactory in every other way as judged by listening tests, it is possible to derive its harmonic structure in the steady state simply by analysing a single cycle of the steady state waveform. The harmonic amplitudes so produced can then be inserted into a frequency domain organ directly if desired.

Firstly the single cycle is identified or cut out of the steady state part of the recorded waveform using standard on-screen time domain editing functions. It is desirable that the selected cycle begins and ends as close to the zero line as possible, otherwise harmonic information will be lost or harmonic distortion introduced. The number of data samples present in the selected cycle then needs to be examined for two reasons. If a radix 2 FFT algorithm is being used, the number of samples will need to be increased by interpolation to the next higher power of two. Also the maximum number of harmonics which can be extracted from the data cannot be more than half the original number of data points, even after interpolating to some higher figure. Interpolation cannot put information into the data which was not there to start with. Interpolating to a higher sampling rate is the same thing as oversampling, a technique widely used in devices such as CD players.

An interpolator will usually be in the toolkit of most frequency domain wave editors. Note that it is dangerous to interpolate downwards; for example, if the number of samples in the selected cycle is 328 it might be tempting to specify an FFT length of 256 data points. However, unless the interpolator also includes automatic digital low pass filtering, there is a risk that aliased frequencies will be introduced into the spectrum. Therefore in this case it would be safer to interpolate upwards to 512 points and then to use no more than 164 (= 328/2) harmonics in the resulting spectrum. Incidentally, these issues illustrate how awkward it can be if one is forced to use a radix 2 FFT editor. They do not arise if a mixed radix FFT algorithm is available, which will operate on any data sample length.
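The arithmetic in this example can be expressed as a short Python sketch. The function name is an assumption made for illustration; it simply rounds the cycle length upwards to a power of two and limits the usable harmonics to half the original number of samples.

```python
def radix2_plan(samples_in_cycle):
    """Return (fft_length, max_harmonics) for a radix 2 FFT of one cycle."""
    fft_length = 1
    while fft_length < samples_in_cycle:
        fft_length *= 2          # always round upwards, never downwards
    # The harmonic limit is set by the original data, not the interpolated length.
    max_harmonics = samples_in_cycle // 2
    return fft_length, max_harmonics

print(radix2_plan(328))   # (512, 164)
```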

Despite its simplicity and the fact it is often used, single cycle analysis is not a robust means of deriving a spectrum. Its main shortcoming is that noise or any other undesired artefact in the signal is always forced into the harmonics. This can result in major spectral distortions. The reason this occurs is as follows. The spectrum of any data sample of length (number of data points) N is of length N/2 + 1, representing the theoretical maximum number of harmonics plus the zero frequency (DC) component. Because there is only one cycle in the input data, the harmonics will always occupy adjacent frequency slots in the spectrum – there are no empty spaces between them. Therefore noise in the data can only appear in the spectrum positions occupied by the harmonics themselves.

Besides noise on the signal, other factors which can corrupt the sample being analysed include careless editing so that the beginning and end of the chosen cycle do not join up – there is a discontinuity which introduces false harmonic information. Upwards interpolation can reduce this problem though. Then the chosen cycle might not be a “good” one for several reasons, such as its being too close to the attack transient of the pipe. These errors can all be reduced by taking several cycles and averaging their spectra. However an effective way of deciding whether the spectrum is a “good” one is to re-synthesise the cycle of data from it by using the harmonics in the spectrum in an inverse Fourier Transform. The re-synthesised data can be turned into a continuous waveform by repeated looping and then simply listened to. An easy way to do this is to use the facilities of a computer sound card if it is not possible to insert the data into an organ. The sound which results should be very close or identical to that of the original waveform. If it is not, the data should be rejected and you will have to start again.
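The averaging step mentioned above might be sketched as follows in Python. It assumes, for simplicity, that every selected cycle has already been cut or interpolated to the same number of samples, and it uses a naive Discrete Fourier Transform rather than an FFT so that no external library is needed. The two test cycles are hypothetical.

```python
import cmath
import math

def amplitude_spectrum(cycle):
    """Amplitude at each slot from 0 to N/2 (DC and Nyquist are not rescaled)."""
    n = len(cycle)
    return [2.0 * abs(sum(cycle[t] * cmath.exp(-2j * math.pi * k * t / n)
                          for t in range(n))) / n
            for k in range(n // 2 + 1)]

def average_spectrum(cycles):
    """Average the amplitude spectra of several equal-length single cycles."""
    spectra = [amplitude_spectrum(c) for c in cycles]
    return [sum(values) / len(spectra) for values in zip(*spectra)]

# Two hypothetical cycles whose fundamental amplitudes differ slightly.
n = 32
cycle_a = [0.9 * math.sin(2.0 * math.pi * t / n) for t in range(n)]
cycle_b = [1.1 * math.sin(2.0 * math.pi * t / n) for t in range(n)]
averaged = average_spectrum([cycle_a, cycle_b])
print(round(averaged[1], 3))   # 1.0
```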

The process suggested above has provoked frequent controversy, not least with some in the electronic organ business. The dispute, which is amusing, generally starts along the lines of “what you are suggesting is nonsense – you can do a Fourier Transform and an inverse one as many times as you like and you will always get back to what you started with”. The misunderstanding is one of many held by those who do not examine a process at a sufficiently detailed level. In this case the problem is obvious when it is pointed out.

To resolve the argument, denote the original time series by A; this is the complete waveform from which we are about to select a single cycle. Denote the single cycle by A’. Taking the Fourier Transform of A’ assumes – and there can be no argument here because it is part of the mathematical formalism of the Discrete Fourier Transform – that the waveform actually consists of identical cycles extending to plus and minus infinity on each side of the chosen one. Let the amplitude or power spectrum of A’ be B. Using the harmonic amplitudes or powers in B and then performing an inverse Fourier Transform will reproduce a time series C which will consist of a single cycle at the fundamental frequency, but in general it will not be identical to the waveform A’. This is because the phase of each harmonic in B is unknown – it was thrown away when computing the spectrum B. However the harmonic spectrum of C will be identical to that of A’ and therefore the two will sound the same if looped and auditioned.
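The relationship between A’, B and C can be demonstrated numerically. In the Python sketch below the cycle A’ is a hypothetical two-harmonic waveform whose second harmonic has an arbitrary phase of 1.2 radians; B is its amplitude spectrum, and C is rebuilt from B alone with every phase assumed to be zero. The function names are illustrative only.

```python
import cmath
import math

def amplitudes(x):
    """Amplitude spectrum of one cycle; the phases are discarded here."""
    n = len(x)
    return [2.0 * abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                          for t in range(n))) / n
            for k in range(n // 2 + 1)]

def resynthesise(amps, n):
    """Rebuild one cycle from amplitudes alone, assuming zero phase throughout."""
    return [sum(a * math.sin(2.0 * math.pi * k * t / n)
                for k, a in enumerate(amps) if k > 0)
            for t in range(n)]

n = 64
# A': fundamental plus a second harmonic with a non-zero phase (hypothetical).
a_prime = [math.sin(2.0 * math.pi * t / n)
           + 0.5 * math.sin(2.0 * math.pi * 2 * t / n + 1.2)
           for t in range(n)]
b_spec = amplitudes(a_prime)        # B
c_cycle = resynthesise(b_spec, n)   # C

largest_difference = max(abs(a - c) for a, c in zip(a_prime, c_cycle))
same_spectrum = all(abs(x - y) < 1e-9
                    for x, y in zip(amplitudes(c_cycle), b_spec))
print(largest_difference > 0.1, same_spectrum)   # True True
```

The waveforms differ sample by sample, yet their amplitude spectra agree, which is the sense in which C and A’ are equivalent.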

“Ha” is the usual response at this point; “it’s QED then”. But wait, if the single cycle C is looped indefinitely in a time domain synthesiser, a sound will be heard which may well not sound at all like the original. This is simply because the original sound was contained in the waveform A, not A’. A’ was derived by cutting it out manually from A, and it is therefore not necessarily the same as a single cycle of A. It is this point, simple though it is, which tends to be overlooked. Any differences will be represented as errors in the computed spectrum B. Errors will also arise if A contains noise such as wind noise, that from the blower or even electronic noise. In these circumstances there will be no single cycle the same as any other, and re-synthesising the waveform as described will reveal this by making it impossible to recover the original sound.

Noise constitutes a major problem in spectrum analysis, and it means that any spectrum derived from a noisy signal is only an estimate of the harmonic amplitudes present. The single cycle method is not robust because it is particularly vulnerable to these problems. The method assumes that the signal is deterministic (can be fully predicted) rather than stochastic (cannot be predicted because of the presence of random processes such as noise). However the problems can be reduced by using multiple cycles instead.

Case 2 - Multiple Cycle Analysis

In this case a spectrum is derived using several, perhaps many, cycles of the waveform. If the number of cycles analysed is p, the harmonics in the spectrum will occur at every pth frequency slot instead of each successive slot as for the single cycle case where p = 1. This has the advantage that much of the noise and other unwanted artefacts in the signal will be distributed in the gaps between the harmonics, thus the amplitudes of the harmonics themselves will generally be less affected. The noise is not forced solely into the harmonic frequencies as with single cycle analysis. Although the harmonic amplitudes are still only estimates of their true values, the estimates in this case will be more accurate than in the single cycle example.
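The spacing of the harmonics in a multiple cycle spectrum is easily verified. The Python sketch below analyses p = 4 cycles of a hypothetical noise-free waveform containing two harmonics, using a naive Discrete Fourier Transform, and lists the frequency slots which contain any significant energy. The cycle length and amplitudes are assumptions for illustration.

```python
import cmath
import math

def amplitude_spectrum(x):
    """Amplitude at each frequency slot from 0 to N/2 (naive DFT)."""
    n = len(x)
    return [2.0 * abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                          for t in range(n))) / n
            for k in range(n // 2 + 1)]

samples_per_cycle = 32
p = 4   # number of cycles analysed
segment = [math.sin(2.0 * math.pi * t / samples_per_cycle)
           + 0.3 * math.sin(2.0 * math.pi * 2 * t / samples_per_cycle)
           for t in range(p * samples_per_cycle)]

spectrum = amplitude_spectrum(segment)
occupied = [k for k, a in enumerate(spectrum) if a > 1e-6]
print(occupied)   # [4, 8]
```

The slots between the harmonics are empty here because the test signal contains no noise; with a real recording they are where much of the noise would appear.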

As in the single cycle case, the data segment to be used can be selected manually from the original waveform using a waveform editor. The required number of cycles is cut out from the waveform, taking care to ensure that the beginning and end of the selected segment lie close to the zero line so they do not introduce a significant discontinuity into the spectrum analysis. However it is better in this case to apply a graded data window to the segment, which has the effect of reducing the effects of such discontinuities. There are many window functions which can be used but an appropriate one for these purposes is the Hamming function. This is available in the toolkit of several up-market commercial editors, such as CoolEditPro (a widely used editor when this article was first posted but no longer available; many others, such as WaveLab, are now widely used). The pros and cons of various data windows cannot be entered into here but a rigorous treatment, though not for the faint hearted, is available in the classic text by Blackman and Tukey [10].

The Hamming window function has the shape illustrated in Figure 7. In this diagram the vertical axis represents the value of the function, whereas the numbers on the horizontal axis are arbitrary. This axis merely indicates the extent of the data segment selected for analysis, values around 20 lying in the centre.

Figure 7. The Hamming window function

The Hamming window is based on a cosine function, so it has a maximum value of 1 at the centre. It does not fall quite to zero at the extremes of the window, having a value of 0.08 at these points. The data values in the selected waveform segment are multiplied by the corresponding values of the Hamming function before the Fourier Transform is executed, thus the window has the effect of shading off the values towards each end. Perhaps surprisingly, the use of the Hamming function does not make much difference to the spectrum values for those components of the signal which are deterministic and continuous. In other words the harmonics emerge relatively unchanged, whereas aspects such as discontinuities at the beginning and end of the data segment have much less effect than would have been the case if a window function had not been used. This is because of the shading effect.
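For readers who wish to experiment, the window can be generated and applied as in the following Python sketch, which uses the standard definition of the Hamming function. The length of 41 points is an assumption chosen only so that the centre falls on point 20; any segment length of two or more points can be used.

```python
import math

def hamming(n_points):
    """Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1)) for n = 0 .. N-1."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def apply_window(segment):
    """Multiply each data value by the corresponding window value."""
    window = hamming(len(segment))
    return [s * w for s, w in zip(segment, window)]

w = hamming(41)
print(round(w[0], 2), round(w[20], 2), round(w[40], 2))   # 0.08 1.0 0.08
```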

An example of a spectrum derived this way is in Figure 8.

Figure 8. Noisy Multiple Cycle Spectrum after Hamming Window applied

The pipe in question was at treble F# on a 4 foot Principal rank (fundamental frequency about 1480 Hz) and about 60 cycles of the fundamental were used in the spectrum analysis. However things were made deliberately difficult for the purpose of illustrating the effectiveness of this form of analysis – the signal to noise ratio of the recording was intentionally made very poor on account of the high level of blower noise allowed to contaminate the recording, together with the considerable distance of the microphone from the pipe. The signal to noise ratio could have been much improved by using a high pass filter at the microphone output to suppress the outband noise from the blower, as described in the earlier section dealing with recording techniques. As it is, the high level of blower noise can be seen from the noisy nature of the spectrum between the harmonics. Also the high DC level will be noted from the line at zero frequency; in fact this also was due to the blower because of the large amplitude low frequency fluctuations it impressed on the signal. Thus the desired segment had, by chance, a large DC offset which was no doubt increased because no attention was paid to ensuring the selected segment started and ended close to the zero line.

Even though all the advice given so far was deliberately ignored, it is remarkable how sharp the harmonic spikes are. A single cycle analysis would have been impossible in this case because the desired signal could barely be seen on top of the much larger noise fluctuations from the blower, so no clean cycle could have been selected on which to operate.

Noise Removal

The discussion above brings us conveniently to the subject of how to remove noise from a recording and, having done so, how to reconstitute a clean signal.

From a noisy spectrum such as Figure 8 a noise-free signal can be reconstituted very simply. One merely reads off the amplitudes of the harmonics, of which there are 6 in this case, and uses them in an inverse Fourier Transform (additive synthesis) program. This will generate a few cycles of a noise-free sound sample at the pitch of the fundamental frequency (about 1480 Hz) which can be auditioned by looping. If this reconstituted signal does not sound close enough to the original one, allowing for the noisy nature of the latter, then another attempt at the process will have to be made. However it needs to be emphasised that Figure 8 is a spectrum of deliberately poor quality data, and in practice there are bound to be difficulties in cases as bad as this. Nevertheless it illustrates the method well.
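As a rough sketch of this reconstitution step, the fragment below sums six sine harmonics into a short loopable sample. The amplitudes are hypothetical placeholders, since the real values would be read off the spectrum, and the choice of fitting 5 cycles into 149 samples is simply an assumption which gives a pitch very close to 1480 Hz at a 44.1 kHz sampling rate while keeping the loop seamless.

```python
import math

def loopable_additive(harmonic_amps, n_cycles, loop_len):
    """Sum sine harmonics so that exactly n_cycles fit into loop_len samples."""
    samples = []
    for i in range(loop_len):
        phase = 2.0 * math.pi * n_cycles * i / loop_len
        samples.append(sum(a * math.sin((k + 1) * phase)
                           for k, a in enumerate(harmonic_amps)))
    return samples

# Hypothetical linear amplitudes for the 6 harmonics (not measured values).
amps = [1.0, 0.5, 0.3, 0.2, 0.1, 0.05]
sample_rate_hz = 44100.0
n_cycles, loop_len = 5, 149

pitch_hz = n_cycles * sample_rate_hz / loop_len   # roughly 1479.9 Hz
clean = loopable_additive(amps, n_cycles, loop_len)
```

Because a whole number of cycles fits the loop exactly, the sample can be repeated end to end for audition without a discontinuity at the join.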

Transient Analysis

All of the discussion so far has considered only the steady state part of the waveform of interest, that is, the part of the waveform after the attack transient has died away and before the release transient begins. The analysis of the transients themselves is much more difficult and, among other matters, it requires close attention to be paid to the capabilities and limitations of spectrum analysis. This is why the essentials of spectrum analysis were considered in so much detail previously, and further discussion is now necessary.

During the transient phases of a waveform the frequency structure often changes rapidly. Particular frequencies (we cannot necessarily call them harmonics because they may not have an exact harmonic relationship during the transient) might grow and decay in an unpredictable manner as the pipe settles down to stable speech. If not harmonically related to start with, these frequencies will then become so as the steady state approaches. When the frequency structure is changing with time one has to apply spectrum analysis carefully to ensure that important aspects of the structure are not missed either in terms of time or frequency.

Spectral Uncertainty

The most important point to keep in mind is the “uncertainty principle of spectrum analysis”, namely that time and frequency can never be measured simultaneously to a high degree of precision. The situation can best be explained by first considering a tuned filter of the sort often used in analogue circuits. If the filter is tuned very sharply to a certain frequency (i.e. if it has a high Q factor), its output amplitude will only change relatively slowly regardless of how fast the input amplitude might change. For example, if a sharp voltage step is applied to such a filter it will “ring” for a certain number of cycles, the number being related to its Q. Therefore by observing this extended output signal, it is impossible to be sure of exactly when the input was applied or removed. In other words, precise frequency measurement implies that time is imprecise. The reverse also applies in that making a precise measurement of when something happens means that the frequency content of the event can only be measured approximately.

Exactly the same applies to the case when digital spectrum analysis rather than analogue filtering is used to measure frequency. The physics, the fundamental mathematical relationships and therefore the numbers are all identical to those for the analogue case, despite anything else which might be claimed. Before a digital analysis can be carried out it is necessary to wait for a certain time for the required number of data points to arrive. If using a radix 2 FFT with a transform size of 512 points, that time would be about 11.6 milliseconds using the common sampling rate of 44.1 kHz. It is a feature of spectrum analysis that the frequency resolution, which is the smallest frequency interval which can be measured, is the reciprocal of the data length in seconds. In this case the resolution would therefore be the reciprocal of 11.6 msec, or about 86 Hz. Frequency differences of less than this do not appear at all in the spectrum because adjacent frequency slots are spaced by this value. Moreover, a feature which might occur in the spectrum, such as a group of lines near a particular frequency, could have occurred anywhere within the 11.6 msec data window. There is no way of telling exactly where, and therefore no way of telling exactly when the event happened. To make the time resolution better one has to use a shorter data window, but then the frequency resolution gets worse. Using a window of length 256 data points in this case would mean that the frequency resolution changes to about 172 Hz.
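The arithmetic in this paragraph is easily checked. The short function below (its name is invented for the example) returns the data window length and the corresponding frequency resolution for any transform size and sampling rate.

```python
def window_and_resolution(n_points, sample_rate_hz):
    """Return (window length in ms, frequency resolution in Hz)."""
    window_s = n_points / sample_rate_hz
    # The resolution is the reciprocal of the data length in seconds.
    return window_s * 1000.0, 1.0 / window_s

for n in (512, 256):
    ms, hz = window_and_resolution(n, 44100.0)
    print(f"{n} points: {ms:.1f} ms window, {hz:.0f} Hz resolution")
# prints:
# 512 points: 11.6 ms window, 86 Hz resolution
# 256 points: 5.8 ms window, 172 Hz resolution
```

Halving the window halves the time uncertainty but doubles the spacing of the frequency slots, which is the trade-off described above.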

Because of this inescapable straitjacket, some of what one hears about the capabilities of particular organs to reproduce transient structure is nonsense. Let us take an example. Typically an attack transient for a flue pipe lasts for a time equivalent to about 10 cycles of the fundamental frequency. There are wide variations – pipe organ voicers speak of flutes being “quicker” than strings for example - but this is a working figure for the purposes of this discussion. Therefore a transient will last for about 40 msec for an 8 foot stop sounding middle C. Recently a certain digital organ was said to be able to reproduce a transient in which “the 7th harmonic in the transient started off 0.5% flat”. At middle C the frequency of the 7th harmonic is 1834 Hz, therefore a 0.5% frequency difference would be about 9 Hz. Because the entire transient only lasts for 40 msec, the maximum possible frequency resolution within it is the reciprocal of this figure, namely about 25 Hz. This is much greater than the frequency difference of 9 Hz. Consequently it would be impossible, even in theory, to measure such a small frequency deviation within the duration of such a transient, let alone to say exactly where it occurred. Moreover, attempting to simulate such events is pointless for another reason, because the ear itself uses spectrum analysis and it could not therefore respond to such incompatible frequency and time parameters even in theory.
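The same comparison can be written out as a small calculation. The function name is made up for the illustration; the input figures are those used in the text (middle C at 262 Hz, a 10 cycle transient, the 7th harmonic and a 0.5% deviation).

```python
def transient_check(fundamental_hz, harmonic, deviation_fraction, n_cycles):
    """Compare a claimed frequency deviation with the best achievable resolution."""
    duration_s = n_cycles / fundamental_hz
    deviation_hz = harmonic * fundamental_hz * deviation_fraction
    best_resolution_hz = 1.0 / duration_s   # reciprocal of the transient length
    return duration_s, deviation_hz, best_resolution_hz

duration_s, deviation_hz, resolution_hz = transient_check(262.0, 7, 0.005, 10)
print(f"transient {duration_s * 1000:.0f} ms, deviation {deviation_hz:.1f} Hz, "
      f"resolution {resolution_hz:.0f} Hz")
```

With the unrounded transient length of about 38 msec the resolution comes out at roughly 26 Hz rather than 25 Hz, but the conclusion is unchanged: the claimed deviation is only about a third of the smallest interval that could be measured.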

Nevertheless, to be fair it might be the case that a particular digital organ system could attempt to simulate such an event. For example one can envisage data tables for each of the chosen frequencies in a transient in which the ADSR envelope for each can be specified, together with the way in which the frequency varies in time. But in this case the basis on which these data were derived would have to be suspect; it would be false to claim they were obtained from measurements of the behaviour of real organ pipes if the numbers were similar to those above. It is legitimate, therefore, to question whether the capability claimed of such an organ system is useful or relevant. Vague phrases used in marketing such as “modelled synthesis captures the true dynamic of the pipe” could relate to physically meaningless situations if the data defining the synthesis were not chosen carefully.

The Nature of Transients

Provided we do not ask the analysis to do more than the mathematics allows, it is possible to observe within these limits how the sound in a transient varies both with time and frequency.

Figure 9. Time-frequency spectrogram of a real transient

Figure 9 is a high resolution spectrogram, an attempt to represent the three dimensions of amplitude, frequency and time on the two dimensional page. The sound was from a flue pipe as it came onto speech, and the amplitude axis was plotted using linear units, rather than logarithmic ones such as decibels. This was done here merely so that the display appeared cleaner and less cluttered for the purposes of discussion.

Several features are of interest. Firstly the low level noise due to the blower can be observed rumbling away close to zero frequency. Secondly, several harmonics are visible though many more would have been observed if the vertical axis had used logarithmic units. Thirdly, the singular behaviour of the fundamental can be seen in which it rises rapidly to a peak, dies away and then climbs more slowly to reach its steady state value. Such features must not always be assumed to represent real transient effects, though. If the sounds are recorded in a reverberant environment such as a large church, it is common to find particular harmonics which behave in such a manner. When the pipe first starts emitting sound there will usually be a short delay before reflected sound of comparable intensity arrives at the microphone. When this happens the amplitude at certain frequencies will either grow or decay abruptly, depending on the relative phase relationships, and it is quite possible it is this phenomenon which we observe here. Therefore, during the transient phase of pipe sounds, we are hearing not only the way that the pipe is settling down to stable speech but the way the standing waves and room modes in the building are stabilising also. The important point to note here is that the effect will be significantly different for different microphone positions, because at different positions the standing wave effects will also change. This is why much of what one hears about transient structure and the necessity of copying it faithfully verges on the absurd. Phrases such as "the second harmonic of a Principal pipe comes onto speech first" are often so much bluster with no basis in reality. This might be the case sometimes, but certainly not always. It is necessary not only to establish what the features of the transient are, but to understand why they are as they are before they can be intelligently simulated in a digital organ.
For what it is worth, a key characteristic of the pipe used for the data in Figure 9 turned out to be that the third harmonic grew in quite a leisurely manner compared to the first and second, and this was an important ingredient in getting a realistic simulation of its starting transient.

Assuming that we decide the effects depicted in the diagram are meaningful, one way to simulate this transient in a frequency domain organ would merely be to examine the 35 separate spectra more closely, and read off the amplitudes of the significant harmonics for each one. Depending on how the organ was programmed to receive its input data, one could then construct a separate ADSR envelope for each harmonic or simply copy the measured amplitude values into the appropriate data tables. The same data could also be used in an off-line multiple cycle additive synthesis program to reconstruct the actual waveform less the blower noise, and this could then be used as a noise free transient sample in a time domain organ. This would be an example of “hybrid” voicing as defined earlier, and it results in an extremely realistic simulation of the recorded transient.

When analysing a new transient it is useful to begin with a spectrogram of the type shown above, having first ensured that the spectrum analysis parameters are sensible as already described. The pictorial representation thus obtained is better than many pages full of tables and statistics – a picture is worth a thousand words. In the case shown, the data blocks for the successive spectrum analyses were overlapped in time by half a block length to ensure a smooth time progression.
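The half-block overlap can be illustrated with a minimal sketch. It uses a slow but dependency-free discrete Fourier transform rather than an FFT, and the test signal is a hypothetical tone of steadily growing amplitude standing in for an attack; none of this is taken from the software used to produce Figure 9.

```python
import cmath
import math

def dft_magnitudes(block):
    """Naive DFT magnitudes for bins 0..N/2 (slow, for illustration only)."""
    n = len(block)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(block))
        mags.append(abs(acc) / n)
    return mags

def spectrogram(signal, block_len):
    """Spectra of successive blocks, each overlapping the last by half a block."""
    hop = block_len // 2
    frames = []
    for start in range(0, len(signal) - block_len + 1, hop):
        frames.append(dft_magnitudes(signal[start:start + block_len]))
    return frames

# Hypothetical test tone: 8 cycles per block, amplitude rising linearly.
n_total, block_len = 256, 64
tone = [(i / n_total) * math.sin(2.0 * math.pi * 8 * i / block_len)
        for i in range(n_total)]
frames = spectrogram(tone, block_len)   # 7 spectra from 256 samples
```

Each entry of `frames` is one slice of the spectrogram; plotting them side by side gives the time progression, with the peak in frequency slot 8 growing from one slice to the next.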

Waypoints

The amount of data required to define a transient can be reduced by using fewer spectra. The complete spectrogram will often indicate where significant changes in the frequency structure occur, and only spectra from these epochs need be used in many cases. One would generally need to use one spectrum at the start of the transient and one at the end also. This truncated set of data can be referred to as a set of “waypoint spectra”, and it is possible to use a computer program which “morphs” or interpolates each frequency component in the spectra between the waypoints to generate a close frequency domain approximation to the complete transient. The set of waypoint spectra can be used in a multiple cycle additive synthesiser to generate a synthetic transient waveform. This can then be used directly in a time domain organ, or maybe a frequency domain organ can be instructed to do its own morphing when presented with the waypoint data only.
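A minimal sketch of such morphing is given below, assuming simple linear interpolation of each harmonic amplitude between neighbouring waypoints. Both the interpolation rule and the waypoint values are assumptions made for the illustration; a real program might interpolate differently.

```python
def morph_spectra(waypoints, time_s):
    """Linearly interpolate harmonic amplitudes between neighbouring waypoints.

    waypoints is a list of (time_s, [amplitudes...]) pairs sorted by time.
    """
    if time_s <= waypoints[0][0]:
        return list(waypoints[0][1])
    for (t0, a0), (t1, a1) in zip(waypoints, waypoints[1:]):
        if t0 <= time_s <= t1:
            frac = (time_s - t0) / (t1 - t0)
            return [x + frac * (y - x) for x, y in zip(a0, a1)]
    return list(waypoints[-1][1])

# Hypothetical waypoint spectra for three harmonics (start, mid-transient, end).
waypoints = [
    (0.000, [0.0, 0.0, 0.00]),
    (0.010, [0.9, 0.4, 0.05]),
    (0.040, [1.0, 0.5, 0.30]),
]
early = morph_spectra(waypoints, 0.005)   # halfway to the first waypoint
late = morph_spectra(waypoints, 0.025)    # halfway between the last two
```

Calling the function at closely spaced times yields the intermediate spectra which a multiple cycle additive synthesiser would need.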

Wavelet Analysis

We have seen that the sound emitted by organ pipes comprises two quite different regimes. During the steady state phase, the statistics describing the waveform are almost stationary and the waveform itself is essentially deterministic, meaning that spectrum analysis is a good technique for establishing its frequency structure. During the transient phases at the start and end of the waveform the reverse applies - the waveform is highly non-stationary in a statistical sense and therefore spectrum analysis is not a good tool with which to establish its time versus frequency characteristics. Because frequency domain digital organs rely on frequency domain (spectral) data, it follows that they will have greater difficulty in simulating transients than their time domain cousins.

An alternative means for gaining insight into how the waveforms evolve during the transient phases is to use wavelet processing, a technique which is well matched to analysing non-stationary time series. Broadly, the difference between conventional spectrum analysis and wavelet analysis can be understood as follows. In spectrum analysis the waveform is effectively multiplied by a series of sine waves at a range of closely spaced frequencies. The power of the signal resulting from each multiplication then represents the power at that particular frequency existing in the waveform. Because each prototype sine wave is continuous and of a duration equal to that of the waveform segment selected for processing, the method is not good at identifying variations in energy which occur over timescales smaller than that of the selected segment. Such variations tend to get smoothed out, as we have seen. Wavelet processing, on the other hand, uses a set of much shorter prototype pulse-type waveforms rather than sine waves made up of many identical cycles, and moreover they are of a wide range of different shapes. In effect each wavelet is slid along the acoustic waveform from the organ pipe to reveal how the signal power corresponding to that wavelet varies with time. The results of a wavelet analysis can be represented as a three dimensional plot of the type shown in Figure 9, except that the limitations of ordinary spectrum analysis are not so much in evidence. Time resolution can be very high, for example.
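The idea of sliding a short prototype along the waveform can be sketched as follows. The wavelet used here is a Gaussian-shaped burst of a few cycles (similar to what is often called a Morlet wavelet), which is only one of the many possible shapes and is chosen purely as an assumption for the example; the test signal is likewise hypothetical.

```python
import math

def morlet_like(centre_hz, sample_rate_hz, n_cycles=4):
    """Short Gaussian-windowed cosine and sine pair: one possible wavelet shape."""
    sigma_s = n_cycles / (2.0 * math.pi * centre_hz)
    half = int(3.0 * sigma_s * sample_rate_hz)
    real, imag = [], []
    for i in range(-half, half + 1):
        t = i / sample_rate_hz
        g = math.exp(-0.5 * (t / sigma_s) ** 2)
        real.append(g * math.cos(2.0 * math.pi * centre_hz * t))
        imag.append(g * math.sin(2.0 * math.pi * centre_hz * t))
    return real, imag

def slide_wavelet(signal, wavelet):
    """Slide the wavelet along the signal; return the response at each position."""
    real, imag = wavelet
    width = len(real)
    out = []
    for start in range(len(signal) - width + 1):
        seg = signal[start:start + width]
        re = sum(s * w for s, w in zip(seg, real))
        im = sum(s * w for s, w in zip(seg, imag))
        out.append(math.hypot(re, im))
    return out

# Hypothetical signal: silence, then a 1 kHz tone switched on abruptly.
fs = 44100.0
signal = [0.0] * 1000 + [math.sin(2.0 * math.pi * 1000.0 * i / fs)
                         for i in range(1000)]
response = slide_wavelet(signal, morlet_like(1000.0, fs))
```

The response stays at zero while the wavelet lies wholly within the silence and rises over a span of only a few cycles as it crosses the onset, which is the improved time localisation referred to above.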

Wavelet processing can reveal a lot about the fine structure making up transient waveforms. Unfortunately much of it is academic when related to digital organs for at least two reasons. Firstly, at the end of the day a frequency domain organ will still need to receive its input data in the form of tables describing how the power or amplitude of the signal varies with time at a number of frequencies. Certain representations of the data in "wavelet space" can augment but not entirely supplant these data, which conventional spectrum analysis produces. Secondly, we have noted already that we only perceive the aural world after a spectrum analysis has been carried out by our ears. Our brains operate largely on the amplitude versus time envelopes (the ADSR characteristics if you will) of the outputs from the enormous number of active bandpass filters in the cochlea. Therefore there are limits beyond which it is futile to proceed when trying to simulate transient waveforms - the limitations of spectrum analysis apply just as much to our ears as to any other application which uses the technique.

SOFTWARE TOOLS

Having described the main processes used to study the acoustic behaviour of organ pipes, it is useful to summarise the software tools which are necessary to facilitate the studies. Some, perhaps all, can be obtained commercially but their utility tends to be swamped by a mass of options and functionality which is never used. The overhead cost of purchasing the components of the system and assembling them so they all work together would also be prohibitive for many individuals or organisations. This is partly because not all commercial software, particularly spectrum analysis programs, works properly. I discovered (the hard way) that a well known and expensive editor gave the wrong answers for the harmonic amplitudes as you moved the cursor over them in the spectrum display window! Because of such problems I have developed an interactive suite of customised analysis and voicing software which has been in use and continuously upgraded over nearly 20 years. Currently it runs under Windows XP so that maximum use can be made of the labour-saving graphical interface that it offers.

The main elements of the software suite, constituting a minimum set of tools, are:

A time domain editor with which a signal segment can be selected and filed for future reference. Either single or multiple cycles can be selected. Upwards interpolation (oversampling) is an option.

A spectrum analysis program using a mixed radix FFT so that one is not limited to data lengths which are a power of two. The user can select how many and which harmonics are retained for further processing. The use of data windows such as the Hamming function is optional. Computed spectra are corrected (optionally) for the analogue frequency response of the recording equipment if appropriate. A particularly useful interactive feature is the ability to identify the harmonics and their amplitudes semi-automatically - it is only necessary to click somewhere near the fundamental to enable its amplitude and that of the other harmonics to be found instantaneously using a peak-picking algorithm. These values can then be sent to a file for further processing if required. Currently (2009) I am unaware of any commercially available program which offers this extremely useful facility. A screenshot illustrating this feature appears below for the spectrum of a reed pipe, in which the harmonic peaks have been identified automatically as shown by the small red circles. Their positions enable visual confirmation that the identification of the harmonics has been successful. The process can be repeated any number of times if necessary, for example to zoom in or out along the frequency axis thereby including fewer or more harmonics.
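The principle of semi-automatic harmonic identification can be sketched as below. This is not the algorithm used in the program described above, whose details are not given here; it is merely one simple way a peak-picker might work, with invented function names and a made-up spectrum.

```python
def local_peak(mags, centre_bin, radius):
    """Index of the largest magnitude within +/- radius bins of centre_bin."""
    lo = max(0, centre_bin - radius)
    hi = min(len(mags) - 1, centre_bin + radius)
    return max(range(lo, hi + 1), key=lambda b: mags[b])

def pick_harmonics(mags, clicked_bin, n_harmonics, radius=2):
    """Refine a rough click near the fundamental, then locate each harmonic."""
    fundamental_bin = local_peak(mags, clicked_bin, radius)
    peaks = []
    for h in range(1, n_harmonics + 1):
        target = h * fundamental_bin
        if target >= len(mags):
            break
        b = local_peak(mags, target, radius)
        peaks.append((h, b, mags[b]))
    return peaks

# Hypothetical spectrum: low noise floor with peaks at bins 20, 40 and 60.
mags = [0.01] * 100
mags[20], mags[40], mags[60] = 1.0, 0.5, 0.25

# The "click" lands one bin away from the true fundamental.
print(pick_harmonics(mags, clicked_bin=19, n_harmonics=3))
# prints: [(1, 20, 1.0), (2, 40, 0.5), (3, 60, 0.25)]
```

Each tuple holds the harmonic number, the frequency slot in which it was found and its amplitude, which is the information that would be sent to a file for further processing.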

Fig 10. Screenshot illustrating automatic harmonic detection

A spectrographic program which generates time-frequency spectrograms of the type shown in Figure 9. Because of the importance of optimising the pictorial output, there are several graphics options such as whether a hidden line plot is produced, whether the amplitude display is linear or logarithmic, whether the origin is at the left or right hand side of the plot, etc.

A single cycle additive synthesis program which operates on a table of harmonic amplitudes to produce a steady state waveform. This can be viewed, as well as auditioned by looping. Harmonic phase optimisation is an option. The program will supply the output waveform to a dedicated hardware synthesiser for audition if required, partly to avoid the long-winded steps necessary to insert it into some proprietary editing programs or some types of organ system, and partly because I am one of those who prefer to twiddle real knobs and switches rather than forever fiddling about with the mouse on virtual screen controls. An important facility offered by the synthesiser is an additional input channel to which a Minidisc player can be connected for example, containing an original recording of the pipe which is currently being simulated. By adjusting the relative volumes of this channel and the output from the synthesiser itself and repeatedly switching between them, extremely accurate reconstruction of the original sound is possible. The synthesiser is controlled via the parallel port of the computer. The synthesiser is pictured in Figure 11.

Figure 11. External hardware synthesiser for voicing

The connection to the computer port is at the rear

The screenshot in Figure 12 shows how a spectrum can be modified interactively and the results auditioned in real time via the synthesiser. The spectrum (i.e. the harmonic amplitudes) will typically have been obtained automatically from the spectrum analysis program described above, or they can be inserted manually from scratch if you want to try "inventing" a new sound. In this display the 3rd harmonic of a 16 foot Violone pipe has been adjusted from its original value to one of 45 dB, and it is shown in the display as a red line for this reason. Each time such a change is made, an inverse Fourier transform is performed automatically (additive synthesis) and the result loaded into the synthesiser. The speed of the system is such that the new sound is heard immediately the change is made. A MIDI keyboard can be connected if desired so that the effect of "spreading" the spectrum across a keygroup can be assessed. However a simpler option is also available in which the numeric keys on the computer keyboard can be used for the same purpose to allow the frequency of the note to be shifted semitonally over an octave. To activate this option one simply clicks the 'keyboard' button seen on the display. The octave in question is selected by means of the 'octave' and 'footage' buttons on the screen.
Using this program it is possible to optimise the various spectra used for an organ stop across the keyboard compass, and to define the voicing points (keygroups) as well. The program is equally applicable to time domain and frequency domain synthesis - for the former, the spectrum of the sampled waveform is first obtained rapidly using the spectrum analysis facilities described above. Once the results of modifying the spectrum are satisfactory it can then be re-transformed back into a time waveform for use by a time domain organ system. In passing, it should also be noted that it is virtually impossible to develop the necessary voicing skills without a tool of this type. Only by learning how the timbre of a sound depends on its harmonic structure, and how this dependency varies dramatically with pitch and with the type of spectrum (flute, reed, etc), is it possible to voice an electronic organ effectively.
This program could easily become confusing because of the range of options it offers. To avoid this the user is prompted by the appropriate dialogue box which appears whenever an action is required. This also avoids having to clutter the screen with a large number of edit boxes, menus and other encumbrances.

Fig 12. Screenshot illustrating the interactive tonal design program

A multiple cycle additive synthesis program which operates on a set of harmonic amplitudes from different spectra separated in time to produce a waveform whose characteristics vary with time. The spectra will typically be “waypoint spectra” derived from a transient signal. The synthesised transient waveform so produced can be joined onto a steady state waveform to produce a complete sample consisting of transient and steady state portions. The program can generate intermediate spectra by morphing (interpolating) from a limited set of waypoints.

The software is used in conjunction with a two manual and pedal voicing console to enable stops to be properly integrated into a tonal scheme from the point of view of a performer. The keyboards interact directly with the voicing software on a computer. The console is built in skeletal form so it can easily be dismantled and transported to alternative sites; this enables, for example, tonal recipes to be tested in a building before an organ has been installed or purchased. It also enables captured pipe sounds to be re-synthesised and auditioned in the same building to confirm their authenticity. The necessary computer equipment and other hardware, including the dedicated audio synthesiser referred to above, is contained in a small free-standing mobile rack assembly. The voicing console is illustrated below (Figure 13).

Fig 13. The interactive voicing console

Appendix 1 – Digitising Pipe Organ Waveforms

Contents: Analogue-to-digital conversion; sampled signals; aliasing and the Nyquist sampling rate; dynamic range and quantisation noise

Analogue-to-digital conversion

Computers, and therefore digital electronic organs, can only operate with numbers expressed in binary notation. Therefore to enable them to process the sounds of organ pipes which actually consist of continuous (“analogue”) air pressure variations, the sounds have to be converted to strings of binary numbers. This process is called digitising the waveform of an organ pipe. It is done by a device called an analogue-to-digital converter (ADC) connected to a microphone or to a recording made from a microphone.

Sampled signals

Figure 1-1 shows a waveform which is a pure tone, a sine wave, though any waveform can be used. The dots indicate the instantaneous values or voltage of the waveform at 20 equally-spaced intervals along the time axis. Thus the dots have values which vary between –1 and +1. If we were to give a computer these 20 values it would be perfectly happy and, with the aid of a suitable program, it could do anything we wished with the sine wave. Therefore the process of converting an analogue signal into a digital one is conceptually simple – it is only necessary to sample it regularly and convert the sampled voltages into binary numbers. This is what the ADC does.

Aliasing and the Nyquist sampling rate

Two factors are important for present purposes. Firstly the rate at which the samples are taken must be at least twice the highest frequency present in the signal. This is called the Nyquist Rate to immortalise the communications theorist who first realised this. Thus if the waveform is of a Diapason pipe at middle C, its fundamental frequency will be 262 Hz. Typically it will have about 15 harmonics, so the highest frequency in this case will be 15 x 262 = 3930 Hz. Thus it must be sampled at a rate of at least twice this frequency, 7860 Hz. In practice a considerable margin is desirable, and the much higher industry standard sampling rate of 44.1 kHz might well be used. This would certainly be the case if the pipe was recorded on a Minidisc, for example. If the sampling rate is not high enough, spurious tones called aliases will be heard when the recording is replayed, in the form of peculiar whines or whistles.
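The figures for the Diapason example, and the frequency at which an alias would appear if the sampling rate were too low, can be worked out as follows. The function names and the deliberately inadequate 6000 Hz sampling rate are assumptions made for the illustration.

```python
def min_sample_rate(fundamental_hz, n_harmonics):
    """Lowest sampling rate which still captures the highest harmonic."""
    return 2.0 * fundamental_hz * n_harmonics

def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a tone after sampling, folded into 0..fs/2."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(min_sample_rate(262.0, 15))         # prints 7860.0
print(alias_frequency(3930.0, 44100.0))   # prints 3930.0 (no aliasing)
print(alias_frequency(3930.0, 6000.0))    # prints 2070.0 (a spurious tone)
```

In the last case the 15th harmonic at 3930 Hz would be heard as a tone at 2070 Hz, which bears no harmonic relationship to the 262 Hz fundamental.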

The situation is simple to visualise by recalling that the wagon wheels in Western movies sometimes seem to rotate slowly or even go backwards – this is merely because the frame rate of the cinema or TV system is not fast enough to capture the much higher rate of spoke movement on a rapidly rotating wheel. What we see on the screen is therefore a spurious, aliased, frequency. (The backwards rotation also illustrates the reality of negative frequencies, which the Complex Fourier Transform can detect. Fourier analysis is considered in detail elsewhere in the article, but the difference between positive and negative frequencies is not discussed further to keep the necessary detail to a minimum).

Dynamic range and quantisation noise

The second important factor concerns the number of bits available in each binary number to represent the values of the digitised waveforms. If there are not enough, the waveforms will sound noisy when replayed or when the digital organ emits them from its loudspeakers. The peculiar type of noise encountered is called quantisation noise. For this reason current digital sound systems such as CD and Minidisc players use 16 bits for each number. This gives noise free reproduction for subjective purposes. With 16 bits there are over 65000 separate steps available, giving a signal to noise ratio of 96 dB (6 dB per bit). Compare this to the 60 dB or so which was all that the old analogue magnetic tape systems offered (on a good day!). However, although sometimes noisy in other ways, analogue systems do not suffer from quantisation noise.
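The relationship between word length, the number of steps and the available signal to noise ratio is simple enough to compute directly. The function name below is invented for the example.

```python
import math

def quantisation_figures(n_bits):
    """Return (number of steps, ratio of full scale to one step in dB)."""
    steps = 2 ** n_bits
    return steps, 20.0 * math.log10(steps)

steps, db = quantisation_figures(16)
print(steps, round(db, 1))   # prints: 65536 96.3
```

Each additional bit doubles the number of steps and therefore adds 20 log10(2), or about 6 dB, to the figure.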

Appendix 2 – Additive Synthesis

Contents: What is additive synthesis; the Inverse Fourier Transform; the myth of Real Time Synthesis; Phase optimisation; the myth of Anharmonicity; Multiple Cycle Synthesis.

Additive synthesis is the process of adding a number of harmonics together to produce a composite tone. Because our subjective perception of tone colour is strongly influenced by the relative proportions of the harmonics, it is possible to derive a huge variety of tones simply by adjusting the numbers of harmonics used and their amplitudes. Extremely accurate simulation of organ tones can be achieved in this way. Because the process is the reverse of spectrum analysis, which uses the Fourier Transform to reveal which harmonics are present in a periodic signal, additive synthesis is often called the Inverse Fourier Transform or IFT.

The process is illustrated graphically in Figure 2-1.

Here we have only two harmonics, the fundamental (blue) and the second harmonic (red). These are pure tones or sine waves, and their amplitudes are different, that of the fundamental being greater than that of the second harmonic. One can conceive of the sine waves as voltages from electrical oscillators set at frequencies exactly an octave apart. Adding these voltages together then produces a summed waveform which is shown by the yellow curve. It is no longer a sine wave, the kink in the curve produced by the low level second harmonic being obvious. Exactly this process was used in some analogue electronic organs such as the Compton Electrone and the Hammond organ with its harmonic drawbars. In a digital organ the waves are all digitised (see Appendix 1) so they appear as a string of numbers. The computer adds these together to produce the summed waveform. Unless there are only a few numbers, it is usually faster to do the additive synthesis by computing the Inverse Fourier Transform directly with an efficient FFT algorithm, as mentioned in the main text.

Real Time Synthesis – often a myth

Although the process is conceptually simple, making it work efficiently involves further complication. In digital organs it is often impossible to get the addition to work in real time if there are many harmonics – the computer simply cannot work fast enough. Thus the synthesis itself is not actually done in real time at all in this case, making the use of the term “real time” an inappropriate and misleading adjective for organs which use it. One way the problem is overcome for the Bradford computing organ is described elsewhere in this article under the heading of frequency domain synthesis. Another way is to compute the IFT in the manner described in the previous paragraph.

Phase Optimisation

Another problem concerns the shape of the summed waveform. Figure 2-2 shows the waveshape when 5 harmonics are summed, the feature of interest being the large peak which develops at the start of each cycle. As more and more harmonics are used, this peak gets larger and sharper. Such a waveform has most of its energy concentrated into the initial spike, with the rest consisting of low energy ripples. In both digital and analogue systems this is an inconvenient waveshape to handle because it can cause either type of system to run out of “headroom”. Moreover, because of the inefficient power distribution with time across the waveform, it represents a waveform of relatively low energy for the peak amplitude it exhibits. This means the signal to noise ratio of the system handling such a waveform is less than optimum.

The peak arises because each harmonic in Figures 2-1 and 2-2 has a starting phase of zero. In other words, each harmonic starts at zero volts and each then begins to rise in a positive direction. The peak can be suppressed by assigning different starting phases to each harmonic, and this leads to a waveform in which the power is more evenly distributed with time. Such a waveform is shown in Figure 2-3 for the same set of 5 harmonics used before. Note that the peak amplitude of the optimised waveform has been reduced by about one third. When many harmonics are involved the reduction can be dramatic. Even though the peak amplitude has been reduced, the power in the waveform is unchanged, so a major benefit of this type of waveform is that it sounds louder for a given peak level if all other factors remain the same. This is because of the more uniform distribution of power with time. Thus the signal to noise ratio of the organ system is also increased. However the tone colour is not altered because our ears are insensitive to phase.

The process which determines the starting phase of each harmonic is part of the computer program in the organ system. Unfortunately it seems to be little used, one reason no doubt being the additional time overhead involved in computing the phases. The optimum phases depend intimately on the waveshape in question; there is no single solution to the problem.
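As a rough illustration of what such a phase-finding process might look like, the sketch below simply tries random sets of starting phases and keeps the set which gives the smallest peak. This is a deliberately crude stand-in, not the method used in any actual organ system; the function names, the trial count and the five equal amplitudes are all assumptions.

```python
import numpy as np

def synthesise(amps, phases, n_samples=1024):
    """One cycle of the summed waveform for the given amplitudes and phases."""
    t = np.arange(n_samples) / n_samples
    k = np.arange(1, len(amps) + 1)[:, None]
    a = np.asarray(amps, dtype=float)[:, None]
    ph = np.asarray(phases, dtype=float)[:, None]
    return np.sum(a * np.sin(2 * np.pi * k * t + ph), axis=0)

def search_phases(amps, n_trials=2000, seed=0):
    """Random search: keep the phase set giving the lowest peak amplitude."""
    rng = np.random.default_rng(seed)
    best_phases = np.zeros(len(amps))             # start from the zero-phase case
    best_peak = np.max(np.abs(synthesise(amps, best_phases)))
    for _ in range(n_trials):
        trial = rng.uniform(0.0, 2.0 * np.pi, len(amps))
        peak = np.max(np.abs(synthesise(amps, trial)))
        if peak < best_peak:
            best_phases, best_peak = trial, peak
    return best_phases, best_peak

amps = [1.0, 1.0, 1.0, 1.0, 1.0]                  # hypothetical 5-harmonic spectrum
phases, peak = search_phases(amps)
zero_peak = np.max(np.abs(synthesise(amps, np.zeros(5))))
print("zero-phase peak:", zero_peak, "best peak found:", peak)
```

Because the search starts from the zero-phase case it can never return a worse peak, and the RMS level is the same whichever phases are chosen. The time taken by the repeated trials also gives a feel for the overhead mentioned above.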

Anharmonicity – another myth

One commonly hears that non-harmonic frequencies are present in real organ tones, and therefore that these have to be used when synthesising them using additive synthesis. This assertion, which is complete nonsense in the way it is usually posed, is based on a number of misunderstandings and it arises because people confuse the forced and natural vibration frequencies of organ pipes. The subject is discussed in detail in [11]. In steady state sound emission we only perceive a pipe as having a definite pitch because the waveform is precisely periodic. If it were not, the notion of pitch would be vague, as it is for bells and chimes etc, whose overtones are not harmonically related. In a periodic waveform the harmonics are exact integer multiples of the fundamental frequency, therefore if the frequency of one of them varies, the others must also vary to retain periodicity. The pitch of the pipe will then change also.
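The link between exact harmonicity and periodicity is easily checked numerically. In the sketch below the 100 Hz fundamental, the 48 kHz sample rate, the amplitudes and the detuned partial at 203 Hz are all hypothetical values chosen for convenience, and the function names are invented for the example.

```python
import numpy as np

def partial_sum(freqs_hz, amps, t):
    """Sum of sine partials at arbitrary frequencies."""
    return sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(freqs_hz, amps))

def repeats_every(freqs_hz, amps, period_s, fs=48000, n_periods=4):
    """True if each successive period of the waveform matches the first."""
    n = int(round(period_s * fs))                 # samples per nominal period
    t = np.arange(n * n_periods) / fs
    x = partial_sum(freqs_hz, amps, t)
    first = x[:n]
    return all(np.allclose(x[i * n:(i + 1) * n], first, atol=1e-9)
               for i in range(1, n_periods))

amps = [1.0, 0.5, 0.25]
print(repeats_every([100.0, 200.0, 300.0], amps, 0.01))   # exact harmonics
print(repeats_every([100.0, 203.0, 300.0], amps, 0.01))   # one partial detuned
```

Only the first case produces a waveform which repeats exactly every 10 ms; detuning a single partial destroys the periodicity, as argued above.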

However, it is possible for the various frequencies making up a rapidly changing waveform not to be precisely harmonically related until the pipe settles down to stable speech. Such waveforms occur during the attack and release transients of pipes. But the question which then must be asked is whether the structure is sufficiently non-harmonic for it to be detectable using a spectrum analysis. This also is an area where there is much misunderstanding and woolly thinking, and it is discussed extensively in this article in the sections dealing with spectrum and transient analysis. If it is impossible to detect such structure using spectrum analysis methods, then our ears will not detect it either because we can only perceive the aural world after a spectrum analysis has been performed.
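The question of detectability can be made concrete with the standard rule of thumb that a spectrum analysis of a signal lasting T seconds cannot separate frequencies closer together than about 1/T Hz. The sketch below applies only that rule; the 50 ms transient duration and the 5 Hz offset are hypothetical figures, not measurements of any pipe.

```python
def frequency_resolution_hz(duration_s):
    """Approximate smallest frequency separation resolvable in duration_s seconds."""
    return 1.0 / duration_s

def is_resolvable(offset_hz, duration_s):
    """True if a partial displaced by offset_hz could be told apart from a true harmonic."""
    return abs(offset_hz) >= frequency_resolution_hz(duration_s)

# Hypothetical case: a partial lying 5 Hz away from the exact harmonic frequency.
print(frequency_resolution_hz(0.05))   # resolution available in a 50 ms transient
print(is_resolvable(5.0, 0.05))        # within the transient alone
print(is_resolvable(5.0, 1.0))         # over a full second of sound
```

On these assumed figures the offset is smaller than the resolution available within the transient, so it could not be distinguished from an exact harmonic over so short a time.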

Multiple Cycle Synthesis

If a single set of harmonics is used in additive synthesis, all cycles of the waveform produced will be identical. Therefore it is only necessary to actually synthesise a single cycle, which can then be repeated ad infinitum by looping round it indefinitely.

It is sometimes necessary to use a series of spectra, thus several sets of harmonic amplitudes, as the starting point of the additive synthesis, each spectrum generating a different cycle of the waveform. When the successive cycles are put together a waveform will result which varies with time. This technique is used to synthesise an attack or release transient for a pipe tone, or to relieve the monotony of the otherwise identical cycles during the steady state sound.
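Both cases, looping a single cycle and stringing together cycles from a series of spectra, can be sketched as follows. The three-cycle "attack" spectra are invented purely for illustration, as are the function names and the 256-sample cycle length.

```python
import numpy as np

def one_cycle(amps, n_samples=256):
    """One cycle of zero-phase sine harmonics with the given amplitudes."""
    t = np.arange(n_samples) / n_samples
    return sum(a * np.sin(2 * np.pi * k * t) for k, a in enumerate(amps, start=1))

def multi_cycle(spectra, n_samples=256):
    """Concatenate one cycle per spectrum to give a time-varying waveform."""
    return np.concatenate([one_cycle(a, n_samples) for a in spectra])

def loop_cycle(cycle, n_repeats):
    """Repeat a single steady-state cycle by looping round it."""
    return np.tile(cycle, n_repeats)

# Hypothetical attack: the harmonics build up over three successive cycles.
attack_spectra = [
    [0.2, 0.0, 0.0],
    [0.6, 0.1, 0.0],
    [1.0, 0.3, 0.1],
]
attack = multi_cycle(attack_spectra)
steady = loop_cycle(one_cycle(attack_spectra[-1]), 10)
note = np.concatenate([attack, steady])
print(len(attack), len(steady), len(note))
```

Because every cycle in this sketch starts and ends at zero, the cycles join without a jump in value. A practical system would need rather more care, for example when the starting phases are not all zero.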

NOTES AND REFERENCES

1. “Multiplexing System for Selection of Notes and Voices in an Electronic Musical Instrument”, US Patent 3610799, 5 October 1971.

2. “Digital Generator for Musical Notes”, UK Patent 1580690, 3 December 1980.

3. In 1999 Wyvern Organs defended their use of the Z80 microprocessor, which first appeared over two decades ago, in a candid and useful description of their organ system (see “Electronic Technology”, C Peacock, Organists’ Review, August 1999, p. 275).

4. Some systems assign each sample to a single key rather than a keygroup and interpolation (a blending technique) between samples is then used depending which key is pressed. This produces a smoother variation in tone quality across the compass.

5. In fact it is not necessary to store any sine wave information at all in principle, because if the computer is fast enough the individual numbers representing the sine wave could be computed as needed on the fly (i.e. in real time).

6. “The Tonal Structure of Organ Flute Stops”, C E Pykett 2003, currently on this website. (read)

7. “MIDI for Organists”, C E Pykett 2001, currently on this website. (read)

8. “Novel System of Organ Building by Mr J T Austin, Jun, of Detroit, USA”, in Organs and Tuning, T Elliston, Weekes and Co, 1898.

9. “A Proper Organ has Pipes”, J Brennan, The Organbuilder, Vol 17, November 1999.

10. “The Measurement of Power Spectra”, R B Blackman and J W Tukey, Dover, 1958.

11. “How the Flue Pipe Speaks”, C E Pykett 2000, currently on this website. (read)

12. "Fundamentals of Musical Acoustics", A H Benade, Dover, New York, 1990. ISBN 0 486 26484 X

13. Simulating every note separately in the Bradford system was described by its inventors as "resource hungry and therefore expensive" in a recent article. This makes it at once inferior to virtually any modern time domain sampler in this important respect. See "Music's measure: using digital synthesis to create instrument tone", Organists' Review, May 2007, p. 35, Peter and Lucy Comerford. The words quoted are on p. 38 of the article.

14. "Physical Modelling in Digital Organs", C E Pykett 2009, currently on this website (read).