Oh, the Voices (Part 1)

Part I: Tidying Up Talent Vocals
By Steve Dove, Wheatstone Minister of Algorithms

The microphone processor has long been important, but in recent years it has become vital. This is mainly due to the recent trend of referencing audio to 0 dBFS (the maximum signal level in a digital system) rather than the cozy old nominal 0 dB VU. Most popular music releases are “normalized,” or processed so that their highest peaks sit at 0 dBFS, if they're not totally squashed and clipped to blazes up against that limit. Compared to a playout system crammed full of this material and hyped-up commercials, an unprocessed announcer’s voice can seem quite wimpy and out of place.

Consider also the entire radio air-chain. Sitting ahead of the transmitter is usually a Very Serious Processor, which is generally set up (in a music format) to be optimal for music, secondarily for voices. Presenting a processed voice that better suits the “big guy” can pay large benefits in on-air voice sound.

Other program distribution chains, such as those using heavily bit-reduced streaming codecs, also benefit from attention to the voice, whilst talk radio lives and dies by voices. A good mic processor brings much to all these scenarios.

Let's run through the sorts of things we might want to do to a voice to tidy it up, improve listenability, and better integrate with today's technological expectations.

In the Beginning Was the Microphone

The simple act of terminating a microphone is anything but simple. There is a wide variety of microphone types, ranging from high-end condensers that need phantom power and deliver phenomenal output levels, to ribbons with phenomenally low output levels demanding huge gain and exceptional noise performance. A good mic-pre has to terminate, accept, and indeed power all these types.

Polarity Reversal

The absolute polarity (in-phase, or inverted 180 degrees) of an announcer's microphone can make a huge difference in how he sounds in his headphones (if not on air). Also, in a low-isolation multi-mic environment, switching the polarities of the various microphones can mitigate acoustical cross-coupling coloration.

Phase-Rotation (Decorrelation)

This is a wonderful old trick that, intelligently done, can afford serious benefit to voice in a broadcast chain. The physiology of the human voice is such that its waveform is highly asymmetric -- one polarity's amplitude is much greater than the other's. By shifting the phase of the voice's fundamental frequency with respect to its predominant harmonics (second and third, mainly), the summation that makes those asymmetric peaks is reconfigured and the waveform becomes less asymmetrical. The good news is that the ear is quite insensitive to phase changes, so the sonic hit is very slight; the benefit is well worth the cost. The peak amplitude is reduced (that summation peak avoided) while the waveform carries the same energy -- meaning the overall level can be raised until it matches the original's peak level. That increase is a direct increase in loudness, essentially for free.

Ideally, speech decorrelation needs to take place early in the air-chain, before any dynamics processing is applied. Why? Because raw, asymmetric voice can force compressors, particularly fast ones, to do less natural-sounding things than the decorrelated version would. Not to harp on about the “louder” thing, but decorrelation affords the possibility of using LESS compression downstream to achieve the same sonic “weight.”

Doing the decorrelation where it does the most good, right at the front of the air-chain, means that the decorrelation feature almost invariably found in the main station processor can be turned off; there's nothing to be won by doing it twice, and it can only benefit the music by its absence.
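For the curious, the classic way to build a phase rotator is a cascade of first-order allpass sections: each passes every frequency at unity gain but shifts phase progressively with frequency, so the fundamental and its harmonics re-sum with different timing. The sketch below is a minimal Python illustration -- the function names, stage count, and 200 Hz corner are my own illustrative assumptions, not any particular product's implementation.

```python
import math

def allpass1(x, fc, fs):
    """First-order allpass: flat magnitude response, -90 degrees
    of phase shift at fc, approaching -180 degrees above it."""
    t = math.tan(math.pi * fc / fs)
    c = (t - 1.0) / (t + 1.0)
    y, x1, y1 = [], 0.0, 0.0
    for xn in x:
        yn = c * xn + x1 - c * y1
        y.append(yn)
        x1, y1 = xn, yn
    return y

def phase_rotate(x, fs, stages=4, fc=200.0):
    """Cascaded allpass sections shift the fundamental's phase
    against its harmonics without changing the magnitude spectrum."""
    for _ in range(stages):
        x = allpass1(x, fc, fs)
    return x
```

Because each section is allpass, the signal's energy is untouched; only the waveform shape, and hence the peak level, changes.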

Garbage Disposal

Most U.S. air studios are, frankly, not acoustically optimal. They're noisy, overly reflective, and rarely have reverberation times low enough, and even enough across frequency, to allow microphones to be used at optimum distances without sounding 'roomy', distant, and highly colored. (Whether this is cause or effect of the typical close-talking mic technique is a several-beer discussion.) Processing can help somewhat.

High-pass filter:

Given adequate pop-protection 'twixt presenter and mic, common unwanted noises are structure-borne impacts (footfalls) from elsewhere in the building, and air-con rumble. Here a high-pass filter is invaluable. Rolled up as far as one dares without impacting voice quality, this alone is worth the price of admission. The fundamental frequency of even a “basso profundo” Monster Truck voice rarely gets below 60Hz, so there is a lot of room for maneuver.
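As a concrete sketch (the 80 Hz corner and all names are illustrative assumptions, not product settings), a second-order high-pass built from Robert Bristow-Johnson's widely used audio-EQ cookbook formulas cuts 30 Hz footfall and air-con energy hard while leaving 1 kHz voice content essentially untouched:

```python
import math

def highpass(x, fc, fs, q=0.707):
    """RBJ audio-EQ-cookbook 2nd-order high-pass (12 dB/octave),
    run as a Direct Form I difference equation."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    cw = math.cos(w0)
    a0 = 1.0 + alpha
    b0, b1, b2 = (1 + cw) / 2 / a0, -(1 + cw) / a0, (1 + cw) / 2 / a0
    a1, a2 = -2.0 * cw / a0, (1.0 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

At 30 Hz this slope is already better than 17 dB down, while a 60 Hz "Monster Truck" fundamental still clears the corner -- which is exactly the room for maneuver described above.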

Low-pass filter:

Less obvious, because many people simply can't hear that high (any more), are whistles from switch-mode power-supplies, particularly from some lighting fixtures, and especially the line-sweep frequencies of TVs and monitors within the acoustic space. If most people can't hear them, what's the problem? Some can, and are driven nuts, and downstream processing CERTAINLY detects them, and can be driven worse than nuts by them. A low-pass filter can help in ameliorating the worst of these, but the sonic penalty can be severe, and fixing the source problem is a far better long-term solution.
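For completeness, the low-pass counterpart from the same RBJ cookbook (again, the corner frequency and names are my illustrative assumptions). Run the numbers and the trade-off described above is plain: a 12 dB/octave slope cornered at 10 kHz only pulls a 15.7 kHz line-sweep whistle down by roughly 13 dB, and anything steep enough to kill it outright starts eating the voice's top end.

```python
import math

def lowpass(x, fc, fs, q=0.707):
    """RBJ audio-EQ-cookbook 2nd-order low-pass (12 dB/octave),
    run as a Direct Form I difference equation."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    cw = math.cos(w0)
    a0 = 1.0 + alpha
    b0, b1, b2 = (1 - cw) / 2 / a0, (1 - cw) / a0, (1 - cw) / 2 / a0
    a1, a2 = -2.0 * cw / a0, (1.0 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```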

Gating:

This is an automatic means of opening a microphone to air solely in response to someone speaking into it. There are a couple of good reasons for doing this: room noise (air-con noise, etc.) can be suppressed when no speech is present and speech itself at least partially masks the noise. Gating can also help with cross-mic bleed and/or coloration in a multi-mic environment, acting as a basic auto-mixer. In both these usages it is amazing how little attenuation “depth” is needed to make a big improvement -- just a few dB, certainly not using it as a bang-bang on-off switch. The less attenuation is invoked, the less noticeable it is. Judicious settings can be completely transparent.
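A minimal sketch of such a gentle gate (all names, thresholds, and time constants are illustrative assumptions): a peak-following detector opens the gain quickly on speech and lets it fall back only a few dB -- here 6 dB -- when the room goes quiet.

```python
import math

def soft_gate(x, fs, thresh_db=-45.0, depth_db=6.0,
              attack_ms=5.0, release_ms=80.0):
    """Shallow downward expander: at most depth_db of attenuation,
    with smoothed gain -- nothing like a bang-bang on-off switch."""
    thr = 10.0 ** (thresh_db / 20.0)
    floor = 10.0 ** (-depth_db / 20.0)
    env_rel = math.exp(-1.0 / (fs * 0.050))              # 50 ms detector decay
    g_atk = math.exp(-1.0 / (fs * attack_ms / 1000.0))   # open fast
    g_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))  # close gently
    env, gain, y = 0.0, floor, []
    for xn in x:
        env = max(abs(xn), env * env_rel)                # peak follower
        target = 1.0 if env > thr else floor
        coef = g_atk if target > gain else g_rel
        gain = target + coef * (gain - target)
        y.append(gain * xn)
    return y
```

Note that the closed state still passes half the signal amplitude; the speech masks what little noise remains, which is the transparency argument made above.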

De-essing:

Some people just have sibilant voices anyway. Put them in front of a bright-as-a-laser condenser microphone and nations can fall. A de-esser looks at the spectrum where 'esses' are typically found (6kHz is a good start), ignoring everything else, and reduces the level if the energy there goes above a given threshold. In this way just the 'esses' can be constrained, and everything else left alone. It is a far preferable technique to trying to use a conventional equalizer; that just leaves everything sounding dull all the time if dialed in sufficiently to tame the problem.
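A bare-bones sketch of that behavior (names and numbers are my assumptions, and a real de-esser would use a proper bandpass detector rather than this crude first-order high-pass): only energy above roughly 6 kHz can trigger gain reduction, so the body of the voice rides through untouched.

```python
import math

def deess(x, fs, fc=6000.0, thresh_db=-30.0, ratio=4.0,
          attack_ms=1.0, release_ms=50.0):
    """Wideband de-esser: a sidechain high-pass isolates the 'ess'
    region for detection; gain reduction hits the full signal."""
    rc = 1.0 / (2.0 * math.pi * fc)
    alpha = rc / (rc + 1.0 / fs)
    atk = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    thr = 10.0 ** (thresh_db / 20.0)
    hp, x1, env, y = 0.0, 0.0, 0.0, []
    for xn in x:
        hp = alpha * (hp + xn - x1)       # sidechain high-pass
        x1 = xn
        lvl = abs(hp)
        coef = atk if lvl > env else rel  # fast attack, slow release
        env = lvl + coef * (env - lvl)
        if env > thr:
            over_db = 20.0 * math.log10(env / thr)
            g = 10.0 ** (-(over_db * (1.0 - 1.0 / ratio)) / 20.0)
        else:
            g = 1.0
        y.append(g * xn)
    return y
```

Contrast this with the equalizer approach criticized above: here the gain is 1.0 whenever the sidechain is below threshold, so nothing is dulled between 'esses'.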

Okay. We've tidied things up. Next issue, we will look at the tools that affect the desired sound.