How to Mix Vocals: Processing Chain, Tuning, and Getting Them to Sit

May 31, 2026

The vocal is almost always the focal point of a mix: the element that listeners engage with most directly, and the element that most clearly signals whether a mix is professional or amateur. Getting vocals to sit correctly is one of the most challenging and nuanced mixing tasks, requiring the right processing chain, careful level management, and an understanding of how the vocal interacts with every other element. This guide covers the complete vocal mixing workflow, from editing and tuning through the processing chain to level automation, reverb, and delay.

Step 1: Editing Before Processing

No amount of processing fixes a badly edited vocal. Before touching a single plugin, the vocal comp should be complete — the best phrases, words, and syllables assembled from all available takes. Breaths should be managed: not removed entirely (which sounds unnatural and robotic) but reduced in level so they’re present but not distracting. Timing should be checked against the track grid and the other elements — a vocal that consistently rushes or drags relative to the music underneath creates a sense of unease that the listener will feel even if they can’t identify it.

Pitch correction is the next step. Auto-Tune or Melodyne applied subtly — correcting notes that are genuinely out of tune without affecting the natural pitch variation of a confident vocal performance — is standard in virtually all commercial vocal production. The goal is not to create a perfectly in-tune machine; it’s to remove the notes that are distractingly flat or sharp while preserving the emotional character of the performance.

Step 2: The Processing Chain

A typical professional vocal processing chain, in order:

High-Pass Filter

High-pass at 80–120Hz depending on the vocal. Male vocals may have useful content down to 80Hz; female and higher-register vocals can be high-passed more aggressively. This removes low-frequency rumble, HVAC noise, and stand vibration that the microphone picked up.
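For readers who want to see what the filter does to the signal, here is a minimal sketch of a second-order (12 dB/octave) high-pass in plain Python, using the widely published "Audio EQ Cookbook" biquad formulas. The slope, the Butterworth Q of 0.7071, the 100Hz cutoff and the function names are all assumptions for illustration; your DAW's filter may use a different slope or design.

```python
import math

def highpass_coeffs(cutoff_hz, sample_rate, q=0.7071):
    """Normalised biquad high-pass coefficients (Audio EQ Cookbook form)."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / 2.0 / a0, -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def highpass(samples, cutoff_hz, sample_rate):
    """Run a list of float samples through the high-pass (direct form I)."""
    b, a = highpass_coeffs(cutoff_hz, sample_rate)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

# Illustrative use: a 1 kHz tone (vocal range) passes almost untouched,
# while a constant DC offset (the extreme case of "rumble") decays to zero.
SAMPLE_RATE = 48000  # assumed session sample rate
tone = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
filtered_tone = highpass(tone, 100.0, SAMPLE_RATE)
filtered_dc = highpass([1.0] * SAMPLE_RATE, 100.0, SAMPLE_RATE)
```

Moving the cutoff between 80 and 120Hz in this sketch changes only how much of the lowest octave is removed; content well above the cutoff is left essentially intact.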

De-esser

Place the de-esser before the main compressor. Sibilance — the harsh “s”, “sh”, “ch” sounds — is often exaggerated by condenser microphones, and compressing a harsh sibilant peak first will cause the compressor to pump and duck on every sibilant hit. The de-esser catches these peaks before they hit the compressor, making the compression more natural and transparent. Set the de-esser’s frequency to the centre of the sibilance problem (typically 5–8kHz for most voices) and adjust the threshold until sibilance is controlled without dulling the consonants.

Compression — First Stage

The first compression stage controls the dynamic range of the performance, catching the loudest phrases and bringing them into a more consistent relationship with the quieter ones. Ratio 3:1 to 4:1, attack 10–20ms (fast enough to catch transients but slow enough not to kill consonant attacks), release 100–200ms. Aim for 4–8dB of gain reduction on the loudest peaks. An optical-style compressor (LA-2A emulation) and a FET-style compressor (1176 emulation) are both classic first-stage choices, depending on whether you want smooth and transparent (optical) or punchy and coloured (FET).
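The relationship between ratio, threshold and gain reduction is simple arithmetic, and a short sketch can help when choosing a starting threshold. The code below assumes an idealised hard-knee compressor and ignores attack, release and knee shape, so treat the numbers as a starting point rather than what a particular plugin will show on its meter. The function names and the -6 dBFS peak level are hypothetical.

```python
def gain_reduction_db(input_db, threshold_db, ratio):
    """Gain reduction (in dB) of an idealised hard-knee compressor."""
    overshoot = input_db - threshold_db
    if overshoot <= 0:
        return 0.0  # below threshold: signal passes unchanged
    # Above threshold the output rises only 1/ratio dB per input dB.
    return overshoot * (1.0 - 1.0 / ratio)

def threshold_for_reduction(peak_db, target_reduction_db, ratio):
    """Threshold that gives the target gain reduction on a peak at peak_db."""
    overshoot = target_reduction_db / (1.0 - 1.0 / ratio)
    return peak_db - overshoot

# Example: loudest vocal peaks at -6 dBFS, 4:1 ratio, aiming for 6 dB of reduction.
threshold = threshold_for_reduction(-6.0, 6.0, 4.0)   # -14.0 dBFS
reduction = gain_reduction_db(-6.0, threshold, 4.0)   # 6.0 dB
print(f"threshold {threshold:.1f} dBFS gives {reduction:.1f} dB of gain reduction")
```

At 4:1, every 4dB the vocal goes over the threshold comes out as 1dB, so 6dB of reduction needs the peaks to sit 8dB above the threshold.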

EQ — Corrective

Address tonal problems: cut the boxiness (typically 200–350Hz), reduce any harshness (2–5kHz range), and clean up any mud in the low-mids. Use a narrow Q for cutting specific problem frequencies; broader moves for general tonal shaping. A parametric EQ with a spectrum analyser display makes it easier to identify problem areas — look for persistent peaks that appear on every phrase.

Compression — Second Stage

A second, gentler compression stage adds further consistency and glue: a lower ratio (2:1), a slower attack, and 2–3dB of gain reduction. This stage smooths out what the first compressor didn’t fully tame and adds the sense of “sticking together” that a well-processed vocal has. Serial compression, meaning two lighter compressors rather than one heavy one, usually produces a more natural, transparent result because neither stage has to work hard enough to become audible.

EQ — Creative

Add air and presence: a gentle high-shelf boost at 10–12kHz adds openness and breath. A presence boost at 3–5kHz adds clarity and helps the vocal cut through a dense mix. These boosts come after the compression so that the compressors aren’t reacting to the boosted frequencies.

Step 3: Level Automation

Even after compression, vocal level automation is essential. The goal: every word is audible, phrases don’t jump out or disappear, and the level feels consistent to the listener. Ride the fader through the entire vocal, phrase by phrase, evening out the phrase-to-phrase level differences that compression alone can’t fully correct. This is one of the highest-leverage mixing tasks available: a carefully automated vocal immediately sounds more professional than an unautomated vocal in an otherwise identical mix.
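As a rough illustration of what a fader ride is doing, the sketch below measures the RMS level of each phrase and works out the gain offset that would bring it to a common target. This is an assumption-laden simplification: RMS is only a crude stand-in for perceived loudness, the 6dB limit on each move is an arbitrary safety value, and the function names are hypothetical. Final moves should always be set by ear.

```python
import math

def rms_db(samples):
    """RMS level of a phrase in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square <= 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)

def ride_offsets_db(phrases, target_db, max_move_db=6.0):
    """Suggested fader offset per phrase, limited to +/- max_move_db."""
    offsets = []
    for phrase in phrases:
        level = rms_db(phrase)
        if level == float("-inf"):
            offsets.append(0.0)  # silent phrase: leave the fader alone
            continue
        move = target_db - level
        offsets.append(max(-max_move_db, min(max_move_db, move)))
    return offsets

# Two made-up phrases: one loud, one quiet (constant values for simplicity).
loud_phrase = [0.5] * 100    # about -6 dB RMS
quiet_phrase = [0.25] * 100  # about -12 dB RMS
offsets = ride_offsets_db([loud_phrase, quiet_phrase], target_db=-9.0)
print([round(o, 1) for o in offsets])
```

Here the loud phrase is pulled down by roughly 3dB and the quiet one pushed up by roughly 3dB, which is the same kind of move you would draw into the automation lane by hand.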

Step 4: Space — Reverb and Delay

Reverb places the vocal in an acoustic space. The type and size of the reverb define the character: a short plate reverb (0.8–1.2 seconds) is warm and musical without washing out the vocal; a hall reverb adds more size but risks muddying the low-mids; a room reverb is the most natural but also the most similar to what the vocal’s own recording environment might have produced. Pre-delay, a gap of 15–30ms between the dry vocal and the onset of the reverb, separates the two and preserves intelligibility.

Delay adds rhythmic space and depth without the tonal colouration of reverb. A tempo-synced eighth-note or dotted-eighth delay at low level fills the space between phrases and gives the vocal a sense of forward momentum. The classic “delay throw”, automating the delay send up only on the last word of a phrase, is one of the most effective and widely used vocal production techniques.
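If your delay plugin doesn't sync to tempo, the delay times are easy to work out by hand. The sketch below assumes the tempo is counted in quarter-note beats (the usual DAW convention); the function name and the 120 BPM example tempo are illustrative only.

```python
def delay_ms(bpm, beats):
    """Delay time in milliseconds for a note lasting `beats` quarter-note beats."""
    return 60000.0 / bpm * beats

EIGHTH = 0.5          # half a quarter-note beat
DOTTED_EIGHTH = 0.75  # an eighth plus half again

tempo = 120.0  # example tempo in BPM
print(f"eighth: {delay_ms(tempo, EIGHTH):.0f} ms")                # 250 ms
print(f"dotted eighth: {delay_ms(tempo, DOTTED_EIGHTH):.0f} ms")  # 375 ms
```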
